- May 2, 2025
- 2 min read
How to Use Hugging Face Multi-LoRA Adapters

In the previous article, we explored the mechanics behind LoRA (Low-Rank Adaptation) and its growing importance in adapting large-scale models with minimal overhead. We also introduced Multi-LoRA: the ability to run and switch between multiple LoRA adapters in a single model.
In this follow-up, we’ll walk through how to use multi-LoRA adapters on Hugging Face, from loading and combining adapters, to deploying them on Friendli Dedicated Endpoints. You can even add or remove adapters on the fly!
What Is Multi-LoRA?
Multi-LoRA is an extension of the LoRA technique that lets you deploy multiple specialized adapters on a single base model. The model can then perform a variety of tasks by dynamically loading the adapter fine-tuned for each task or domain. The key advantage is that you can serve many specialized variants without the overhead of maintaining a separate full model for each task.
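As a concrete illustration, here is a minimal sketch using the Hugging Face PEFT library, which supports attaching several named adapters to one base model and switching between them. The base model and adapter repository IDs below are placeholders, not real repos; substitute your own.

```python
# pip install transformers peft
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Any LoRA-compatible base model works; this one is just an example.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Attach a first adapter under a name (adapter repo IDs are placeholders).
model = PeftModel.from_pretrained(
    base, "your-org/summarization-lora", adapter_name="summarize"
)

# Attach a second adapter to the same base weights.
model.load_adapter("your-org/translation-lora", adapter_name="translate")

# Switch the active adapter per task, without reloading the base model.
model.set_adapter("translate")
```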
Why Use Multi-LoRA?
- Efficiency: By loading multiple small adapter modules instead of full models, you can achieve task specialization with minimal additional memory and computational overhead (see the back-of-the-envelope sketch after this list)
- Scalability: Deploying multiple adapters on a single GPU allows for scalable solutions, especially beneficial when hardware resources are limited
- Flexibility: Easily switch between tasks by loading the appropriate adapter, enabling dynamic and versatile model behavior
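To make the efficiency point concrete, here is a back-of-the-envelope estimate. The matrix sizes, rank, and layer count are illustrative assumptions, not measurements of any particular model.

```python
# Back-of-the-envelope LoRA size estimate (all numbers are illustrative).
# LoRA represents a weight update to a (d_out x d_in) matrix as B @ A,
# where B is (d_out x r) and A is (r x d_in): r * (d_out + d_in) params.
d_in = d_out = 4096                 # assumed projection size in a ~7B model
r = 16                              # assumed LoRA rank
per_matrix = r * (d_in + d_out)     # 131,072 params per adapted matrix
n_matrices = 32 * 4                 # assume q/k/v/o projections in 32 layers
total = per_matrix * n_matrices     # ~16.8M params

print(f"one adapter: ~{total * 2 / 1e6:.0f} MB in fp16")      # ~34 MB
print(f"full 7B model: ~{7e9 * 2 / 1e9:.0f} GB in fp16")      # ~14 GB
```

At these sizes, even six adapters add well under 1 GB on top of the base model, which is why many adapters fit comfortably on a single GPU.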
How to Use Hugging Face Multi-LoRA Adapters
FriendliAI supports diverse models directly from Hugging Face, making it the best platform for deploying and scaling adapter-based AI workflows.
Whether you're fine-tuning for personalization, style transfer, or domain-specific tasks, FriendliAI makes it easy to run multiple adapters on a single model with superior performance and minimal setup.
Here’s how to use Hugging Face LoRA adapters with FriendliAI’s Multi-LoRA support:
- Select “Friendli Endpoints” from the “Deploy” tab on the Hugging Face model page
For example, here’s the link for black-forest-labs/FLUX.1-schnell.
- Click “Deploy now” to deploy the model to Friendli Dedicated Endpoints
- Select “Configure myself”
- Add LoRA adapters
In this example, we add six different LoRA adapters, but you can add as few or as many as you need.
- Deploy
Once deployed, you can send inference requests and specify which LoRA adapter to use at runtime. There is no need to redeploy or reload the model to switch tasks; just select the adapter dynamically in your request.
- Dynamically load the adapter of your choice and send requests
This dynamic adapter switching is a unique capability of FriendliAI, the only platform that allows real-time, per-request LoRA selection.
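Here is a sketch of what such a per-request adapter selection can look like. The endpoint URL, payload fields, and adapter-routing convention below are assumptions for illustration; consult the Friendli documentation for the exact schema of your deployed endpoint.

```python
import os
import requests

# Illustrative sketch: the path, payload fields, and the
# "ENDPOINT_ID:adapter-name" routing convention are assumptions,
# not the documented Friendli API schema.
url = "https://api.friendli.ai/dedicated/v1/images/generations"  # hypothetical
headers = {"Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}"}

payload = {
    # Select the LoRA adapter per request; no redeploy needed.
    "model": "YOUR_ENDPOINT_ID:your-adapter-name",  # hypothetical routing
    "prompt": "a watercolor painting of a lighthouse at dawn",
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```

Switching tasks is then just a matter of changing the adapter name in the request body; the base model stays loaded on the endpoint throughout.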
Conclusion
Deploying multiple Hugging Face LoRA adapters on FriendliAI provides a powerful, scalable way to serve specialized AI tasks, all from a single base model. By avoiding the need to duplicate full models, you reduce both memory overhead and operational complexity.
What truly sets FriendliAI apart is its support for live adapter updates, a capability no other provider offers. This makes it the ideal platform for building flexible, multi-task AI systems that are efficient, production-ready, and easy to manage at scale.
Written by
FriendliAI Tech & Research