- May 2, 2025
- 2 min read
How to Use Hugging Face Multi-LoRA Adapters

In the previous article, we explored the mechanics behind LoRA (Low-Rank Adaptation) and its growing importance in adapting large-scale models with minimal overhead. We also introduced Multi-LoRA: the ability to run and switch between multiple LoRA adapters in a single model.
In this follow-up, we’ll walk through how to use multi-LoRA adapters on Hugging Face, from loading and combining adapters, to deploying them on Friendli Dedicated Endpoints. You can even add or remove adapters on the fly!
What Is Multi-LoRA?
Multi-LoRA is an extension of the LoRA technique that lets you deploy multiple specialized adapters on a single base model. The model can then perform a variety of tasks by dynamically loading the adapter fine-tuned for each task or domain. The key advantage is that you can serve many specialized variants without the overhead of maintaining a separate full model for each task.
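As a concrete illustration, here is a minimal sketch using the Hugging Face PEFT library, which supports attaching several named adapters to one base model and switching between them. The base model and adapter repository IDs below are placeholders, not real repos; substitute your own.

```python
# pip install transformers peft
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Any LoRA-compatible base model works; this one is just an example.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Attach a first adapter under a name (adapter repo IDs are placeholders).
model = PeftModel.from_pretrained(
    base, "your-org/summarization-lora", adapter_name="summarize"
)

# Attach a second adapter to the same base weights.
model.load_adapter("your-org/translation-lora", adapter_name="translate")

# Switch the active adapter per task, without reloading the base model.
model.set_adapter("translate")
```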
Why Use Multi-LoRA?
- Efficiency: By loading multiple small adapter modules instead of full models, you can achieve task specialization with minimal additional memory and computational overhead (see the back-of-the-envelope sketch after this list)
- Scalability: Deploying multiple adapters on a single GPU allows for scalable solutions, especially beneficial when hardware resources are limited
- Flexibility: Easily switch between tasks by loading the appropriate adapter, enabling dynamic and versatile model behavior
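To make the efficiency point concrete, here is a back-of-the-envelope estimate. The matrix sizes, rank, and layer count are illustrative assumptions, not measurements of any particular model.

```python
# Back-of-the-envelope LoRA size estimate (all numbers are illustrative).
# LoRA represents a weight update to a (d_out x d_in) matrix as B @ A,
# where B is (d_out x r) and A is (r x d_in): r * (d_out + d_in) params.
d_in = d_out = 4096                 # assumed projection size in a ~7B model
r = 16                              # assumed LoRA rank
per_matrix = r * (d_in + d_out)     # 131,072 params per adapted matrix
n_matrices = 32 * 4                 # assume q/k/v/o projections in 32 layers
total = per_matrix * n_matrices     # ~16.8M params

print(f"one adapter: ~{total * 2 / 1e6:.0f} MB in fp16")      # ~34 MB
print(f"full 7B model: ~{7e9 * 2 / 1e9:.0f} GB in fp16")      # ~14 GB
```

At these sizes, even six adapters add well under 1 GB on top of the base model, which is why many adapters fit comfortably on a single GPU.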
How to Use Hugging Face Multi-LoRA Adapters
FriendliAI supports diverse models directly from Hugging Face, making it the best platform for deploying and scaling adapter-based AI workflows.
Whether you're fine-tuning for personalization, style transfer, or domain-specific tasks, FriendliAI makes it easy to run multiple adapters on a single model with superior performance and minimal setup.
Here’s how to use Hugging Face LoRA adapters with FriendliAI’s Multi-LoRA support:
- Select “Friendli Endpoints” from the “Deploy” tab on the Hugging Face model page
For example, here’s the link for black-forest-labs/FLUX.1-schnell.
- Click “Deploy now” to deploy the model to Friendli Dedicated Endpoints
- Select “Configure myself”
- Add LoRA adapters
In this example, we add six different LoRA adapters, but you can add as few or as many as you need.
- Deploy
Once deployed, you can send inference requests and specify which LoRA adapter to use at runtime. There is no need to redeploy or reload the model to switch tasks; just select the adapter dynamically in your request.
- Dynamically load the adapter of your choice and send requests
This dynamic adapter switching is a unique capability of FriendliAI, the only platform that allows real-time, per-request LoRA selection.
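Here is a sketch of what such a per-request adapter selection can look like. The endpoint URL, payload fields, and adapter-routing convention below are assumptions for illustration; consult the Friendli documentation for the exact schema of your deployed endpoint.

```python
import os
import requests

# Illustrative sketch: the path, payload fields, and the
# "ENDPOINT_ID:adapter-name" routing convention are assumptions,
# not the documented Friendli API schema.
url = "https://api.friendli.ai/dedicated/v1/images/generations"  # hypothetical
headers = {"Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}"}

payload = {
    # Select the LoRA adapter per request; no redeploy needed.
    "model": "YOUR_ENDPOINT_ID:your-adapter-name",  # hypothetical routing
    "prompt": "a watercolor painting of a lighthouse at dawn",
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```

Switching tasks is then just a matter of changing the adapter name in the request body; the base model stays loaded on the endpoint throughout.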
Conclusion
Deploying multiple Hugging Face LoRA adapters on FriendliAI provides a powerful, scalable way to serve specialized AI tasks, all from a single base model. By avoiding the need to duplicate full models, you reduce both memory overhead and operational complexity.
What truly sets FriendliAI apart is its support for live adapter updates, a capability no other provider offers. This makes it the ideal platform for building flexible, multi-task AI systems that are efficient, production-ready, and easy to manage at scale.
Written by
FriendliAI Tech & Research