- March 18, 2025
- 4 min read
Deploy Multimodal Models from Hugging Face to FriendliAI with Ease

We’re excited to announce an important expansion of our service! We've now broadened our support for Hugging Face models to include multimodal capabilities, enabling our users to leverage a wider spectrum of AI models.
Deploying and scaling these models has been a challenging task, particularly when it comes to handling high-performance inference demands. With FriendliAI’s cutting-edge technology, Hugging Face users can now deploy multimodal models to Friendli Endpoints directly from the Hugging Face Hub with just one click, ensuring high performance and low latency for even the most complex, resource-intensive tasks.
What This Means for You and Why Multimodal Models Matter
Multimodal models represent a major leap forward in the AI space, transforming the way we interact with technology. Traditionally, AI models have been designed to focus on a single type of data—text, images, or audio. However, multimodal models integrate multiple types of data, such as combining images and text to provide more sophisticated, contextual insights. This expansion unlocks a world of possibilities, allowing users to create richer, more advanced applications that understand and generate content across various forms of media.
The future of AI lies in its ability to process and make sense of the complex, multi-sensory world we experience. In the real world, data doesn’t exist in isolation; we perceive everything through a combination of images, sounds, and text. For AI to evolve and match this complexity, it must integrate and understand multiple data types simultaneously. By supporting multimodal models, FriendliAI is empowering developers, businesses, and researchers to build AI systems that are not only more intuitive and versatile but also far more powerful.
Imagine a model that can not only read the text of a book but also generate a relevant image based on that text, or interpret the sentiment of a video by analyzing both the audio and visual context. The potential of multimodal AI is vast, and we’re excited to be part of this evolution.
Key Features of Multimodal Support:
- Diverse Multimodal Models from Hugging Face: We’ve integrated some of the most popular and powerful multimodal models available from Hugging Face. These models allow users to work with both textual and visual data to generate responses, process queries, and even create new media content.
- Wide Range of Applications: Multimodal models can be used in a variety of use cases, from improving accessibility with vision-to-text and text-to-vision translation, to powering more intelligent search engines, or even generating captions for images or videos.
- User-Friendly Interface: We’ve maintained our focus on making complex technologies accessible. Our intuitive UI ensures that integrating multimodal AI into your projects is quick and simple, with minimal configuration required.
- Scalability and Flexibility: Whether you’re building small-scale applications or large enterprise systems, our platform supports a wide range of use cases and scales to meet your needs. You can quickly experiment with new models and fine-tune them for your specific tasks.
Accelerate Multimodal AI Inference with FriendliAI
With FriendliAI’s inference infrastructure, Hugging Face users can easily deploy these multimodal models, benefiting from top-tier performance and cost-efficiency. Whether you’re deploying image-to-text models, building multimodal chatbots, or developing AI voice agents, FriendliAI’s scalable infrastructure provides best-in-class performance without the hassle or excessive cost of infrastructure management.
We have made deployment easier than ever. All you need to do is open the “Deploy” tab on a model’s Hugging Face page and click the “Friendli Endpoints” button.
Once deployed, you can easily interact with your multimodal models and seamlessly compare responses across models. Monitor in real time, test, and refine with ease. With Friendli Endpoints, we take care of all the infrastructure hassles, so you can focus solely on innovating with multimodal AI.
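Beyond the web playground, you can also query a deployed multimodal model programmatically. Below is a minimal sketch assuming your endpoint exposes an OpenAI-compatible chat completions API; the base URL, model ID, token, and image URL are placeholders for illustration and should be replaced with the values shown for your own endpoint.

```python
# Minimal sketch: sending an image plus a text prompt to a deployed multimodal endpoint.
# Assumes an OpenAI-compatible chat completions API; all values below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",  # placeholder: use your endpoint's base URL
    api_key="YOUR_FRIENDLI_TOKEN",                     # placeholder: your Friendli token
)

response = client.chat.completions.create(
    model="your-multimodal-model-id",  # placeholder: the model you deployed
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Because the request follows the familiar chat completions format, existing client code can typically be pointed at a new model by changing only the base URL and model ID, which makes comparing responses across models straightforward.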
How to Get Started
To begin exploring multimodal models on FriendliAI, simply head to our platform and start experimenting. With Hugging Face’s extensive model hub and our seamless integration, you’ll have access to the best and latest in AI innovation, whether you’re a developer, researcher, or enthusiast.
If you’re new to the Hugging Face ecosystem or multimodal AI, we also provide detailed documentation and tutorials to help you get up to speed quickly. Check out our documentation and latest blogs:
- Friendli Documentation
- Deploy Models from Hugging Face to Friendli Endpoints
- Leveraging Milvus and Friendli Serverless Endpoints for Advanced RAG and Multi-Modal Queries
- Llama 3.2 11B Vision Model Available on Friendli Serverless Endpoints for Multi-Modal Support
Looking Ahead
At FriendliAI, our commitment to providing the best AI tools and services is stronger than ever. We’re not just keeping up with the latest trends in AI – we’re driving them. We look forward to seeing how you leverage multimodal models to create the next generation of intelligent applications.
Stay tuned for more updates, and as always, we welcome your feedback and ideas!
Written by
FriendliAI Tech & Research