- January 22, 2025
- 3 min read
Deploy Models from Hugging Face to Friendli Endpoints
In this blog, we announce our new strategic partnership with Hugging Face, the leading platform for hosting and collaborating on AI models, datasets, and applications. FriendliAI’s cutting-edge inference infrastructure is now integrated into the Hugging Face Hub, simplifying and accelerating inference serving for generative AI models.
The new integration introduces Friendli Endpoints as a deployment option within the Hugging Face Hub, offering developers direct access to high-performance, cost-effective inference infrastructure. By combining Hugging Face’s user-friendly platform with FriendliAI’s GPU-optimized inference technology, we’re empowering developers to unlock the full potential of generative AI while minimizing operational complexity and cost.
Simplifying Model Deployment on Hugging Face Hub
Last year, we introduced an integration with Hugging Face that lets users deploy Hugging Face models directly within the Friendli Suite platform. Through this integration, users have gained access to thousands of supported open-source models on Hugging Face, as well as the ability to deploy private models effortlessly. Building on this success, we are taking the integration further by enabling one-click deployment directly within the Hugging Face Hub.
Selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Additionally, while your deployment is in progress, you can chat with open-source models directly on the page to explore and test their capabilities before going to production. Click this link to head to the Hugging Face site and deploy the Llama 3.3 70B Instruct model directly.
Alternatively, you can deploy dedicated endpoints for private Hugging Face models by searching for the models on the FriendliAI platform.
Deploy models with NVIDIA H100 in Friendli Dedicated Endpoints
With our advanced GPU-optimized inference engine, Friendli Dedicated Endpoints delivers fast and cost-effective inference as a managed service. Whether you are serving an open-source model or a custom or private Hugging Face model, a single click of “Deploy now” on the model deployment page deploys it on NVIDIA H100 GPUs, hardware that is powerful but expensive to operate at scale. Friendli Dedicated Endpoints deliver:
- Fast and Cost-Effective Inference: FriendliAI’s optimization reduces the number of GPUs required while maintaining peak performance, significantly lowering costs.
- Simplified Infrastructure Management: Focus on innovation while FriendliAI handles the complexities of scaling and managing infrastructure.
Run Inference on Open-Source Models with Friendli Serverless Endpoints
For developers looking to run open-source models efficiently, Friendli Serverless Endpoints are an ideal solution. They offer:
- User-Friendly APIs: Simplify interactions with open-source models through APIs optimized by FriendliAI.
- High Performance at Low Cost: Ensure efficient inference with minimal expense.
- Interactive Experience: Chat directly with powerful open-source models on the model deployment page, making it easy to explore and test their capabilities.
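As a rough sketch of what calling a Serverless Endpoint looks like, the snippet below builds an OpenAI-style chat completions request with only the Python standard library. The base URL, model identifier, and `FRIENDLI_TOKEN` environment variable are assumptions for illustration; consult the Friendli documentation for the exact values for your endpoint.

```python
import json
import os
import urllib.request

# Assumed endpoint for Friendli Serverless Endpoints (OpenAI-compatible
# chat completions API) -- verify against the Friendli docs.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"


def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


# Hypothetical model identifier shown for illustration only.
payload = build_request("meta-llama-3.3-70b-instruct", "Say hello in one sentence.")

token = os.environ.get("FRIENDLI_TOKEN")  # personal access token, if set
if token:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # Print the assistant's reply from the first choice.
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API follows the familiar chat completions shape, existing OpenAI-compatible client libraries can typically be pointed at the endpoint by swapping the base URL and token.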
Driving the Future of AI Together
Our partnership marks a significant milestone in our mission to make AI more accessible, efficient, and impactful. By integrating FriendliAI’s advanced inference technology into the Hugging Face Hub, we are simplifying the deployment process and enabling developers to focus on what matters most: innovation.
We are thrilled to deepen our collaboration with Hugging Face and look forward to empowering the global AI community with tools that drive groundbreaking advancements. Together, we are shaping the next era of AI development. Read more about our partnership on the Hugging Face blog and start deploying models today on the Hugging Face Hub.
Written by
FriendliAI Tech & Research