- November 19, 2025
- 2 min read
FriendliAI Partners with Nebius to Deliver High-Performance, Cost-Efficient AI Inference

We are thrilled to announce a strategic partnership with Nebius, a leading global GPU cloud infrastructure provider. This collaboration is a significant step in our mission to make world-class, high-performance AI inference technology universally accessible and affordable for AI startups and enterprises worldwide.
FriendliAI’s proprietary inference optimization technology has been integrated into Nebius' large-scale AI cloud infrastructure. This combination delivers an immediate upgrade to companies running essential generative AI services, including customer support chatbots, coding copilots, and AI agents, on the FriendliAI platform.
Why This Partnership Matters: The Ultimate AI Efficiency Stack
For AI startups and enterprises, the cost and operational complexity of running AI models at scale are major burdens. Our partnership directly tackles these challenges by integrating our unique AI inference platform with Nebius' robust GPU infrastructure, providing unmatched performance across four critical areas:
- Cost Efficiency: Customers can achieve over 50% reduction in GPU costs, a substantial saving for high-volume AI workloads.
- Blazing Speed: Leveraging our purpose-built inference stack, we provide over 2x faster inference speed.
- Unwavering Stability: Our combined solution is engineered for reliability, offering a 99.99% uptime guarantee (SLA) that is crucial for mission-critical enterprise applications.
- Guaranteed GPUs for Instant Scaling: By expanding our multi-cloud strategy with Nebius' reliable infrastructure, FriendliAI automatically scales GPU resources to match fluctuating inference demand, optimizing both performance and cost. We also guarantee GPU availability for sudden spikes or large workloads, preventing throttling and service interruptions.
By utilizing FriendliAI’s API on the Nebius infrastructure, AI startups and enterprises can instantly experience improved speed, superior cost efficiency, and enhanced stability in their inference environments.
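As a rough illustration of what calling an inference API from application code can look like, here is a minimal sketch using only Python's standard library. The endpoint URL, model id, and OpenAI-compatible request shape are assumptions for illustration, not the documented FriendliAI API; consult the official docs for actual values.

```python
import json
import os
import urllib.request

# Hypothetical endpoint for illustration only; check the FriendliAI
# documentation for the real URL and request schema.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"

def build_chat_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a JSON chat-completion POST request (request shape is an assumption)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_chat_request(
        model="example-model-id",  # placeholder, not a real model name
        prompt="Summarize our support ticket backlog.",
        token=os.environ.get("FRIENDLI_TOKEN", ""),
    )
    # Network call omitted in this sketch:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Separating request construction from the network call keeps the example runnable offline and makes the payload easy to inspect or test.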
Meet Nebius: A Global Leader in GPU Infrastructure
Nebius is a neo-cloud company based in Amsterdam that builds infrastructure tailored for AI workloads. A publicly traded company on NASDAQ, Nebius currently operates high-performance AI infrastructure across Europe, North America, and Israel.
Notably, Nebius recently solidified its position as a core supplier in the global AI infrastructure market by entering into a significant $19.4 billion AI computing partnership with Microsoft.
This partnership ensures that customers deploying AI models will benefit from an ecosystem built on global-scale reliability and top-tier performance.
Quote from FriendliAI CEO
"Our goal is to make our world-class AI inference technology easily accessible to all companies," said Byung-Gon Chun, CEO of FriendliAI. "The integration of FriendliAI's inference optimization technology with Nebius' robust GPU cloud means that every customer can now deploy AI models with a combination of top-level latency, stability, and cost-efficiency."
We are dedicated to delivering the best-in-class inference platform that powers real-world, production-ready AI applications. The partnership with Nebius accelerates this vision, making high-performance AI inference a global reality.
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference lets you squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
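The tokens-per-dollar point can be made concrete with back-of-the-envelope arithmetic. The throughput and price figures below are illustrative assumptions, not measured numbers:

```python
def tokens_per_dollar(tokens_per_second: float, gpu_cost_per_hour: float) -> float:
    """Throughput normalized by cost: how many tokens one dollar buys."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / gpu_cost_per_hour

# Illustrative numbers only: at the same hourly GPU rate, doubling
# throughput doubles the tokens each dollar buys.
baseline = tokens_per_dollar(tokens_per_second=1000, gpu_cost_per_hour=4.0)   # 900,000
optimized = tokens_per_dollar(tokens_per_second=2000, gpu_cost_per_hour=4.0)  # 1,800,000
```

This is why comparing raw hourly GPU prices alone can be misleading: the effective cost depends on how many tokens each GPU-hour actually serves.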
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multimodal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub takes you straight to our one-click model deployment page, which provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, our managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Talk to an engineer; our engineers (not a bot) will reply within one business day.

