Dedicated Endpoints
Build and run LLMs in the cloud


Autopilot LLM endpoints for production

Easily create inference endpoints that are performant, scalable, and cost-effective

“Working with FriendliAI, we created a convenient and dependable service without the need for self-management”

FEATURES & BENEFITS

Superior cost-efficiency and performance with Friendli Engine

Train and serve custom models

Efficient and cost-effective serving with autoscaling

Dedicated GPU resource management

Superior cost-efficiency and performance

A performant LLM serving solution is the first step toward operating your AI application in the cloud.

10x+ faster token generation

5x+ faster initial response time

Run Friendli Engine in the cloud to reduce LLM serving costs by up to 80%.

Our engine achieves 6 times higher throughput. Serve more traffic on fewer GPUs with Friendli Engine.

Our engine generates tokens 10 times faster, delivering unmatched efficiency and performance in your generative AI operations.
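To get a rough feel for what a 6x throughput gain could mean for fleet size, here is a back-of-the-envelope sketch. It assumes throughput scales linearly with GPU count and that the 6x figure holds for your workload; neither is confirmed on this page. The function name `gpus_needed` is hypothetical and used only for illustration.

```python
import math

# Throughput multiplier quoted on this page ("6 times higher throughput").
# Assumption: the gain applies uniformly to your model and traffic pattern.
THROUGHPUT_GAIN = 6


def gpus_needed(baseline_gpus: int, gain: float = THROUGHPUT_GAIN) -> int:
    """Estimate GPUs needed to serve the same traffic after a throughput gain.

    Assumes linear scaling: N GPUs at 1x throughput serve the same load as
    N / gain GPUs at `gain`x throughput. Always keeps at least one GPU.
    """
    if baseline_gpus < 1:
        raise ValueError("baseline_gpus must be at least 1")
    if gain <= 0:
        raise ValueError("gain must be positive")
    return max(1, math.ceil(baseline_gpus / gain))


if __name__ == "__main__":
    for baseline in (1, 10, 12):
        print(baseline, "->", gpus_needed(baseline))
```

Real savings depend on model size, batch shape, and latency targets, so treat the result as an upper-bound estimate rather than a sizing guarantee.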

Custom model support

We offer comprehensive support for both open-source and custom LLMs, allowing organizations to deploy models tailored to their unique requirements and domain-specific challenges. With the flexibility to integrate proprietary datasets, businesses can unlock new opportunities for innovation and differentiation in their AI-driven applications. Create a new endpoint with your private Hugging Face Model Hub repository or upload your model directly to Dedicated Endpoints.

Dedicated GPU Resource Management

FriendliAI Dedicated Endpoints provides dedicated GPU instances, ensuring consistent access to computing resources without contention or performance fluctuations. By eliminating resource sharing, organizations can rely on predictable performance levels for their LLM inference tasks, enhancing productivity and reliability.

Auto-scale your resources in the cloud

When deploying generative AI in the cloud, it is important to scale as your business grows. Friendli Dedicated Endpoints employs intelligent auto-scaling mechanisms that dynamically adjust computing resources based on real-time demand and workload patterns.
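As an illustration of what demand-based auto-scaling means in general, the sketch below computes a replica count from the current request rate. It is a generic target-tracking rule, not Friendli's actual scaling algorithm, which this page does not describe. The function name, parameters, and default limits are all hypothetical.

```python
import math


def desired_replicas(
    current_rps: float,
    target_rps_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 8,
) -> int:
    """Return how many replicas a simple target-tracking rule would request.

    The rule divides observed load by the load one replica should handle,
    rounds up, and clamps the result to the configured bounds.
    """
    if target_rps_per_replica <= 0:
        raise ValueError("target_rps_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    needed = math.ceil(current_rps / target_rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    # Quiet period, moderate load, and a spike beyond the configured ceiling.
    for rps in (0, 45, 500):
        print(rps, "rps ->", desired_replicas(rps, target_rps_per_replica=10))
```

Production autoscalers usually add smoothing and cooldown windows so that short spikes do not cause replicas to flap up and down.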

PRICING

Basic

Featured highlights

Build and run LLMs on autopilot

Billed monthly

Pricing details

Friendli on A100 80GB

$3.8 per hour
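For budgeting, the hourly rate can be turned into a rough usage estimate. The sketch below assumes cost is simply the hourly rate times GPU count times hours, with no minimum commitment, discounts, or extra fees; the page does not spell out those details. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Hourly rate for Friendli on A100 80GB, taken from the Basic plan above.
A100_80GB_HOURLY_USD = Decimal("3.8")


def estimate_cost(
    gpu_count: int,
    hours: int,
    hourly_rate: Decimal = A100_80GB_HOURLY_USD,
) -> Decimal:
    """Estimate cost in USD as rate x GPUs x hours.

    Assumption: billing is linear in GPU-hours. Check the pricing details
    for the actual billing rules before relying on this figure.
    """
    if gpu_count < 0 or hours < 0:
        raise ValueError("gpu_count and hours must be non-negative")
    return hourly_rate * gpu_count * hours


if __name__ == "__main__":
    # One GPU running around the clock for a 30-day month.
    print(estimate_cost(gpu_count=1, hours=24 * 30))
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.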

Enterprise

Contact Sales

Featured highlights

Custom pricing

Dedicated support

EXPLORE FRIENDLI SUITE

Other ways to run generative AI models with Friendli

Friendli Container

Serve LLMs with Friendli Engine in your private environment


Friendli Serverless Endpoints

Fast and affordable API for open-source gen AI models