Pricing built to scale with your growth
Fast, reliable, and affordable inference at any scale. Get started instantly with self-serve, or contact us for enterprise deployments.
Serverless Endpoints
Run the fastest frontier model inference with a simple API call.
Dedicated Endpoints
Run dedicated inference with unmatched speed and reliability at scale.
Container
Run inference with full control and performance in your environment.
Looking for Enterprise Capabilities?
The Enterprise plan is offered as a customizable framework, not a fixed bundle.
Features and capabilities are enabled based on your contract.
Let’s talk about an Enterprise plan designed for you.
Scale & Reliability
- Custom Serverless API rate limits
- Priority access to high-demand GPU types
- Reserved GPU capacity
Control & Deployment
- Custom region deployments
- VPC deployments
- On-prem deployment options
Enterprise Commitments
- Dedicated support channels
- Named Customer Success ownership
- Custom commercial terms
Serverless API Pricing
Get instant access to the fastest frontier model inference with a simple API call.
Dedicated Endpoints Pricing
Run dedicated inference with unmatched speed and reliability at scale.
On-demand deployment
Pay only for the compute you use, down to the second, with no extra charges for start-up time.
On-demand rates, $/hour (billed per second):
- A100 80GB GPU: $2.90
- H100 80GB GPU: $3.90
- H200 141GB GPU: $4.50
- B200 180GB GPU: $8.90
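To illustrate per-second billing, the sketch below estimates on-demand cost from the hourly rates in the table above; the `on_demand_cost` helper is hypothetical, not part of any FriendliAI SDK.

```python
# Hourly on-demand rates (USD) from the table above, billed per second.
RATES_PER_HOUR = {
    "A100 80GB": 2.90,
    "H100 80GB": 3.90,
    "H200 141GB": 4.50,
    "B200 180GB": 8.90,
}

def on_demand_cost(gpu: str, seconds: int, gpu_count: int = 1) -> float:
    """Estimate cost as per-second rate x duration x number of GPUs."""
    per_second = RATES_PER_HOUR[gpu] / 3600
    return round(per_second * seconds * gpu_count, 4)

# Example: one H100 80GB GPU running for 90 minutes.
print(on_demand_cost("H100 80GB", 90 * 60))  # -> 5.85
```

Because billing is per second, a deployment that runs for 90 minutes costs exactly 1.5x the hourly rate, with no rounding up to the next hour.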
For per-token price estimates, see this page. Results vary by use case, but we often observe 2-3x higher throughput and lower latency on FriendliAI compared to open-source inference engines.
Container Pricing
Run inference with full control and performance in your environment.