Pricing

Find the best product for you

Friendli Serverless Endpoints

Fast and affordable API for open-source models

Compare plans and features

| Feature | Starter | Enterprise |
| --- | --- | --- |
| **Inference** | | |
| OpenAI compatible APIs | ✓ | ✓ |
| Optimized inference APIs | ✓ | ✓ |
| Long context (128K) handling | ✓ | ✓ |
| Function calling & JSON mode | ✓ | ✓ |
| Rate limit | Varies with tier (see tier details) | Custom |
| **Tools** | | |
| Document parsing | ✓ | ✓ |
| Web search | ✓ | ✓ |
| Code interpreter | ✓ | ✓ |
| Other built-in tools | ✓ | ✓ |
| Minimum usage tier | Tier 1+ | Custom |

Pricing details

Important Update

Effective June 20, we’ve introduced new billing options and plan changes. Models are now billed on either a token basis or a time basis, depending on the specific model.


Token-based Billing

Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis: billing is based on the number of tokens processed, where a "token" refers to an individual unit processed by the model.

Time-based Billing

Models other than the pinned models are charged on a time basis: billing is based on the duration of compute time used for inference.

Token-based billing

| Model name | $ / 1M tokens |
| --- | --- |
| DeepSeek-R1 | $3 (input) / $7 (output) |
| Llama-3.3-70B-Instruct | $0.6 |
| Llama-3.1-8B-Instruct | $0.1 |
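As a rough sketch of how a token-based charge adds up, the snippet below applies the listed DeepSeek-R1 rates ($3 per 1M input tokens, $7 per 1M output tokens) to a request; the function name and the example request sizes are illustrative, not part of any Friendli SDK.

```python
# Illustrative cost estimate for token-based billing,
# using the listed DeepSeek-R1 rates.
INPUT_PRICE_PER_M = 3.00   # $ per 1M input tokens
OUTPUT_PRICE_PER_M = 7.00  # $ per 1M output tokens

def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a request with 2,000 input tokens and 500 output tokens:
print(f"${token_cost(2_000, 500):.4f}")  # $0.0095
```

For flat-rate models such as Llama-3.1-8B-Instruct, the same arithmetic applies with a single rate for all tokens.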

Time-based billing

| Model name | $ / second |
| --- | --- |
| Llama-4-Maverick-17B-128E-Instruct | $0.004 |
| Llama-4-Scout-17B-16E-Instruct | $0.002 |
| Qwen3-235B-A22B | $0.004 |
| Qwen3-30B-A3B | $0.002 |
| Qwen3-32B | $0.002 |
| gemma-3-27b-it | $0.002 |
| Mistral-Small-3.1-24B-Instruct-2503 | $0.002 |
| Devstral-Small-2505 | $0.002 |
| Magistral-Small-2506 | $0.002 |
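Time-based charges follow the same pattern: compute seconds multiplied by the per-second rate. The sketch below uses the listed Llama-4-Maverick rate ($0.004 / second); the function name and the example duration are illustrative.

```python
# Illustrative cost estimate for time-based billing,
# using the listed Llama-4-Maverick rate.
PRICE_PER_SECOND = 0.004  # $ per second of compute time

def time_cost(compute_seconds: float) -> float:
    """Estimated cost in dollars for a given compute duration."""
    return compute_seconds * PRICE_PER_SECOND

# e.g. a request that uses 2.5 seconds of compute time:
print(f"${time_cost(2.5):.4f}")  # $0.0100
```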