Pricing

Find the best product for you

Friendli Serverless Endpoints

Fast and affordable API for open-source models

Compare plans and features

Inference

OpenAI compatible APIs: Trial, Basic, Enterprise
Optimized inference APIs: Trial, Basic, Enterprise
Long context (128K) handling: Trial, Basic, Enterprise
Function calling & JSON mode: Trial, Basic, Enterprise

Rate limit:
Trial: 10 requests/min, 50K tokens/min
Basic: 10K requests/min, 100K tokens/min (contact sales to increase limits)
Enterprise: Unlimited

Tools

Document parsing: Trial, Basic, Enterprise
Web search: Trial, Basic, Enterprise
Code interpreter: Trial, Basic, Enterprise
Other built-in tools: Trial, Basic, Enterprise
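The per-minute limits above apply to both request count and token volume. As a rough client-side sketch of staying within the Trial tier's documented limits (10 requests/min and 50K tokens/min), one could use a rolling-window throttle like the following. The `MinuteLimiter` helper is purely illustrative, not part of the Friendli API:

```python
import time
from collections import deque

class MinuteLimiter:
    """Client-side throttle for a per-minute request and token budget."""

    def __init__(self, max_requests: int = 10, max_tokens: int = 50_000):
        # Defaults mirror the Trial tier limits listed above.
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.events = deque()  # (timestamp, tokens) within the last minute

    def allow(self, tokens: int) -> bool:
        """Record a request using `tokens` if it fits in the rolling minute."""
        now = time.monotonic()
        # Drop events older than 60 seconds.
        while self.events and now - self.events[0][0] > 60:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if len(self.events) >= self.max_requests or used + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would check `allow(estimated_tokens)` before each API request and back off when it returns `False`.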

Pricing details

Important Update

Effective June 20, we introduced new billing options and plan changes. Models are now billed on either a token basis or a time basis, depending on the model.


Token-based Billing

Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis. These models are billed by the number of tokens processed, where a "token" is an individual unit of text processed by the model.

Time-based Billing

Models other than the pinned models are charged on a time basis. These models are billed by the duration of compute time used for inference.

Token-based billing

Model name                  $ / 1M tokens
DeepSeek-R1                 Input: $3, Output: $7
Llama-3.3-70B-Instruct      $0.6
Llama-3.1-8B-Instruct       $0.1
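Under token-based billing, the charge for a request is the token count divided by one million, multiplied by the listed rate, with input and output tokens priced separately where the table shows two rates. A minimal sketch using the DeepSeek-R1 rates above (the helper function is illustrative, not a Friendli API):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_rate: float = 3.0, output_rate: float = 7.0) -> float:
    """USD cost of one request under token-based billing.

    Rates are $ per 1M tokens; defaults are the DeepSeek-R1
    rates from the table above.
    """
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# e.g. 2,000 input tokens and 500 output tokens with DeepSeek-R1:
cost = token_cost(2_000, 500)  # ≈ $0.0095
```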

Time-based billing

Model name                              $ / second
Llama-4-Maverick-17B-128E-Instruct      $0.004
Llama-4-Scout-17B-16E-Instruct          $0.002
Qwen3-235B-A22B                         $0.004
Qwen3-30B-A3B                           $0.002
Qwen3-32B                               $0.002
gemma-3-27b-it                          $0.002
Mistral-Small-3.1-24B-Instruct-2503     $0.002
Devstral-Small-2505                     $0.002
Magistral-Small-2506                    $0.002
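Time-based billing is simpler still: compute seconds multiplied by the per-second rate. A minimal sketch using the $0.002/second rate shared by several models above (again, an illustrative helper rather than a Friendli API):

```python
def time_cost(compute_seconds: float, rate_per_second: float = 0.002) -> float:
    """USD cost of `compute_seconds` of inference under time-based billing.

    The default rate matches the $0.002/second models in the
    table above (e.g. Qwen3-32B); pass 0.004 for the $0.004/second models.
    """
    return compute_seconds * rate_per_second

# e.g. 90 seconds of inference on a $0.002/s model:
cost = time_cost(90)  # ≈ $0.18
```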