Pricing
Find the best product for you
Friendli Serverless Endpoints
Fast and affordable API for open-source models
Compare plans and features
| Category | Features | Trial | Basic | Enterprise |
|---|---|---|---|---|
| Inference | OpenAI compatible APIs | | | |
| | Optimized inference APIs | | | |
| | Long context (128K) handling | | | |
| | Function calling & JSON mode | | | |
| | Rate limit | 10 requests/min<br>50K tokens/min | 10K requests/min<br>100K tokens/min (contact sales to increase limits) | Unlimited |
| Tools | Document parsing | | | |
| | Web search | | | |
| | Code interpreter | | | |
| | Other built-in tools | | | |
Pricing details
Important Update
Effective June 20, we’ve introduced new billing options and plan changes. Models are now billed on either a token basis or a time basis, depending on the specific model.
Token-based Billing
Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis. These models are billed by the number of tokens processed, where a "token" is an individual unit of text processed by the model.
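As a sketch of the arithmetic (the per-1M-token rates are taken from the token-based billing table below; the actual meter is Friendli's, and input/output token counts come from the API response):

```python
# Rough cost estimate for token-based billing, using the DeepSeek-R1
# rates listed below: $3 per 1M input tokens, $7 per 1M output tokens.
def token_cost(input_tokens: int, output_tokens: int,
               input_rate: float = 3.0, output_rate: float = 7.0) -> float:
    """Cost in USD; rates are given per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. a request with 2,000 input tokens and 500 output tokens:
cost = token_cost(2_000, 500)  # (2000*3 + 500*7) / 1e6 = $0.0095
```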
Time-based Billing
Models other than the pinned models are charged on a time basis. These models are billed by the duration of compute time used for inference.
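The corresponding arithmetic is simply compute seconds times the per-second rate (rates taken from the time-based billing table below; how compute duration is measured is up to Friendli's meter):

```python
# Rough cost estimate for time-based billing, e.g. Qwen3-32B at $0.002 / second.
def time_cost(seconds: float, rate_per_second: float = 0.002) -> float:
    """Cost in USD for `seconds` of compute at the given per-second rate."""
    return seconds * rate_per_second

# e.g. an inference that consumes 12.5 seconds of compute on Qwen3-32B:
cost = time_cost(12.5)  # 12.5 * 0.002 = $0.025
```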
Token-based billing

| Model name | $ / 1M tokens |
|---|---|
| DeepSeek-R1 | Input: $3<br>Output: $7 |
| Llama-3.3-70B-Instruct | $0.6 |
| Llama-3.1-8B-Instruct | $0.1 |
Time-based billing

| Model name | $ / second |
|---|---|
| Llama-4-Maverick-17B-128E-Instruct | $0.004 |
| Llama-4-Scout-17B-16E-Instruct | $0.002 |
| Qwen3-235B-A22B | $0.004 |
| Qwen3-30B-A3B | $0.002 |
| Qwen3-32B | $0.002 |
| gemma-3-27b-it | $0.002 |
| Mistral-Small-3.1-24B-Instruct-2503 | $0.002 |
| Devstral-Small-2505 | $0.002 |
| Magistral-Small-2506 | $0.002 |