Pricing
Find the best product for you
Friendli Serverless Endpoints
Fast and affordable API for open-source models
Compare plans and features
| Category | Feature | Starter | Enterprise |
|---|---|---|---|
| Inference | OpenAI compatible APIs | | |
| | Optimized inference APIs | | |
| | Long context (128K) handling | | |
| | Function calling & JSON mode | | |
| | Rate limit | Varies with tier (see tier details) | Custom |
| Tools | Document parsing | | |
| | Web search | | |
| | Code interpreter | | |
| | Other built-in tools | | |
| Tiers (see tier details) | Minimum usage tier | Tier 1+ | Custom |
Pricing details
Important Update
Effective June 20, we’ve introduced new billing options and plan changes. Models are now billed on either a token basis or a time basis, depending on the specific model.
Token-based Billing
Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis. These models are billed by the number of tokens processed, where a "token" is an individual unit of text processed by the model.
Time-based Billing
Models other than the pinned models are charged on a time basis. These models are billed by the duration of compute time used for inference.
Token-based billing
| Model name | $ / 1M tokens |
|---|---|
| DeepSeek-R1 | $3 (input) / $7 (output) |
| Llama-3.3-70B-Instruct | $0.6 |
| Llama-3.1-8B-Instruct | $0.1 |
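As a quick sketch of how token-based billing adds up, using the DeepSeek-R1 rates from the table above (the helper function is illustrative, not part of the Friendli API):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price: float, output_price: float) -> float:
    """Cost in dollars, where prices are quoted in $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# DeepSeek-R1: $3 per 1M input tokens, $7 per 1M output tokens.
# 200K input tokens + 50K output tokens:
cost = token_cost(200_000, 50_000, input_price=3.0, output_price=7.0)
print(f"${cost:.2f}")  # $0.95
```

For models with a single listed rate, pass the same price for input and output.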
Time-based billing
| Model name | $ / second |
|---|---|
| Llama-4-Maverick-17B-128E-Instruct | $0.004 |
| Llama-4-Scout-17B-16E-Instruct | $0.002 |
| Qwen3-235B-A22B | $0.004 |
| Qwen3-30B-A3B | $0.002 |
| Qwen3-32B | $0.002 |
| gemma-3-27b-it | $0.002 |
| Mistral-Small-3.1-24B-Instruct-2503 | $0.002 |
| Devstral-Small-2505 | $0.002 |
| Magistral-Small-2506 | $0.002 |
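Time-based billing is simply compute seconds multiplied by the per-second rate. A minimal sketch using rates from the table above (the helper function is illustrative, not part of the Friendli API):

```python
def time_cost(seconds: float, price_per_second: float) -> float:
    """Cost in dollars for a given duration of inference compute time."""
    return seconds * price_per_second

# Qwen3-235B-A22B at $0.004/s, 90 seconds of inference:
print(f"${time_cost(90, 0.004):.2f}")  # $0.36

# Qwen3-32B at $0.002/s, 5 minutes of inference:
print(f"${time_cost(300, 0.002):.2f}")  # $0.60
```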