Pricing
Friendli Serverless Endpoints offer a range of models tailored to various tasks.
Important Update: Effective June 20, we’ve introduced new billing options and plan changes. Models are now billed on either a token basis or a time basis, depending on the specific model.
Billing Methods
Friendli Serverless Endpoints use two different billing methods depending on the model type:
Token-based Billing
Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis. These models are billed according to the number of tokens processed, where a “token” is an individual unit of text processed by the model.
Time-based Billing
Models other than the pinned models are charged on a time basis. These models are billed according to the duration of compute time used for inference.
Token-based Pricing Models
The following table shows the pricing details for models that use token-based billing. These models are charged based on the number of tokens processed.
| Model Code | Price per Token |
|---|---|
| deepseek-ai/DeepSeek-R1 | Input $3 · Output $7 / 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
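To make the token-based arithmetic concrete, here is a minimal sketch of a cost estimate using the rates in the table above. The `token_cost` helper is hypothetical (not part of any Friendli SDK), and it assumes that models listed with a single rate charge that rate for both input and output tokens.

```python
# Hypothetical cost estimator for token-billed models.
# Prices are USD per 1M tokens, taken from the table above.
# Single-rate models are assumed to bill input and output at the same rate.
TOKEN_PRICES = {
    # model code: (input price, output price) per 1M tokens
    "deepseek-ai/DeepSeek-R1": (3.00, 7.00),
    "meta-llama/Llama-3.3-70B-Instruct": (0.60, 0.60),
    "meta-llama/Llama-3.1-8B-Instruct": (0.10, 0.10),
}

def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    in_price, out_price = TOKEN_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a DeepSeek-R1 request with 2,000 input and 500 output tokens
cost = token_cost("deepseek-ai/DeepSeek-R1", 2_000, 500)
print(f"${cost:.4f}")  # 2,000 × $3/1M + 500 × $7/1M = $0.0095
```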
Time-based Pricing Models
The following table shows the pricing details for models that use time-based billing. These models are charged according to the compute time (in seconds) used for inference.
| Model Code | Price per Second |
|---|---|
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / Second |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / Second |
| Qwen/Qwen3-235B-A22B | $0.004 / Second |
| Qwen/Qwen3-30B-A3B | $0.002 / Second |
| Qwen/Qwen3-32B | $0.002 / Second |
| google/gemma-3-27b-it | $0.002 / Second |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | $0.002 / Second |
| mistralai/Devstral-Small-2505 | $0.002 / Second |
| mistralai/Magistral-Small-2506 | $0.002 / Second |
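Time-based billing is a straight multiplication of compute seconds by the per-second rate. The sketch below is a hypothetical helper (not a Friendli API) using two rates from the table above.

```python
# Hypothetical cost estimator for time-billed models.
# Prices are USD per second of compute time, from the table above.
TIME_PRICES = {
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct": 0.004,
    "Qwen/Qwen3-30B-A3B": 0.002,
}

def time_cost(model: str, seconds: float) -> float:
    """Return the estimated cost in USD for `seconds` of inference."""
    return TIME_PRICES[model] * seconds

# Example: 12.5 seconds of inference on Qwen3-30B-A3B
print(f"${time_cost('Qwen/Qwen3-30B-A3B', 12.5):.4f}")  # 12.5 × $0.002 = $0.0250
```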