Friendli Serverless Endpoints offer a range of models tailored to various tasks.

Important Update: Effective June 20, we’ve introduced new billing options and plan changes. Models are now billed on a token basis or a time basis, depending on the specific model.

Billing Methods

Friendli Serverless Endpoints use two different billing methods depending on the model type:

Token-based Billing

Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis. These models are billed according to the number of tokens processed, where a “token” refers to an individual unit of text processed by the model.

Time-based Billing

Models other than the pinned models are charged on a time basis. These models are billed according to the duration of compute time used for inference.

Token-based Pricing Models

The following table shows the pricing details for models that use token-based billing. These models are charged based on the number of tokens processed.

| Model Code | Price per Token |
| --- | --- |
| deepseek-ai/DeepSeek-R1 | Input $3 · Output $7 per 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 per 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 per 1M tokens |
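To make the per-token arithmetic concrete, here is a minimal sketch of estimating the cost of a single request under token-based billing, using the DeepSeek-R1 rates from the table above (the function name and example token counts are illustrative, not part of any official SDK):

```python
# Example rates for deepseek-ai/DeepSeek-R1, converted from $/1M tokens to $/token.
INPUT_RATE = 3.00 / 1_000_000   # $3 per 1M input tokens
OUTPUT_RATE = 7.00 / 1_000_000  # $7 per 1M output tokens

def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request: tokens in each direction times its rate."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a request with 2,000 input tokens and 500 output tokens:
print(f"${token_cost(2_000, 500):.4f}")  # → $0.0095
```

Input and output tokens are priced separately for DeepSeek-R1; for the Llama models above, a single rate applies to all tokens processed.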

Time-based Pricing Models

The following table shows the pricing details for models that use time-based billing. These models are charged according to the compute time (in seconds) used for inference.

| Model Code | Price per Second |
| --- | --- |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / second |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / second |
| Qwen/Qwen3-235B-A22B | $0.004 / second |
| Qwen/Qwen3-30B-A3B | $0.002 / second |
| Qwen/Qwen3-32B | $0.002 / second |
| google/gemma-3-27b-it | $0.002 / second |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | $0.002 / second |
| mistralai/Devstral-Small-2505 | $0.002 / second |
| mistralai/Magistral-Small-2506 | $0.002 / second |
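The time-based calculation is a simple multiplication of inference duration by the model's per-second rate. A minimal sketch, using the Llama-4-Maverick rate from the table above (the function name and example duration are illustrative):

```python
# Example rate for meta-llama/Llama-4-Maverick-17B-128E-Instruct.
RATE_PER_SECOND = 0.004  # USD per second of compute time

def time_cost(inference_seconds: float) -> float:
    """Estimate the USD cost of a request billed by compute duration."""
    return inference_seconds * RATE_PER_SECOND

# e.g. a request that takes 12.5 seconds of compute time:
print(f"${time_cost(12.5):.4f}")  # → $0.0500
```

Under time-based billing, cost depends on how long inference runs rather than on how many tokens are generated, so longer outputs or slower generations cost proportionally more.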