Friendli Serverless Endpoints offer a flexible, scalable inference solution powered by a wide range of models. You can unlock access to more models and features based on your usage tier.
Important Update: Effective June 20, 2025, we’ve introduced new billing options and plan changes:
  • Each model is now billed either Token-Based or Time-Based.
  • The Basic plan has been renamed to the Starter plan.
  • Existing users can continue using their current serverless models without interruption.

Usage Tiers

Usage tiers define your monthly usage limits; your tier scales up based on your payment history.
| Tier | Usage Limits | Rate Limit (RPM) | Output Token Length | Qualifications |
|---|---|---|---|---|
| Tier 1 | $50 / month | 100 | 2K / 8K (if reasoning model) | Valid payment method added |
| Tier 2 | $500 / month | 1,000 | 4K / 8K (if reasoning model) | Total historical spend of $50+ |
| Tier 3 | $5,000 / month | 5,000 | 8K / 16K (if reasoning model) | Total historical spend of $500+ |
| Tier 4 | $50,000 / month | 10,000 | 16K / 32K (if reasoning model) | Total historical spend of $5,000+ |
| Tier 5 | Custom | Custom | Custom | Contact support@friendli.ai |
Qualifications only apply to usage within the Serverless Endpoints plan.
‘Output Token Length’ is the maximum number of tokens the model can generate in a response. It differs from ‘Context Length’, which is the sum of the input and output tokens.
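
In practice, you can keep a request within your tier’s output token budget by capping `max_tokens`. The sketch below assumes Friendli’s OpenAI-compatible serverless API and a token stored in a `FRIENDLI_TOKEN` environment variable; the base URL and variable name are illustrative, so check your dashboard for the exact values.

```python
# Minimal sketch: capping generation at the Tier 1 output token length (2K).
# Assumes an OpenAI-compatible serverless endpoint; the base URL and
# environment variable name below are illustrative assumptions.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],
    base_url="https://api.friendli.ai/serverless/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain usage tiers in one sentence."}],
    max_tokens=2048,  # stay within the Tier 1 output token length (2K)
)
print(response.choices[0].message.content)
```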

Billing Methods

Friendli Serverless Endpoints use one of two billing methods, Token-Based or Time-Based, depending on the model.

Token-Based Billing

In a token-based billing model, charges are determined by the number of tokens processed, where each “token” represents an individual unit processed by the model.
| Model Code | Price per Token |
|---|---|
| LGAI-EXAONE/EXAONE-4.0.1-32B | Input $0.6 · Output $1 / 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
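
To see what a request will cost under token-based billing, multiply the token counts reported in the response’s `usage` field by the per-token rates above. A minimal sketch follows, assuming that models with a single listed price bill input and output tokens at the same rate:

```python
# Estimate token-based charges from (input_tokens, output_tokens).
# Prices mirror the table above, in USD per 1M tokens; models with one
# listed price are assumed to bill input and output at the same rate.
TOKEN_PRICES = {  # model -> (input price, output price) per 1M tokens
    "LGAI-EXAONE/EXAONE-4.0.1-32B": (0.6, 1.0),
    "meta-llama/Llama-3.3-70B-Instruct": (0.6, 0.6),
    "meta-llama/Llama-3.1-8B-Instruct": (0.1, 0.1),
}

def estimate_token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge in USD for a single request."""
    input_price, output_price = TOKEN_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. 1,200 prompt tokens + 300 completion tokens on Llama 3.1 8B:
print(f"${estimate_token_cost('meta-llama/Llama-3.1-8B-Instruct', 1200, 300):.6f}")
# -> $0.000150
```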

Time-Based Billing

In a time-based billing model, charges are determined by the compute time required to run your inference request, measured in milliseconds. Non-compute latencies, such as network delays or queueing time, are excluded, ensuring you are charged only for the actual model execution time.

A serverless endpoint model can be in either a Warm status, where it is ready to handle requests instantly, or a Cold status, where it is inactive and requires time to start up. When a model in a cold status receives a request, it undergoes a “warm-up” process that typically takes 7-30 seconds, depending on the model’s size. During this period, requests are queued, but the warm-up delay is not included in your billable compute time.
| Model Code | Price per Second |
|---|---|
| skt/A.X-4.0 | $0.002 / second |
| skt/A.X-3.1 | $0.002 / second |
| naver-hyperclovax/HyperCLOVAX-SEED-Think-14B | $0.002 / second |
| deepseek-ai/DeepSeek-R1-0528 | $0.004 / second |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / second |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / second |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | $0.004 / second |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | $0.004 / second |
| Qwen/Qwen3-30B-A3B | $0.002 / second |
| Qwen/Qwen3-32B | $0.002 / second |
| google/gemma-3-27b-it | $0.002 / second |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | $0.002 / second |
| mistralai/Devstral-Small-2505 | $0.002 / second |
| mistralai/Magistral-Small-2506 | $0.002 / second |
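
Under time-based billing, the charge is simply the billed compute time multiplied by the model’s per-second rate. The sketch below uses two rates from the table above; the client timeout mentioned in the closing comment is an illustrative way to leave headroom for a cold model’s 7-30 second warm-up, which is queued but not billed.

```python
# Estimate time-based charges from billed compute time (milliseconds).
# Per-second prices mirror the table above and are in USD.
PRICE_PER_SECOND = {
    "Qwen/Qwen3-32B": 0.002,
    "deepseek-ai/DeepSeek-R1-0528": 0.004,
}

def estimate_time_cost(model: str, compute_ms: float) -> float:
    """Charge in USD for `compute_ms` ms of model execution time.
    Queueing, network delay, and cold-start warm-up are not billed."""
    return (compute_ms / 1000) * PRICE_PER_SECOND[model]

# A 2,500 ms generation on Qwen3-32B:
print(f"${estimate_time_cost('Qwen/Qwen3-32B', 2500):.4f}")  # -> $0.0050

# When calling a possibly-cold model, give your HTTP client enough headroom
# for the 7-30 s warm-up, e.g. an OpenAI-style client with timeout=60.0
# (illustrative; any HTTP client's timeout option serves the same purpose).
```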

Free Models

The following models are available for free for a limited time.
| Model Code | Free until |
|---|---|
| K-intelligence/Midm-2.0-Base-Instruct | September 4th |
| K-intelligence/Midm-2.0-Mini-Instruct | September 4th |

FAQs

For more questions, contact support@friendli.ai.