Serverless Endpoints are often more economical and provide access to a wide range of models. Pricing varies by model type: text models are billed per processed token or by the compute time of your request, while audio models are billed by the duration of processed audio.

Tier-Based API Rate Limits

Tiers are based on lifetime spending and update automatically: as your usage grows, your tier increases. You can also move up instantly by purchasing additional credits.
| Tier | Qualification | RPM (paid models) | RPM (free models) | Output Token Length |
| --- | --- | --- | --- | --- |
| Tier 0 | Signed up | Adaptive Rate Limits* | Adaptive Rate Limits* | 8K |
| Tier 1 | Total historical spend of $10+ | 100 | 60 | 16K |
| Tier 2 | Total historical spend of $50+ | 1,000 | 1,000 | 16K |
| Tier 3 | Total historical spend of $500+ | 5,000 | 5,000 | 32K |
| Tier 4 | Total historical spend of $5,000+ | 10,000 | 10,000 | 64K |
| Tier 5 | Contact support@friendli.ai | Custom | Custom | Custom |
*Adaptive Rate Limits: Rate limits are applied dynamically based on overall platform conditions.
‘Output Token Length’ is how much the model can write in response. It’s different from ‘Context Length’, which is the sum of the input and output tokens.
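The tier table above can be expressed as a simple lookup. The sketch below is illustrative only: the thresholds and limits are copied from the table, but the function and constant names are hypothetical, not part of any Friendli API.

```python
# Illustrative tier lookup based on the tier table above.
# (min lifetime spend USD, tier, RPM paid, RPM free, max output tokens)
# Tier 0 uses adaptive rate limits, so its RPM entries are None.
TIERS = [
    (5_000, 4, 10_000, 10_000, 64_000),
    (500,   3, 5_000,  5_000,  32_000),
    (50,    2, 1_000,  1_000,  16_000),
    (10,    1, 100,    60,     16_000),
    (0,     0, None,   None,   8_000),
]

def tier_for_spend(lifetime_spend_usd: float) -> int:
    """Return the tier number for a given lifetime spend in USD."""
    for threshold, tier, *_limits in TIERS:
        if lifetime_spend_usd >= threshold:
            return tier
    return 0
```

For example, `tier_for_spend(60)` returns 2, since $60 of lifetime spend clears the $50 threshold but not the $500 one.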

Billing Methods

Text Models

Text models are billed per token, with rates that vary by model.

Token-Based Billing

In a token-based billing model, charges are determined by the number of tokens processed, where each “token” represents an individual unit processed by the model.
| Model Code | Price per Token |
| --- | --- |
| LGAI-EXAONE/K-EXAONE-236B-A23B | Input $0.2 · Cached Input $0.1 · Output $0.8 / 1M tokens |
| MiniMaxAI/MiniMax-M2.5 | Input $0.3 · Cached Input $0.06 · Output $1.2 / 1M tokens |
| MiniMaxAI/MiniMax-M2.1 | Input $0.3 · Cached Input $0.15 · Output $1.2 / 1M tokens |
| zai-org/GLM-5 | Input $1 · Cached Input $0.5 · Output $3.2 / 1M tokens |
| zai-org/GLM-4.7 | Input $0.6 · Output $2.2 / 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Input $0.2 · Output $0.8 / 1M tokens |
| Qwen/Qwen3-30B-A3B | Input $0.15 · Output $0.6 / 1M tokens |
| deepseek-ai/DeepSeek-V3.2 | Input $0.5 · Cached Input $0.25 · Output $1.5 / 1M tokens |
| deepseek-ai/DeepSeek-V3.1 | Input $0.5 · Cached Input $0.25 · Output $1.5 / 1M tokens |

Audio Models

Audio models are charged based on the duration of processed audio. Charges are calculated per second and aggregated into a per-minute rate for clarity.
| Model Code | Price per Audio Minute |
| --- | --- |
| openai/whisper-large-v3 | $0.0015 / audio minute |
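Since audio is charged per second at a per-minute rate, the cost of a clip is just its duration scaled by that rate. A minimal sketch (the function name is hypothetical):

```python
def audio_cost_usd(duration_seconds: float, price_per_minute: float) -> float:
    """Audio billing: per-second charges at the quoted per-minute rate."""
    return (duration_seconds / 60.0) * price_per_minute

# Example: 90 seconds of openai/whisper-large-v3 at $0.0015 / audio minute
cost = audio_cost_usd(90, 0.0015)
# 1.5 minutes × $0.0015 = $0.00225
```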

FAQs

Your usage tier, which determines your rate limits, is updated monthly based on your payment history. Need a faster upgrade? Reach out anytime at support@friendli.ai, and we'll be happy to help!
You’ll receive an alert when approaching your monthly cap. To raise it, contact support@friendli.ai: you can either (1) pay early to reset your monthly cap, or (2) upgrade your plan to increase the cap and unlock more features.
For more questions, contact support@friendli.ai.