Serverless Endpoints are often more economical, with access to a wide range of models. You can pay for the tokens generated or the compute time of your request, depending on the model.

Tier-Based API Rate Limits

Tiers are based on lifetime spending and update automatically. As your usage grows, your tier increases. Or you can move up instantly by purchasing additional credits.
| Tiers | Qualifications | RPM (paid model) | RPM (free model) | Output Token Length |
| --- | --- | --- | --- | --- |
| Tier 0 | Signed up | Adaptive Rate Limits* | Adaptive Rate Limits* | 8K |
| Tier 1 | Total historical spend of $10+ | 100 | 60 | 16K |
| Tier 2 | Total historical spend of $50+ | 1,000 | 1,000 | 16K |
| Tier 3 | Total historical spend of $500+ | 5,000 | 5,000 | 32K |
| Tier 4 | Total historical spend of $5,000+ | 10,000 | 10,000 | 64K |
| Tier 5 | Contact support@friendli.ai | Custom | Custom | Custom |
*Adaptive Rate Limits: Rate limits are applied dynamically based on overall platform conditions.
‘Output Token Length’ is how much the model can write in a response. It’s different from ‘Context Length’, which is the sum of the input and output tokens.
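To make the distinction concrete, here is a minimal sketch of how the two limits interact (the 32K context window and the token counts are illustrative numbers, not quoted figures):

```python
def max_output_tokens(context_length: int, input_tokens: int,
                      output_token_limit: int) -> int:
    """Largest completion a single request can receive: capped both by
    the tier's Output Token Length and by the context window left over
    after the input tokens."""
    return min(output_token_limit, context_length - input_tokens)

# Illustrative: a hypothetical 32K-context model under Tier 3 (32K output cap).
# With 20K tokens of input, only ~12K tokens of output fit in the context.
print(max_output_tokens(32_768, 20_480, 32_768))  # → 12288
```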

Billing Methods

Friendli Serverless Endpoints use one of two billing methods, Token-Based or Time-Based, depending on the model type.

Token-Based Billing

In a token-based billing model, charges are determined by the number of tokens processed, where each “token” represents an individual unit processed by the model.
| Model Code | Price per Token |
| --- | --- |
| MiniMaxAI/MiniMax-M2.5 | Input $0.3 · Output $1.2 / 1M tokens |
| MiniMaxAI/MiniMax-M2.1 | Input $0.3 · Output $1.2 / 1M tokens |
| zai-org/GLM-5 | Input $1 · Output $3.2 / 1M tokens |
| zai-org/GLM-4.7 | Input $0.6 · Output $2.2 / 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Input $0.2 · Output $0.8 / 1M tokens |
| LGAI-EXAONE/EXAONE-4.0.1-32B | Input $0.6 · Output $1 / 1M tokens |
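As a sketch of how these rates translate into dollars, the snippet below copies a few prices from the table above (only models with separate input/output rates are shown; the request sizes are illustrative):

```python
# USD per 1M tokens as (input, output), copied from the price table.
TOKEN_PRICES = {
    "MiniMaxAI/MiniMax-M2.5": (0.3, 1.2),
    "zai-org/GLM-5": (1.0, 3.2),
    "Qwen/Qwen3-235B-A22B-Instruct-2507": (0.2, 0.8),
}

def token_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token-based billing: every token is a billable unit, with input
    and output tokens priced separately."""
    price_in, price_out = TOKEN_PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# e.g. 200K input + 50K output tokens on MiniMax-M2.5:
print(round(token_cost_usd("MiniMaxAI/MiniMax-M2.5", 200_000, 50_000), 4))  # → 0.12
```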

Time-Based Billing

In a time-based billing model, charges are determined by the compute time required to run your inference request, measured in milliseconds. Non-compute latencies, such as network delays or queueing time, are excluded—ensuring you are charged only for the actual model execution time.
A serverless endpoint model can be in either a Warm status, where it’s ready to handle requests instantly, or a Cold status, where it is inactive and requires time to start up. When a model in a cold status receives a request, it undergoes a “warm-up” process that typically takes 7–30 seconds, depending on the model’s size. During this period, requests are queued, but the warm-up delay is not included in your billable compute time.
| Model Code | Price per Second |
| --- | --- |
| zai-org/GLM-4.6 | $0.004 / second |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / second |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / second |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | $0.004 / second |
| Qwen/Qwen3-30B-A3B | $0.002 / second |
| Qwen/Qwen3-32B | $0.002 / second |
| deepseek-ai/DeepSeek-V3.1 | $0.004 / second |
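A minimal sketch of time-based billing under the rates above (the compute durations are illustrative; billable time covers model execution only, with warm-up, queueing, and network latency already excluded):

```python
# USD per second of compute, copied from the price table.
TIME_RATES = {
    "zai-org/GLM-4.6": 0.004,
    "Qwen/Qwen3-32B": 0.002,
}

def time_cost_usd(model: str, compute_ms: int) -> float:
    """Time-based billing: charges accrue per millisecond of actual
    model execution time; non-compute latency is not billed."""
    return (compute_ms / 1000) * TIME_RATES[model]

# e.g. 2,500 ms of compute on GLM-4.6:
print(time_cost_usd("zai-org/GLM-4.6", 2_500))  # → 0.01
```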

FAQs

Your usage tier, which determines your rate limits, increases monthly based on your proof of payment. Need a faster upgrade? Reach out anytime at support@friendli.ai; we’re happy to help!
You’ll receive an alert when you approach your monthly cap. To raise it, contact support@friendli.ai; we can help you (1) pay early to reset the cap, or (2) upgrade your plan to increase it and unlock more features.
For more questions, contact support@friendli.ai.