Tier-Based API Rate Limits
Tiers are based on lifetime spending and update automatically. As your usage grows, your tier increases, or you can move up instantly by purchasing additional credits.

| Tiers | Qualifications | RPM (paid models) | RPM (free models) | Output Token Length | Usage Limits |
|---|---|---|---|---|---|
| Tier 0 | Signed up | Adaptive Rate Limits* | Adaptive Rate Limits* | 4K | Limited to the free credit issued at sign-up |
| Tier 1 | Valid payment method added | 100 | 60 | 16K | $50 / month |
| Tier 2 | Total historical spend of $50+ | 1,000 | 1,000 | 16K | $500 / month |
| Tier 3 | Total historical spend of $500+ | 5,000 | 5,000 | 32K | $5,000 / month |
| Tier 4 | Total historical spend of $5,000+ | 10,000 | 10,000 | 64K | $50,000 / month |
| Tier 5 | Contact [email protected] | Custom | Custom | Custom | Custom |
*Adaptive Rate Limits: Rate limits are applied dynamically based on overall platform conditions.
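RPM caps are enforced server-side per tier, but a client can avoid rejected requests by throttling itself below its tier's limit. A minimal sketch of a client-side sliding-window throttle (illustrative only; the class and method names are ours, not part of any Friendli SDK):

```python
from collections import deque


class RpmLimiter:
    """Client-side requests-per-minute throttle (sketch).

    rpm should be set to your tier's limit from the table above,
    e.g. 100 for Tier 1 paid models.
    """

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of requests in the last 60 s

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request may be sent."""
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # Wait until the oldest request in the window expires.
        return 60.0 - (now - self.sent[0])

    def record(self, now: float) -> None:
        """Record that a request was sent at time `now`."""
        self.sent.append(now)
```

In practice you would call `wait_time(time.monotonic())` before each request and sleep for the returned duration; the sliding window keeps the client under the RPM cap without bursting.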
Billing Methods
Friendli Serverless Endpoints use two different billing methods, Token-Based or Time-Based, depending on the model type.

Token-Based Billing
In a token-based billing model, charges are determined by the number of tokens processed, where each “token” represents an individual unit processed by the model.

| Model Code | Price per 1M Tokens |
|---|---|
| MiniMaxAI/MiniMax-M2.1 | Input $0.3 · Output $1.2 / 1M tokens |
| zai-org/GLM-4.7 | Input $0.6 · Output $2.2 / 1M tokens |
| LGAI-EXAONE/EXAONE-4.0.1-32B | Input $0.6 · Output $1 / 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Input $0.2 · Output $0.8 / 1M tokens |
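A request's token-based charge can be estimated directly from the table: multiply input and output token counts by their per-1M rates. A minimal sketch (prices copied from two rows above; the helper name is ours, not a Friendli API):

```python
# USD per 1M tokens, from the pricing table above (subset for illustration).
PRICES = {
    "MiniMaxAI/MiniMax-M2.1": {"input": 0.3, "output": 1.2},
    "Qwen/Qwen3-235B-A22B-Instruct-2507": {"input": 0.2, "output": 0.8},
}


def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under token-based billing."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For example, a MiniMax-M2.1 request with 1M input and 1M output tokens costs $0.30 + $1.20 = $1.50.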
Time-Based Billing
In a time-based billing model, charges are determined by the compute time required to run your inference request, measured in milliseconds. Non-compute latencies, such as network delays or queueing time, are excluded, so you are charged only for actual model execution time.

A serverless endpoint model is either in a Warm status, ready to handle requests instantly, or a Cold status, inactive and requiring time to start up. When a model in a cold status receives a request, it undergoes a “warm-up” process that typically takes 7-30 seconds, depending on the model’s size. During this period, requests are queued, but the warm-up delay is not included in your billable compute time.
| Model Code | Price per Second |
|---|---|
| zai-org/GLM-4.6 | $0.004 / second |
| deepseek-ai/DeepSeek-V3.1 | $0.004 / second |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / second |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / second |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | $0.004 / second |
| Qwen/Qwen3-30B-A3B | $0.002 / second |
| Qwen/Qwen3-32B | $0.002 / second |
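The time-based charge is simply billable compute time multiplied by the model's per-second rate, with warm-up and queueing excluded by the platform. A minimal sketch (the function name is ours, not a Friendli API):

```python
def time_cost(compute_ms: float, price_per_second: float) -> float:
    """Estimated USD cost under time-based billing.

    compute_ms is the billable model execution time only; warm-up
    and queueing delays are already excluded from billing.
    """
    return (compute_ms / 1000.0) * price_per_second
```

For example, 2,500 ms of compute on a $0.004/second model costs 2.5 × $0.004 = $0.01.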
Free Models
The following models are available for free for a limited time.

| Model Code | Free until |
|---|---|
| LGAI-EXAONE/K-EXAONE-236B-A23B | February 12th |
FAQs
How do I increase my rate limits?
Your usage tier, which determines your rate limits, increases automatically as your total historical spend grows. Need a faster upgrade? Reach out anytime at [email protected] and we’re happy to help!
Do I need to upgrade my plan to use popular models?
Popular models are available to all users, subject to the rate limits of your usage tier.
What if I exceed my monthly cap?
You’ll receive an alert as you approach your monthly cap. Please contact [email protected] to discuss options for raising it: we can help you (1) pay early to reset your monthly cap, or (2) upgrade your plan to increase your monthly cap and unlock more features.
For more questions, contact [email protected].