Pricing
Friendli Serverless Endpoints offer a range of models tailored to various tasks.
Friendli Serverless Endpoints offer a flexible, scalable inference solution powered by a wide range of models. You can unlock access to more models and features based on your usage tier.
Important Update: Effective June 20, 2025, we’ve introduced new billing options and plan changes:
- Models are now billed either per token (Token-Based) or per second of compute time (Time-Based), depending on the model.
- The Basic plan has been renamed to the Starter plan.
- Existing users can continue using their current serverless models without interruption.
Usage Tiers
Usage tiers define your usage limits and can increase each month based on your payment history.
Tier | Usage Limit | Rate Limit (RPM) | Output Token Length | Qualifications |
---|---|---|---|---|
Tier 1 | $50 / month | 100 | 2K | Valid payment method added |
Tier 2 | $500 / month | 1,000 | 4K | Total historical spend of $50+ |
Tier 3 | $5,000 / month | 5,000 | 8K | Total historical spend of $500+ |
Tier 4 | $50,000 / month | 10,000 | 16K | Total historical spend of $5,000+ |
Tier 5 | Custom | Custom | Custom | Contact support@friendli.ai |
Qualifications only apply to usage within the Serverless Endpoints plan.
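As a rough illustration of how the qualifications in the table map to tiers, the Python sketch below picks the highest self-serve tier (Tier 1 to 4) that a given payment-method status and total historical spend would qualify for. `Tier` and `expected_tier` are hypothetical helpers, not part of any Friendli API. The sketch assumes tiers are cumulative, counts only Serverless Endpoints spend, ignores the monthly re-evaluation schedule, and leaves out Tier 5, which is arranged through support.

```python
from typing import List, NamedTuple, Optional


class Tier(NamedTuple):
    """One self-serve row of the usage-tier table (hypothetical helper type)."""
    level: int
    monthly_limit_usd: int          # "Usage Limit" column
    rpm: int                        # "Rate Limit (RPM)" column
    min_historical_spend_usd: int   # "Qualifications" column; Tier 1 needs only a payment method


# Values copied from the table; Tier 5 (custom, via support) is omitted.
TIERS: List[Tier] = [
    Tier(1, 50, 100, 0),
    Tier(2, 500, 1_000, 50),
    Tier(3, 5_000, 5_000, 500),
    Tier(4, 50_000, 10_000, 5_000),
]


def expected_tier(has_payment_method: bool, historical_spend_usd: float) -> Optional[Tier]:
    """Return the highest self-serve tier these qualifications allow, or None.

    Assumption: tiers are cumulative (Tier 2 and up also need a payment method)
    and only Serverless Endpoints spend counts toward historical spend.
    """
    if historical_spend_usd < 0:
        raise ValueError("historical spend cannot be negative")
    if not has_payment_method:
        return None  # Tier 1 requires a valid payment method
    eligible = [t for t in TIERS if historical_spend_usd >= t.min_historical_spend_usd]
    return eligible[-1]  # TIERS is ordered, so the last match is the highest tier


print(expected_tier(True, 120.0).level)  # 2
```

In practice your tier is assigned on the service side; a helper like this is only useful for estimating where you are likely to land.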
‘Output Token Length’ is the maximum number of tokens the model can generate in a single response. It differs from ‘Context Length’, which is the sum of the input and output tokens.
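To make the distinction concrete, here is a hypothetical client-side helper that caps a request's output length by both the tier's output token length and the room left in the model's context window (context = input + output). Per-model context window sizes are not listed on this page, so `context_window` is a value you would supply; how the service treats a request that exceeds either limit is also not described here.

```python
def plan_output_budget(input_tokens: int, requested_output: int,
                       tier_output_limit: int, context_window: int) -> int:
    """Largest output length that fits both the tier cap and the context window.

    Hypothetical client-side helper: context_window is a per-model value you
    supply, since this page does not list it.
    """
    if input_tokens >= context_window:
        raise ValueError("the input alone fills the context window")
    room_left = context_window - input_tokens  # context length = input + output
    return min(requested_output, tier_output_limit, room_left)


# Illustrative numbers only: a long prompt leaves less room than the tier cap allows.
print(plan_output_budget(30_000, 8_000, 4_000, 32_768))  # 2768
```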
Billing Methods
Friendli Serverless Endpoints use two different billing methods, Token-Based or Time-Based, depending on the model type.
Token-Based Billing
Pinned models (such as DeepSeek, Llama, and other popular models) are charged on a per-token basis, that is, by the number of tokens processed. A “token” is the basic unit of text the model reads and writes, typically a word or part of a word.
Time-Based Billing
Other models use time-based billing, meaning you are charged per second of compute time used to run your inference request.
Free Models
The following models are available for free for a limited time.
Model Code | Free until |
---|---|
K-intelligence/Midm-2.0-Base-Instruct | August 4th |
K-intelligence/Midm-2.0-Mini-Instruct | August 4th |
Pinned Models (Token-Based Billing)
The following pinned models are billed per token:
Model Code | Token Price |
---|---|
deepseek-ai/DeepSeek-R1 | Input $3 · Output $7 / 1M tokens |
meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
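A token-billed request costs its token count times the per-1M-token rate, divided by 1,000,000. The sketch below applies the rates from the table. For models that list a single price, it assumes (as an illustration, not a documented rule) that the same rate applies to input and output tokens; `PRICES_PER_1M` and `token_cost_usd` are illustrative names.

```python
from typing import Dict, Tuple

# (input, output) prices in USD per 1M tokens, copied from the table above.
# Assumption: single-price models charge the same rate for input and output tokens.
PRICES_PER_1M: Dict[str, Tuple[float, float]] = {
    "deepseek-ai/DeepSeek-R1": (3.0, 7.0),
    "meta-llama/Llama-3.3-70B-Instruct": (0.6, 0.6),
    "meta-llama/Llama-3.1-8B-Instruct": (0.1, 0.1),
}


def token_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request under token-based billing."""
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


print(round(token_cost_usd("deepseek-ai/DeepSeek-R1", 10_000, 2_000), 6))  # 0.044
```

For example, a DeepSeek-R1 request with 10,000 input tokens and 2,000 output tokens comes to 10,000 × $3/1M + 2,000 × $7/1M = $0.03 + $0.014 = $0.044.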
Other Models (Time-Based Billing)
Other models are billed per second of compute time:
Model Code | Price per Second |
---|---|
meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / second |
meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / second |
Qwen/Qwen3-235B-A22B | $0.004 / second |
Qwen/Qwen3-30B-A3B | $0.002 / second |
Qwen/Qwen3-32B | $0.002 / second |
google/gemma-3-27b-it | $0.002 / second |
mistralai/Mistral-Small-3.1-24B-Instruct-2503 | $0.002 / second |
mistralai/Devstral-Small-2505 | $0.002 / second |
mistralai/Magistral-Small-2506 | $0.002 / second |
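Time-based cost is compute seconds multiplied by the per-second rate. The sketch below uses the rates from the table and does not round, since how partial seconds are metered is not specified here. The dictionary and function names are illustrative only.

```python
from typing import Dict

# USD per second of compute time, copied from the table above.
PRICE_PER_SECOND: Dict[str, float] = {
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct": 0.004,
    "meta-llama/Llama-4-Scout-17B-16E-Instruct": 0.002,
    "Qwen/Qwen3-235B-A22B": 0.004,
    "Qwen/Qwen3-30B-A3B": 0.002,
    "Qwen/Qwen3-32B": 0.002,
    "google/gemma-3-27b-it": 0.002,
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503": 0.002,
    "mistralai/Devstral-Small-2505": 0.002,
    "mistralai/Magistral-Small-2506": 0.002,
}


def time_cost_usd(model: str, compute_seconds: float) -> float:
    """Estimated cost under time-based billing (partial seconds are not rounded)."""
    return compute_seconds * PRICE_PER_SECOND[model]


def seconds_for_budget(model: str, budget_usd: float) -> float:
    """Compute seconds a budget buys if it is spent entirely on one model."""
    return budget_usd / PRICE_PER_SECOND[model]


print(round(time_cost_usd("Qwen/Qwen3-235B-A22B", 12.5), 6))  # 0.05
print(round(seconds_for_budget("Qwen/Qwen3-32B", 50.0)))       # 25000
```

For instance, 12.5 seconds on Qwen/Qwen3-235B-A22B costs 12.5 × $0.004 = $0.05, and if a Tier 1 usage limit of $50 were spent entirely on a $0.002 / second model, it would cover about 25,000 seconds of compute.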
FAQs
How do I increase my rate limits?
Your usage tier, which determines your rate limits, can increase each month based on your payment history. Need a faster upgrade? Reach out anytime at support@friendli.ai. We’re happy to help!
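Until your tier increases, a simple client-side throttle can keep your requests under the current rate limit. The Python sketch below assumes the RPM limit behaves like a sliding 60-second window, which this page does not specify. `SlidingWindowThrottle` is a hypothetical helper, not a Friendli SDK class.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Allows at most `rpm` requests in any trailing 60-second window (client side)."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.sent: Deque[float] = deque()  # send times of requests still in the window

    def try_acquire(self) -> bool:
        """Record and allow a request if the window has room; otherwise refuse it."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()  # this request has aged out of the window
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False


throttle = SlidingWindowThrottle(rpm=100)  # Tier 1 rate limit from the table
allowed = throttle.try_acquire()  # True while fewer than 100 requests in the last minute
```

A refused call means the request should wait or be queued rather than sent immediately.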
Do I need to upgrade my plan to use popular models?
Popular models are available to all users, subject to the limits of their usage tier.
What if I exceed my monthly cap?
You’ll receive an alert when approaching your monthly cap. Please contact support@friendli.ai to discuss options for increasing your monthly cap. We may help you (1) pay early to reset your monthly cap, or (2) upgrade your plan to increase your monthly cap and unlock more features.
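If you also want an early warning of your own, a client can keep a running estimate of spend against its tier’s monthly usage limit. In this sketch, `MonthlyBudget` is hypothetical and the 80% `warn_fraction` is an arbitrary choice, not the threshold Friendli uses for its alerts. The estimate is only as accurate as the per-request costs you feed it, and resetting it at the start of each month is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class MonthlyBudget:
    """Running client-side estimate of spend against a tier's monthly usage limit."""
    monthly_limit_usd: float
    warn_fraction: float = 0.8  # arbitrary example threshold, not Friendli's
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> str:
        """Add one request's estimated cost and return "ok", "warn", or "over"."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.monthly_limit_usd:
            return "over"
        if self.spent_usd >= self.warn_fraction * self.monthly_limit_usd:
            return "warn"
        return "ok"


budget = MonthlyBudget(monthly_limit_usd=50.0)  # Tier 1 usage limit from the table
print(budget.record(30.0), budget.record(12.0), budget.record(10.0))  # ok warn over
```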
For more questions, contact support@friendli.ai.