Friendli Serverless Endpoints offer a flexible, scalable inference solution powered by a wide range of models. You can unlock access to more models and features based on your usage tier.
Important Update: Effective June 20, 2025, we’ve introduced new billing options and plan changes:
  • Each model is now billed either Token-Based or Time-Based.
  • The Basic plan has been renamed to the Starter plan.
  • Existing users can continue using their current serverless models without interruption.

Usage Tiers

Usage tiers define your monthly usage limits; your tier scales up based on your payment history.
| Tier | Usage Limits | Rate Limit (RPM) | Output Token Length | Qualifications |
|---|---|---|---|---|
| Tier 1 | $50 / month | 100 | 2K / 8K (if reasoning model) | Valid payment method added |
| Tier 2 | $500 / month | 1,000 | 4K / 8K (if reasoning model) | Total historical spend of $50+ |
| Tier 3 | $5,000 / month | 5,000 | 8K / 16K (if reasoning model) | Total historical spend of $500+ |
| Tier 4 | $50,000 / month | 10,000 | 16K / 32K (if reasoning model) | Total historical spend of $5,000+ |
| Tier 5 | Custom | Custom | Custom | Contact support@friendli.ai |
Qualifications only apply to usage within the Serverless Endpoints plan.
‘Output Token Length’ is the maximum number of tokens the model can generate in a response. It differs from ‘Context Length’, which is the sum of the input and output tokens.
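
In practice, you can keep a request within your tier’s output token budget by capping `max_tokens`. The sketch below assumes Friendli’s OpenAI-compatible serverless API and a token stored in a `FRIENDLI_TOKEN` environment variable; the base URL and variable name are illustrative, so check your dashboard for the exact values.

```python
# Minimal sketch: capping generation at the Tier 1 output token length (2K).
# Assumes an OpenAI-compatible serverless endpoint; the base URL and
# environment variable name below are illustrative assumptions.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FRIENDLI_TOKEN"],
    base_url="https://api.friendli.ai/serverless/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain usage tiers in one sentence."}],
    max_tokens=2048,  # stay within the Tier 1 output token length (2K)
)
print(response.choices[0].message.content)
```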

Billing Methods

Friendli Serverless Endpoints use one of two billing methods, Token-Based or Time-Based, depending on the model.

Token-Based Billing

In a token-based billing model, charges are determined by the number of tokens processed, where each “token” represents an individual unit processed by the model.
| Model Code | Price per Token |
|---|---|
| LGAI-EXAONE/EXAONE-4.0.1-32B | Input $0.6 · Output $1 / 1M tokens |
| meta-llama/Llama-3.3-70B-Instruct | $0.6 / 1M tokens |
| meta-llama/Llama-3.1-8B-Instruct | $0.1 / 1M tokens |
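
To see what a request will cost under token-based billing, multiply the token counts reported in the response’s `usage` field by the per-token rates above. A minimal sketch follows, assuming that models with a single listed price bill input and output tokens at the same rate:

```python
# Estimate token-based charges from (input_tokens, output_tokens).
# Prices mirror the table above, in USD per 1M tokens; models with one
# listed price are assumed to bill input and output at the same rate.
TOKEN_PRICES = {  # model -> (input price, output price) per 1M tokens
    "LGAI-EXAONE/EXAONE-4.0.1-32B": (0.6, 1.0),
    "meta-llama/Llama-3.3-70B-Instruct": (0.6, 0.6),
    "meta-llama/Llama-3.1-8B-Instruct": (0.1, 0.1),
}

def estimate_token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge in USD for a single request."""
    input_price, output_price = TOKEN_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. 1,200 prompt tokens + 300 completion tokens on Llama 3.1 8B:
print(f"${estimate_token_cost('meta-llama/Llama-3.1-8B-Instruct', 1200, 300):.6f}")
# -> $0.000150
```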

Time-Based Billing

In a time-based billing model, charges are determined by the compute time required to run your inference request, measured in milliseconds. Non-compute latencies, such as network delays or queueing time, are excluded, ensuring you are charged only for the actual model execution time.

A serverless endpoint model can be in either a Warm status, where it is ready to handle requests instantly, or a Cold status, where it is inactive and requires time to start up. When a model in a cold status receives a request, it undergoes a “warm-up” process that typically takes 7-30 seconds, depending on the model’s size. During this period, requests are queued, but the warm-up delay is not included in your billable compute time.
| Model Code | Price per Second |
|---|---|
| skt/A.X-4.0 | $0.002 / second |
| skt/A.X-3.1 | $0.002 / second |
| naver-hyperclovax/HyperCLOVAX-SEED-Think-14B | $0.002 / second |
| deepseek-ai/DeepSeek-R1-0528 | $0.004 / second |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | $0.004 / second |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.002 / second |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | $0.004 / second |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | $0.004 / second |
| Qwen/Qwen3-30B-A3B | $0.002 / second |
| Qwen/Qwen3-32B | $0.002 / second |
| google/gemma-3-27b-it | $0.002 / second |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | $0.002 / second |
| mistralai/Devstral-Small-2505 | $0.002 / second |
| mistralai/Magistral-Small-2506 | $0.002 / second |
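
Under time-based billing, the charge is simply the billed compute time multiplied by the model’s per-second rate. The sketch below uses two rates from the table above; the client timeout mentioned in the closing comment is an illustrative way to leave headroom for a cold model’s 7-30 second warm-up, which is queued but not billed.

```python
# Estimate time-based charges from billed compute time (milliseconds).
# Per-second prices mirror the table above and are in USD.
PRICE_PER_SECOND = {
    "Qwen/Qwen3-32B": 0.002,
    "deepseek-ai/DeepSeek-R1-0528": 0.004,
}

def estimate_time_cost(model: str, compute_ms: float) -> float:
    """Charge in USD for `compute_ms` ms of model execution time.
    Queueing, network delay, and cold-start warm-up are not billed."""
    return (compute_ms / 1000) * PRICE_PER_SECOND[model]

# A 2,500 ms generation on Qwen3-32B:
print(f"${estimate_time_cost('Qwen/Qwen3-32B', 2500):.4f}")  # -> $0.0050

# When calling a possibly-cold model, give your HTTP client enough headroom
# for the 7-30 s warm-up, e.g. an OpenAI-style client with timeout=60.0
# (illustrative; any HTTP client's timeout option serves the same purpose).
```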

Free Models

The following models are available for free for a limited time.
| Model Code | Free until |
|---|---|
| K-intelligence/Midm-2.0-Base-Instruct | September 4th |
| K-intelligence/Midm-2.0-Mini-Instruct | September 4th |

FAQs

For more questions, contact support@friendli.ai.