Fast and affordable API
for open-source LLMs and LMMs: Friendli Serverless Endpoints
Try serverless endpoints for blazing-fast responses with powerful built-in tools.
250 tokens/sec at $0.1/1M tokens
Friendli Serverless Endpoints delivers output tokens at a staggering 250 tokens per second, with per-token billing as low as $0.1 per million tokens for the Llama 3.1 8B model.
Supports 128K context length
Build complex applications that require in-depth understanding and context retention on Serverless Endpoints. Our Llama 3.1 endpoints support the full 128K context length.
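As a quick illustration, long documents can be sent in a single chat request through an OpenAI-compatible chat completions call. The endpoint URL and model ID below are assumptions for illustration; check the Friendli documentation for the exact base URL and the model identifiers available to your account.

```python
import json
import os
import urllib.request

# Assumed endpoint and model ID -- verify both against the Friendli docs.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"
MODEL = "meta-llama-3.1-8b-instruct"  # hypothetical model identifier


def build_request(messages, max_tokens=512):
    """Build an OpenAI-style chat completion payload.

    With a 128K context window, `messages` can carry a very long
    document (e.g. an entire report) in a single user turn.
    """
    return {
        "model": MODEL,
        "messages": messages,
        "max_tokens": max_tokens,
    }


def chat(messages, token):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(messages)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Only call the live API when a token is configured.
    token = os.environ.get("FRIENDLI_TOKEN")
    if token:
        long_doc = "..."  # paste a long document here
        print(chat([{"role": "user", "content": f"Summarize:\n{long_doc}"}], token))
```

The request shape follows the widely used OpenAI chat completions convention; if Friendli's serverless API diverges from it, only `build_request` needs to change.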
Easily build AI agents with tool-assist
Are you building an AI agent that can search the web, integrate knowledge bases, and solve complex problems using many tools? Serverless Endpoints has it all.
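A minimal sketch of how an agent advertises tools to the model and routes the resulting tool calls, assuming an OpenAI-compatible function-calling format. The `web_search` tool name, its schema, and the model ID are hypothetical placeholders, not Friendli's actual built-in tool catalog.

```python
import json

# Hypothetical tool schema in the OpenAI-compatible function-calling
# format; Friendli's built-in tools may differ -- see their docs.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",  # assumed tool name
            "description": "Search the web and return top result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                },
                "required": ["query"],
            },
        },
    }
]


def build_agent_request(user_message, model="meta-llama-3.1-8b-instruct"):
    """Assemble a chat request that advertises the tools to the model."""
    return {
        "model": model,  # hypothetical model identifier
        "messages": [{"role": "user", "content": user_message}],
        "tools": TOOLS,
        "tool_choice": "auto",  # let the model decide when to call a tool
    }


def dispatch(tool_call):
    """Route a tool call returned by the model to a local handler."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "web_search":
        return f"(stub) results for {args['query']!r}"  # replace with a real search
    raise ValueError(f"unknown tool: {name}")
```

In a full agent loop, the tool result from `dispatch` is appended to the message list as a `tool` role message and the conversation is sent back to the model until it produces a final answer.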
DEEPSEEK R1
LLAMA 4 MAVERICK 17B 128E INSTRUCT
LLAMA 4 SCOUT 17B 16E INSTRUCT
LLAMA 3.3 70B INSTRUCT
LLAMA 3.1 8B INSTRUCT
QWEN3 235B A22B
QWEN3 30B A3B
QWEN3 32B
GEMMA 3 27B IT
MISTRAL SMALL 3.1 24B INSTRUCT 2503
DEVSTRAL SMALL 2505
MAGISTRAL SMALL 2506
Stay tuned for new model support
Free trial
Sign up and get free trial credits!
Basic
Token-based billing
Model name: $ / 1M tokens
DeepSeek-R1: $3 (input) / $7 (output)
Llama-3.3-70B-Instruct: $0.6
Llama-3.1-8B-Instruct: $0.1
Time-based billing
Model name: $ / second
Llama-4-Maverick-17B-128E-Instruct: $0.004
Llama-4-Scout-17B-16E-Instruct: $0.002
Qwen3-235B-A22B: $0.004
Qwen3-30B-A3B: $0.002
Qwen3-32B: $0.002
gemma-3-27b-it: $0.002
Mistral-Small-3.1-24B-Instruct-2503: $0.002
Devstral-Small-2505: $0.002
Magistral-Small-2506: $0.002