Fast and affordable API
for open-source LLMs and LMMs:
Friendli Serverless Endpoints


Try serverless endpoints for blazing-fast responses with powerful built-in tools

Sign up for free

250 tokens/sec at $0.1/1M tokens

Friendli Serverless Endpoints delivers output at a staggering 250 tokens per second, with per-token billing as low as $0.1 per million tokens on the Llama 3.1 8B model.

Supports 128K context length

Build complex applications that require in-depth understanding and context retention. Our Llama 3.1 endpoints on Serverless Endpoints support the full 128K context length.
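As a sketch of what a call looks like, the snippet below builds an OpenAI-style chat-completions request and sends it over HTTPS. The base URL, model ID, and token environment variable here are illustrative assumptions, not confirmed values; check the Friendli documentation for the exact endpoint path and model names.

```python
import json
import os
import urllib.request

# Illustrative values -- verify the real base URL and model ID in the docs.
API_URL = "https://api.friendli.ai/serverless/v1/chat/completions"
MODEL = "meta-llama-3.1-8b-instruct"


def build_request(prompt: str, stream: bool = True) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,      # stream tokens back as they are generated
        "max_tokens": 256,
    }


def send(payload: dict, token: str) -> bytes:
    """POST the payload with a bearer token and return the raw response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Set your API key before running; variable name is an assumption.
    token = os.environ.get("FRIENDLI_TOKEN")
    if token:
        print(send(build_request("Hello!", stream=False), token))
```

Because the request shape follows the common chat-completions convention, existing OpenAI-compatible SDKs should also work by pointing them at the serverless base URL.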

Easily build AI agents with tool-assist

Building an AI agent that can search the web, query knowledge bases, and solve complex problems with multiple tools? Serverless Endpoints has tool-assist built in.
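A minimal sketch of the agent side of tool-assist: declare a tool in the OpenAI-style function schema that tool-calling endpoints commonly accept, then route the model's tool calls to local functions. The tool name, schema, and search stub are hypothetical examples, not part of the Friendli API.

```python
import json

# Hypothetical web-search tool, declared in the common OpenAI-style
# function-calling schema; all names here are illustrative.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return the top result.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]


def search_web(query: str) -> str:
    # Stub: a real agent would call a search API here.
    return f"Top result for {query!r}"


def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching local function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "search_web":
        return search_web(**args)
    raise ValueError(f"unknown tool: {name}")
```

In a full loop, you would pass `TOOLS` with each chat request, run `dispatch` on any tool calls the model emits, and feed the results back as tool messages until the model produces a final answer.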

SUPPORTED MODELS

DEEPSEEK R1

LLAMA 4 MAVERICK 17B 128E INSTRUCT

LLAMA 4 SCOUT 17B 16E INSTRUCT

LLAMA 3.3 70B INSTRUCT

LLAMA 3.1 8B INSTRUCT

QWEN3 235B A22B

QWEN3 30B A3B

QWEN3 32B

GEMMA 3 27B IT

MISTRAL SMALL 3.1 24B INSTRUCT 2503

DEVSTRAL SMALL 2505

MAGISTRAL SMALL 2506

Stay tuned for new model support

PRICING

Free trial
Sign up and get free trial credits!

Basic
Pay as you go at the rates below.

Token-based billing

Model name                $ / 1M tokens
DeepSeek-R1               $3 (input) / $7 (output)
Llama-3.3-70B-Instruct    $0.6
Llama-3.1-8B-Instruct     $0.1

Time-based billing

Model name                            $ / second
Llama-4-Maverick-17B-128E-Instruct    $0.004
Llama-4-Scout-17B-16E-Instruct        $0.002
Qwen3-235B-A22B                       $0.004
Qwen3-30B-A3B                         $0.002
Qwen3-32B                             $0.002
gemma-3-27b-it                        $0.002
Mistral-Small-3.1-24B-Instruct-2503   $0.002
Devstral-Small-2505                   $0.002
Magistral-Small-2506                  $0.002