Pricing built to scale with your growth

Fast, reliable, and affordable inference at any scale. Get started instantly with self-serve, or contact us for enterprise deployments.

Serverless endpoints

Run the fastest frontier model inference with a simple API call.

See pricing

Dedicated endpoints

Run dedicated inference with unmatched speed and reliability at scale.

See pricing

Container

Run inference with full control and performance in your environment.

Contact us

Serverless API Pricing

Get instant access to the fastest frontier model inference with a simple API call.

Text and vision

Pay per token or GPU time

Model                            $ / 1M tokens
EXAONE-4.0.1-32B                 $0.6 input, $1 output
Llama-3.1-8B-Instruct            $0.1
Llama-3.3-70B-Instruct           $0.6
Qwen3-235B-A22B-Instruct-2507    $0.2 input, $0.8 output
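To make the per-token rates concrete, here is a small sketch that estimates the dollar cost of a single request from its token counts. The rates are copied from the table above; the rates applied to your account may differ, so treat this as an illustration rather than a billing tool.

```python
# Estimate serverless cost from token counts using the published $/1M-token rates.
# Rates copied from the pricing table above; check your account for current pricing.
RATES_PER_1M = {
    "EXAONE-4.0.1-32B": {"input": 0.6, "output": 1.0},
    "Llama-3.1-8B-Instruct": {"input": 0.1, "output": 0.1},
    "Llama-3.3-70B-Instruct": {"input": 0.6, "output": 0.6},
    "Qwen3-235B-A22B-Instruct-2507": {"input": 0.2, "output": 0.8},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request for a per-token-priced model."""
    rate = RATES_PER_1M[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on EXAONE-4.0.1-32B.
cost = request_cost("EXAONE-4.0.1-32B", 2_000, 500)
print(f"${cost:.6f}")  # $0.001700
```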

Model                                  $ / second
Mistral-Small-3.1-24B-Instruct-2503    $0.002
Magistral-Small-2506                   $0.002
Llama-4-Scout-17B-16E-Instruct         $0.002
gemma-3-27b-it                         $0.002
Devstral-Small-2505                    $0.002
Qwen3-32B                              $0.002
Qwen3-30B-A3B                          $0.002
A.X-3.1                                $0.002
HyperCLOVAX-SEED-Think-14B             $0.002
A.X-4.0                                $0.002
Llama-4-Maverick-17B-128E-Instruct     $0.004
DeepSeek-R1-0528                       $0.004
Qwen3-235B-A22B-Thinking-2507          $0.004
GLM-4.6                                $0.004

Discounts for prompt caching are available for enterprise deployments. Contact us to learn more.

Dedicated Endpoints Pricing

Run dedicated inference with unmatched speed and reliability at scale, on GPUs reserved for your workload.

Basic

Get started with:

  • Pay-as-you-go
  • On-demand GPUs
  • Support for custom, fine-tuned, and open-source models
  • Automatic traffic-based scaling
  • Real-time performance, usage, and log visibility
  • Zero-downtime model updates
  • Multi-LoRA support
  • SOC2 compliance
  • Email and in-app chat support
Get started

Enterprise

Everything in Basic, plus:

  • Reserved GPUs
  • Priority access to high-demand GPU types
  • Hands-on engineering expertise
  • Dedicated Slack support
  • VPC and on-prem deployment options
  • Enterprise-grade security and compliance
  • Custom global region deployment
  • 99.99% availability SLAs
  • Discounts on monthly reserved GPUs
Contact us

On-demand deployment

Pay only for the compute you use, down to the second, with no extra charges for start-up time.

GPU Type          $ / hour (billed per second)
A100 80GB GPU     $2.9
H100 80GB GPU     $3.9
H200 141GB GPU    $4.5
B200 192GB GPU    $8.9
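Because on-demand GPUs are billed per second at the hourly rates above, the cost of a deployment is simply the per-second rate multiplied by the seconds used. A minimal sketch, using the rates from the table (actual invoices may apply rounding not shown here):

```python
# Per-second billing: convert the published hourly GPU rates into a cost
# for an arbitrary number of seconds of use.
HOURLY_RATES = {
    "A100 80GB": 2.9,
    "H100 80GB": 3.9,
    "H200 141GB": 4.5,
    "B200 192GB": 8.9,
}

def gpu_cost(gpu: str, seconds: float, num_gpus: int = 1) -> float:
    """Dollar cost for running `num_gpus` GPUs of the given type for `seconds` seconds."""
    return HOURLY_RATES[gpu] / 3600 * seconds * num_gpus

# Example: two H100 80GB GPUs for 90 minutes.
print(round(gpu_cost("H100 80GB", 90 * 60, num_gpus=2), 2))  # 11.7
```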

Results vary by use case, but we often observe 2-3x higher throughput and lower latency on FriendliAI compared to open-source inference engines.

Container Pricing

Run inference with full control and performance in your environment.

Contact us

Explore FriendliAI today