Pricing built to scale with your growth

Fast, reliable, and affordable inference at any scale. Get started instantly with self-serve, or contact us for enterprise deployments.

Serverless Endpoints

Run the fastest frontier model inference with a simple API call.

See pricing

Dedicated Endpoints

Run dedicated inference with unmatched speed and reliability at scale.

See pricing

Container

Run inference with full control and performance in your environment.

Contact us

Looking for Enterprise Capabilities?

The Enterprise plan is offered as a customizable framework, not a fixed bundle.
Features and capabilities are enabled based on your contract.
Let’s talk about an Enterprise plan designed for you.

Scale & Reliability

  • Custom Serverless API rate limits
  • Priority access to high-demand GPU types
  • Reserved GPU capacity

Control & Deployment

  • Custom region deployments
  • VPC deployments
  • On-prem deployment options

Enterprise Commitments

  • Dedicated support channels
  • Named Customer Success ownership
  • Custom commercial terms
Contact Sales

Serverless API Pricing

Get instant access to the fastest frontier model inference with a simple API call.

Dedicated Endpoints Pricing

Run dedicated inference with unmatched speed and reliability at scale.

On-demand deployment

Pay only for the compute you use, billed down to the second, with no extra charges for start-up time.

GPU Type          $ / hour (billed per second)
---------------------------------------------
A100 80GB GPU     $2.90
H100 80GB GPU     $3.90
H200 141GB GPU    $4.50
B200 180GB GPU    $8.90
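Per-second billing means the cost of a run is simply the hourly rate divided by 3,600, multiplied by the seconds used. The sketch below illustrates that arithmetic with the on-demand rates listed above; the `on_demand_cost` helper and its signature are illustrative, not part of any FriendliAI SDK.

```python
# Per-second billing: cost = (hourly rate / 3600) * seconds used * GPU count.
# Rates are the on-demand prices listed in the table above (USD per hour).
HOURLY_RATES = {
    "A100 80GB": 2.90,
    "H100 80GB": 3.90,
    "H200 141GB": 4.50,
    "B200 180GB": 8.90,
}

def on_demand_cost(gpu: str, seconds: float, num_gpus: int = 1) -> float:
    """Estimate on-demand cost in USD for `seconds` of use across `num_gpus` GPUs."""
    return HOURLY_RATES[gpu] / 3600 * seconds * num_gpus

# Example: 90 minutes on a single H100 80GB GPU
print(round(on_demand_cost("H100 80GB", 90 * 60), 2))  # → 5.85
```

Because billing is per second, a 90-minute job costs exactly 1.5 hours of the hourly rate; there is no rounding up to the next hour.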

For per-token price estimates, see this page. Results vary by use case, but we often observe 2-3x higher throughput and lower latency on FriendliAI compared to open-source inference engines.

Container Pricing

Run inference with full control and performance in your environment.

Contact us

Explore FriendliAI today