Pricing built to scale with your growth
Fast, reliable, and affordable inference at any scale. Get started instantly with self-serve, or contact us for enterprise deployments.
Serverless Endpoints
Run the fastest frontier model inference with a simple API call.
Dedicated Endpoints
Run dedicated inference with unmatched speed and reliability at scale.
Container
Run inference with full control and performance in your environment.
Looking for Enterprise Capabilities?
The Enterprise plan is offered as a customizable framework, not a fixed bundle.
Features and capabilities are enabled based on your contract.
Let’s talk about an Enterprise plan designed for you.
Scale & Reliability
- Custom Serverless API rate limits
- Priority access to high-demand GPU types
- Reserved GPU capacity
Control & Deployment
- Custom region deployments
- VPC deployments
- On-prem deployment options
Enterprise Commitments
- Dedicated support channels
- Named Customer Success ownership
- Custom commercial terms
Serverless API Pricing
Get instant access to the fastest frontier model inference with a simple API call.
Dedicated Endpoints Pricing
Run dedicated inference with unmatched speed and reliability at scale.
On-demand deployment
Pay only for the compute you use, down to the second, with no extra charges for start-up time.
On-demand rates, $/hour (billed per second):
- A100 80GB GPU: $2.90
- H100 80GB GPU: $3.90
- H200 141GB GPU: $4.50
- B200 180GB GPU: $8.90
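To illustrate per-second billing, the sketch below estimates on-demand cost from the hourly rates in the table above; the `on_demand_cost` helper is hypothetical, not part of any FriendliAI SDK.

```python
# Hourly on-demand rates (USD) from the table above, billed per second.
RATES_PER_HOUR = {
    "A100 80GB": 2.90,
    "H100 80GB": 3.90,
    "H200 141GB": 4.50,
    "B200 180GB": 8.90,
}

def on_demand_cost(gpu: str, seconds: int, gpu_count: int = 1) -> float:
    """Estimate cost as per-second rate x duration x number of GPUs."""
    per_second = RATES_PER_HOUR[gpu] / 3600
    return round(per_second * seconds * gpu_count, 4)

# Example: one H100 80GB GPU running for 90 minutes.
print(on_demand_cost("H100 80GB", 90 * 60))  # -> 5.85
```

Because billing is per second, a deployment that runs for 90 minutes costs exactly 1.5x the hourly rate, with no rounding up to the next hour.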
For per-token price estimates, see this page. Results vary by use case, but we often observe 2-3x higher throughput and lower latency on FriendliAI compared to open-source inference engines.
Container Pricing
Run inference with full control and performance in your environment.