Supported Instance Types
| GPU Type | Basic | Enterprise |
|---|---|---|
| A100 80GB | $2.9 / hour | Contact sales |
| H100 80GB | $3.9 / hour | Contact sales |
| H200 141GB | $4.5 / hour | Contact sales |
| B200 192GB | $8.9 / hour | Contact sales |
Contact sales for a discounted custom pricing plan for your enterprise.
How does billing work for Dedicated Endpoints?
- You are billed per GPU-second for as long as the endpoint is active
- Charges begin when the endpoint is up and running
- Costs accumulate even when the endpoint is not serving API calls
- When an endpoint goes to sleep after being idle, charges will no longer accrue
- Updating an endpoint or using certain settings keeps the endpoint active, so charges continue to accrue
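As a rough illustration of per-GPU-second billing, the sketch below converts the hourly list prices from the table into a cost for a given active duration. It assumes the hourly price applies to each GPU and that billing is a simple linear proration of that price; the `estimate_cost` helper and its `gpu_count` parameter are hypothetical names used only for this example, not part of any official API.

```python
from decimal import Decimal

# Hourly list prices from the Basic column of the table (USD per hour).
# Assumption: each price applies per GPU.
HOURLY_PRICE_USD = {
    "A100 80GB": Decimal("2.9"),
    "H100 80GB": Decimal("3.9"),
    "H200 141GB": Decimal("4.5"),
    "B200 192GB": Decimal("8.9"),
}

SECONDS_PER_HOUR = 3600


def estimate_cost(gpu_type: str, active_seconds: int, gpu_count: int = 1) -> Decimal:
    """Estimate the charge for an endpoint that was active for `active_seconds`.

    Hypothetical helper: prorates the hourly price linearly by the second.
    Time spent asleep should not be included in `active_seconds`.
    """
    hourly = HOURLY_PRICE_USD[gpu_type]
    # Multiply before dividing so whole-hour durations stay exact.
    return hourly * gpu_count * active_seconds / SECONDS_PER_HOUR


# An H100 endpoint that was active for 30 minutes, whether or not it served calls.
half_hour_cost = estimate_cost("H100 80GB", active_seconds=30 * 60)
print(f"Estimated cost: ${half_hour_cost}")
```

Because idle-but-awake time is billed the same as busy time, `active_seconds` here counts every second the endpoint is up, not only the seconds spent serving requests.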
How does autoscaling affect my costs?
Each additional replica increases your total cost proportionally. For example, scaling from 1 to 2 replicas doubles your GPU costs.
Best practices for cost management:
- Regularly monitor active endpoints
- If the endpoint isn’t receiving API calls regularly, set the minimum replica count to 0 to enable sleeping (the endpoint wakes up automatically when it receives an API call)
- Delete unused/deprecated endpoints to avoid unnecessary costs
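To get a feel for how replica count and sleeping affect a bill, the following sketch compares a few scenarios using the H100 Basic price from the table. It is an estimate under stated assumptions: every replica is billed at the same hourly rate, the traffic pattern (8 busy hours per day) is made up for illustration, and the idle period before an endpoint falls asleep and the time it takes to wake up are ignored. `replica_cost` is a hypothetical helper, not an official API.

```python
def replica_cost(hourly_price_usd: float, replicas: int, active_hours: float) -> float:
    """Cost when `replicas` identical replicas are each active for `active_hours`.

    Hypothetical helper: assumes every replica is billed at the same rate.
    """
    return hourly_price_usd * replicas * active_hours


H100_HOURLY_USD = 3.9  # Basic price for H100 80GB from the table

# Autoscaling: doubling the replica count doubles the cost for the same period.
one_replica_day = replica_cost(H100_HOURLY_USD, replicas=1, active_hours=24)
two_replica_day = replica_cost(H100_HOURLY_USD, replicas=2, active_hours=24)

# Sleeping: with a minimum replica count of 0, an endpoint that only sees
# traffic about 8 hours a day (illustrative figure) is billed for roughly
# those hours instead of all 24.
always_on_month = replica_cost(H100_HOURLY_USD, replicas=1, active_hours=24 * 30)
sleeping_month = replica_cost(H100_HOURLY_USD, replicas=1, active_hours=8 * 30)

print(f"1 replica, 1 day:   ${one_replica_day:.2f}")
print(f"2 replicas, 1 day:  ${two_replica_day:.2f}")
print(f"Always on, 30 days: ${always_on_month:.2f}")
print(f"Sleeping, 30 days:  ${sleeping_month:.2f}")
```

In practice the sleeping figure would be somewhat higher, since the endpoint stays billable during the idle period before it goes to sleep.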