Autoscaling

Intelligent Autoscaling

Our autoscaling system automatically adjusts computational resources based on your traffic patterns, helping you optimize both performance and costs.

How Autoscaling Works

Minimum Replicas:
- When set to 0, the endpoint enters sleeping status during periods of inactivity, helping to minimize costs
- When set to a value greater than 0, the endpoint maintains at least that number of active replicas at all times
Maximum Replicas: Defines the upper limit of replicas that can be created to handle increased traffic load
Cooldown Period: The idle window, measured in seconds. If no requests are received during this period, the endpoint transitions to sleeping status.
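
The three settings above can be modeled in a few lines of Python. This is only an illustrative sketch, not the platform's implementation: the names `AutoscalingConfig`, `is_sleeping`, and `clamp_replicas` are hypothetical, and the sketch assumes that the cooldown puts an endpoint to sleep only when Minimum Replicas is 0, which is one reading of the description above.

```python
from dataclasses import dataclass


@dataclass
class AutoscalingConfig:
    """Hypothetical container for the three settings described above."""
    min_replicas: int
    max_replicas: int
    cooldown_seconds: int

    def __post_init__(self) -> None:
        # Basic sanity checks; these limits are assumptions for this sketch.
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas < max(1, self.min_replicas):
            raise ValueError("max_replicas must be >= max(1, min_replicas)")
        if self.cooldown_seconds < 0:
            raise ValueError("cooldown_seconds must be >= 0")


def is_sleeping(config: AutoscalingConfig, idle_seconds: float) -> bool:
    """Return True if the endpoint would be asleep after `idle_seconds` without requests.

    Assumption: only endpoints with min_replicas == 0 can sleep.
    """
    return config.min_replicas == 0 and idle_seconds >= config.cooldown_seconds


def clamp_replicas(config: AutoscalingConfig, desired: int) -> int:
    """Keep a desired replica count within [min_replicas, max_replicas]."""
    return max(config.min_replicas, min(config.max_replicas, desired))
```

For example, with `AutoscalingConfig(min_replicas=0, max_replicas=4, cooldown_seconds=300)`, an endpoint that has been idle for 301 seconds is treated as sleeping, and a request for 10 replicas is capped at 4.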

Autoscaling types

We highly recommend the Default autoscaling type, as it performs reliably for most workloads. Other configurations can cause performance degradation or unexpected charges if you do not have a proper understanding of your workload characteristics.

Default (Recommended): This is the best choice for the majority of users. It operates reliably across most workloads with no configuration required, leveraging our internal expertise to provide a balanced approach to performance and cost.
Request count: This is an advanced option for users who have a deep understanding of their workload characteristics and require granular control over scaling behavior.
- Because you define the number of requests a single worker handles, cost prediction becomes more straightforward and intuitive.
- This method can serve as a foundation for your own custom autoscaling logic: change the threshold dynamically via an API to target custom metrics.
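
To see why a per-worker request threshold makes replica counts (and therefore cost) easier to predict, consider the following sketch. It is an assumption about how such a rule could be computed, not the platform's actual algorithm; the function name `desired_replicas` and its parameters are hypothetical.

```python
import math


def desired_replicas(
    in_flight_requests: int,
    requests_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Estimate how many replicas a request-count rule would ask for.

    Assumption: each replica handles up to `requests_per_replica` requests,
    and the result is kept within the configured minimum and maximum.
    """
    if requests_per_replica <= 0:
        raise ValueError("requests_per_replica must be positive")
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must be >= 0")
    # Round up so a partially filled replica still counts as one replica.
    needed = math.ceil(in_flight_requests / requests_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

With a threshold of 10 requests per replica and limits of 1 to 5 replicas, 25 concurrent requests map to 3 replicas, while 100 concurrent requests are capped at 5. Because the mapping is a simple division, the replica count for an expected traffic level can be estimated in advance.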

Benefits of Autoscaling

Cost Optimization: Pay only for the resources you need by automatically scaling to zero during idle periods
Performance Management: Handle traffic spikes efficiently by automatically adding replicas
Resource Efficiency: Maintain optimal resource utilization across varying workload patterns
