Intelligent Autoscaling

Our autoscaling system automatically adjusts computational resources based on your traffic patterns, helping you optimize both performance and costs.

How Autoscaling Works

  • Minimum Replicas:
    • When set to 0, the endpoint enters sleeping status during periods of inactivity, helping to minimize costs
    • When set to a value greater than 0, the endpoint maintains at least that number of active replicas at all times
  • Maximum Replicas: Defines the upper limit of replicas that can be created to handle increased traffic load
  • Cooldown Period: The idle window, measured in seconds; if no requests are received during this period, the endpoint transitions to sleeping status
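
To make the interaction between these three settings concrete, here is a minimal sketch in Python. It is not the platform's API: the names AutoscalingConfig, min_replicas, max_replicas, cooldown_seconds, and may_sleep are hypothetical and chosen only for illustration. It also assumes that an endpoint can enter sleeping status only when Minimum Replicas is 0, which is how we read the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingConfig:
    """Hypothetical container for the three settings described above."""
    min_replicas: int      # 0 allows the endpoint to sleep when idle
    max_replicas: int      # upper limit on replicas under load
    cooldown_seconds: int  # idle window before sleeping

    def __post_init__(self):
        # Basic sanity checks; the real service may enforce different rules.
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas < max(1, self.min_replicas):
            raise ValueError("max_replicas must be >= 1 and >= min_replicas")
        if self.cooldown_seconds < 0:
            raise ValueError("cooldown_seconds must be >= 0")


def may_sleep(config: AutoscalingConfig, idle_seconds: float) -> bool:
    """Return True if the endpoint would be eligible to sleep.

    Assumption: sleeping requires min_replicas == 0 and an idle time of at
    least the cooldown period with no requests received.
    """
    return config.min_replicas == 0 and idle_seconds >= config.cooldown_seconds
```

For example, with AutoscalingConfig(min_replicas=0, max_replicas=3, cooldown_seconds=300), may_sleep returns True after 300 idle seconds, while the same settings with min_replicas=1 never qualify for sleep.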

Autoscaling Types

We strongly recommend the Default autoscaling type, which is stable for most workloads. Other configurations can cause performance degradation or unexpected charges if you do not have a solid understanding of your workload characteristics.
We provide two types of autoscaling, but only the Default option is available on non-Enterprise plans.
  • Default (Recommended): This is the best choice for the majority of users. It operates reliably across most workloads with no configuration required, leveraging our internal expertise to provide a balanced approach to performance and cost.
  • Request count (Enterprise plan only): This is an advanced option for users who have a deep understanding of their workload characteristics and require granular control over scaling behavior.
    • Because you define the number of requests a single worker handles, cost prediction becomes more straightforward and intuitive.
    • This method can serve as a foundation for your own custom autoscaling logic: change the threshold dynamically via an API to target custom metrics.
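
As a rough illustration of why a per-worker request threshold makes costs easier to predict, the sketch below computes a replica count from the current request load. The formula is an assumption (a common ceiling-and-clamp approach), not a description of the platform's actual algorithm, and the names desired_replicas, in_flight_requests, and requests_per_worker are hypothetical.

```python
import math


def desired_replicas(in_flight_requests: int,
                     requests_per_worker: int,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Estimate how many replicas a request-count policy would target.

    Assumption: each worker handles up to `requests_per_worker` requests,
    so the target is the ceiling of load / threshold, clamped to the
    configured minimum and maximum replica counts.
    """
    if requests_per_worker <= 0:
        raise ValueError("requests_per_worker must be > 0")
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests must be >= 0")
    needed = math.ceil(in_flight_requests / requests_per_worker)
    return max(min_replicas, min(max_replicas, needed))
```

Under these assumptions, 25 concurrent requests with a threshold of 10 per worker would target 3 replicas, provided the maximum allows it. Lowering the threshold scales out sooner at higher cost; raising it does the opposite, which is the lever a custom scaling loop would adjust.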

Benefits of Autoscaling

  • Cost Optimization: Pay only for the resources you need by automatically scaling to zero during idle periods
  • Performance Management: Handle traffic spikes efficiently by automatically adding replicas
  • Resource Efficiency: Maintain optimal resource utilization across varying workload patterns