# Request Queueing

> Learn how to configure request queueing to keep performance predictable when a Friendli Dedicated Endpoint receives more traffic than it can serve.

Request queueing keeps your endpoint's performance predictable even under traffic that exceeds its capacity. Instead of accepting every request and overloading the endpoint, it holds excess requests in a queue, so the requests being processed maintain consistent performance.

## Queueing Threshold

A queueing threshold defines the capacity at which an endpoint starts queueing requests. Friendli Dedicated Endpoints currently support one threshold type:

* **Request count**: The average number of in-flight requests each replica should handle. The endpoint multiplies this value by the current number of running replicas to get its capacity. Once the number of in-flight requests exceeds that capacity, the extra requests are queued rather than routed to a replica.
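The capacity arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not the endpoint's implementation: the function names and the `concurrent_requests` parameter (requests sent to the endpoint that have not yet completed) are hypothetical.

```python
def endpoint_capacity(request_count_threshold: int, running_replicas: int) -> int:
    """Capacity = per-replica threshold multiplied by the number of running replicas."""
    return request_count_threshold * running_replicas


def queued_requests(
    concurrent_requests: int,
    request_count_threshold: int,
    running_replicas: int,
) -> int:
    """Number of requests that would wait in the queue (assumed model).

    Requests beyond the capacity are held in the queue instead of being
    routed to a replica; at or below capacity, nothing is queued.
    """
    capacity = endpoint_capacity(request_count_threshold, running_replicas)
    return max(0, concurrent_requests - capacity)
```

For instance, with a threshold of 8 and 3 running replicas, the capacity is 24, so 30 concurrent requests would leave 6 waiting in the queue under this model.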

<Note>
  The appropriate threshold depends on your model, GPU instance, and workload characteristics, so tune it to the point where the endpoint meets your target performance.
</Note>

## Queue Timeout

Queue timeout is optional and controls how long a request can stay in the queue:

* **Not set**: Queued requests wait until capacity frees up.
* **Set**: When a queued request has waited longer than the timeout, the endpoint returns a `429 Too Many Requests` response, so the client can retry or fall back.

Set a queue timeout when you would rather reject a request than let it wait too long.
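On the client side, a `429 Too Many Requests` response can be handled with a retry loop. The following is a minimal sketch, assuming exponential backoff is acceptable for your workload. The `send` callable is a hypothetical stand-in for whatever HTTP client you use, and `max_attempts` and `base_delay` are illustrative values, not recommendations from Friendli.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    (status_code, body). The delay doubles after each 429 response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Back off before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)

    # Every attempt was rejected; return the last 429 so the caller can fall back.
    return status, body
```

If every attempt returns 429, the caller receives the last response and can fall back, for example to another endpoint or a degraded code path.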

## Set Up Request Queueing

<Steps>
  <Step title="Find the Endpoint Features Section">
    While creating or updating an endpoint, go to **Endpoint Features**.
  </Step>

  <Step title="Enable Request Queueing">
    Turn on **Request Queueing**.
  </Step>

  <Step title="Configure the Threshold and Timeout">
    Enter a **Request Count Threshold** (minimum 1) and, optionally, a **Queue timeout** in seconds. Leave the timeout empty or set it to 0 for **No Limit**.
  </Step>

  <Step title="Deploy or Update the Endpoint">
    Click **Deploy** for a new endpoint, or **Update** to apply changes to an existing one.
  </Step>
</Steps>
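If you keep endpoint settings in your own tooling, the rules from the configuration step can be checked before you enter them. This is a sketch only: `QueueingSettings` and `normalize_settings` are hypothetical names, not part of any Friendli SDK, and rejecting negative timeouts is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueingSettings:
    """Hypothetical container for the two request queueing values."""
    request_count_threshold: int
    queue_timeout_seconds: Optional[int]  # None means "No Limit"


def normalize_settings(threshold: int, timeout: Optional[int] = None) -> QueueingSettings:
    """Validate the values and map an empty or zero timeout to "No Limit"."""
    if threshold < 1:
        raise ValueError("Request Count Threshold must be at least 1")
    if timeout is not None and timeout < 0:
        # Assumption: negative timeouts are treated as invalid.
        raise ValueError("Queue timeout must not be negative")
    # Both an empty timeout (None) and 0 mean "No Limit".
    return QueueingSettings(threshold, timeout or None)
```

Here `normalize_settings(8, 0)` and `normalize_settings(8)` both describe a threshold of 8 with no timeout limit, while `normalize_settings(8, 30)` describes a 30-second queue timeout.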
