# Online Quantization

> Automatically quantize models to 4-bit or 8-bit precision at deploy time on Friendli Dedicated Endpoints. No pre-quantized checkpoint needed.

Online Quantization quantizes your model at runtime using FriendliAI's proprietary method, improving speed and reducing cost with little to no loss in accuracy. This lets you select lower-VRAM GPU instances without sacrificing performance.

You can configure the precision level with the following options:

* **Off**: Serve the model at its original precision.
* **4-bit**: Quantize to 4-bit precision for the largest savings in memory and cost.
* **8-bit**: Quantize to 8-bit precision for a balance between savings and accuracy.
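Simple arithmetic on weight storage shows why a lower precision can let a model fit on a GPU instance with less VRAM. The sketch below is a back-of-the-envelope estimate, not FriendliAI's quantization method. It assumes the original checkpoint is stored in 16-bit precision (e.g., BF16), uses a hypothetical 8B-parameter model, and counts weights only. It ignores the KV cache, activations, and any scale or metadata overhead that quantization adds. The helper `weight_memory_gb` is hypothetical and exists only for illustration.

```python
# Hypothetical helper for illustration only: estimates weight storage,
# ignoring KV cache, activations, and quantization scale/metadata overhead.
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


PARAMS = 8e9  # assumed 8B-parameter model

# "Off" is assumed to mean a 16-bit original checkpoint.
for label, bits in [("Off (assumed 16-bit)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions, 8-bit roughly halves and 4-bit roughly quarters the weight footprint. Actual memory use on an endpoint is higher than these figures.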

<Note>
  Some models, such as those that are already quantized, may not be compatible with Online Quantization.
  Not all models support every target precision; some support only 8-bit.
  In certain cases, specific GPU instance types may not be available when Online Quantization is enabled.
</Note>
