Online Quantization

Skip the hassle of preparing a quantized model. When you enable Online Quantization, Friendli automatically quantizes your model to the target precision at runtime using a proprietary method, preserving quality while improving speed and cost-efficiency. Two precision levels are currently supported: 4BIT and 8BIT.
Because the quantized model requires less memory, you can select lower-VRAM GPU instances without performance loss.
Some models (e.g., those already quantized) may not be compatible with Online Quantization.
Not all models support every target precision; some may support only 8BIT.
Certain GPU instance types may not be available when this option is enabled.
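For concreteness, here is a minimal sketch of what a deployment request with Online Quantization enabled might look like. The endpoint path and the payload field names (such as the quantization block and its precision key) are illustrative assumptions, not the documented Friendli API; consult the endpoint-creation reference for the exact schema.

```python
# Hypothetical sketch only: the URL path and payload fields are assumptions
# for illustration. Refer to the Friendli endpoint-creation docs for the
# real request schema.
import os
import requests

FRIENDLI_TOKEN = os.environ["FRIENDLI_TOKEN"]  # personal access token

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model id (assumption)
    "quantization": {
        "online": True,        # enable Online Quantization at deploy time
        "precision": "8BIT",   # or "4BIT" when the model supports it
    },
}

resp = requests.post(
    "https://api.friendli.ai/<endpoint-creation-path>",  # placeholder path (see docs)
    headers={"Authorization": f"Bearer {FRIENDLI_TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```

If the model or the selected precision is not supported (for example, a model that is already quantized), expect the request to be rejected; in that case choose a different precision or disable the option.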