Online Quantization

Skip the hassle of preparing a quantized model. With Online Quantization enabled, your model is automatically quantized at runtime using Friendli's proprietary method, preserving output quality while improving speed and cost-efficiency.
This lets you select lower-VRAM GPU instances without a loss in performance.
Some models (e.g., those already quantized) may not be compatible with Online Quantization.
Certain GPU instance types may also be unavailable when this option is enabled.
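
If you create endpoints programmatically rather than through the console, the sketch below shows how such an option might be passed along with the endpoint configuration. This is a minimal illustration, not the documented API schema: the base URL, the payload fields (including the `online_quantization` flag, model ID, and instance type), and the endpoint path are assumptions for the sake of the example, so check the Friendli API reference for the actual request format.

```python
# Hypothetical sketch: creating a dedicated endpoint with Online Quantization
# enabled via a REST call. URL path, header usage, and payload field names
# (e.g., "online_quantization", "instance_type") are illustrative assumptions.
import os
import requests

API_BASE = "https://api.friendli.ai/dedicated/v1"   # assumed base URL
TOKEN = os.environ["FRIENDLI_TOKEN"]                 # personal access token

payload = {
    "name": "my-endpoint",
    "model": "meta-llama-3.1-8b-instruct",   # example model identifier
    "instance_type": "a10g-24g",             # hypothetical lower-VRAM instance
    "online_quantization": True,             # hypothetical flag for this option
}

response = requests.post(
    f"{API_BASE}/endpoints",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```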