Qwen3.6-35B-A3B-NVFP4 API & Inference Endpoint

Quantization Details

This model was quantized using NVIDIA ModelOpt (v0.39.0) with the NVFP4 algorithm. The configuration applies 4-bit float quantization to both weights and activations using a block size of 16.

Table with columns: Property, Value
Property	Value
Base model	Qwen/Qwen3.6-35B-A3B
Quant method	NVIDIA ModelOpt (`NVFP4`)
Weights	4-bit float (`group_size: 16`)
Input activation	4-bit float (`group_size: 16`)
Excluded layers	`lm_head`, `conv1d`, `shared_expert_gate`

Quickstart

You can deploy this model efficiently using SGLang with the modelopt_fp4 quantization backend.

Serving with SGLang

Ensure you have SGLang installed. Launch the server using the following command:

bash
sglang serve \
  --model-path vrfai/Qwen3.6-35B-A3B-NVFP4 \
  --reasoning-parser qwen3 \
  --tensor-parallel-size 1 \
  --tool-call-parser qwen3_coder \
  --trust-remote-code \
  --quantization modelopt_fp4

Quantization Script

The recipes and scripts used to quantize this model can be found in the following repository:

VinRobotics/model-quantization-recipes

Quantization Details

This model was quantized using NVIDIA ModelOpt (v0.39.0) with the NVFP4 algorithm. The configuration applies 4-bit float quantization to both weights and activations using a block size of 16.

Table with columns: Property, Value
Property	Value
Base model	Qwen/Qwen3.6-35B-A3B
Quant method	NVIDIA ModelOpt (`NVFP4`)
Weights	4-bit float (`group_size: 16`)
Input activation	4-bit float (`group_size: 16`)
Excluded layers	`lm_head`, `conv1d`, `shared_expert_gate`

Quickstart

You can deploy this model efficiently using SGLang with the modelopt_fp4 quantization backend.

Serving with SGLang

Ensure you have SGLang installed. Launch the server using the following command:

bash
sglang serve \
  --model-path vrfai/Qwen3.6-35B-A3B-NVFP4 \
  --reasoning-parser qwen3 \
  --tensor-parallel-size 1 \
  --tool-call-parser qwen3_coder \
  --trust-remote-code \
  --quantization modelopt_fp4

Quantization Script

The recipes and scripts used to quantize this model can be found in the following repository:

VinRobotics/model-quantization-recipes

Qwen3.6-35B-A3B-NVFP4

README

Quantization Details

Quickstart

Serving with SGLang

Quantization Script

Explore FriendliAI today

README

Quantization Details

Quickstart

Serving with SGLang

Quantization Script