vrfai
Qwen3.6-35B-A3B-NVFP4
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Quantization Details
This model was quantized using NVIDIA ModelOpt (v0.39.0) with the NVFP4 algorithm. The configuration applies 4-bit float quantization to both weights and activations using a block size of 16.
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.6-35B-A3B |
| Quant method | NVIDIA ModelOpt (NVFP4) |
| Weights | 4-bit float (group_size: 16) |
| Input activation | 4-bit float (group_size: 16) |
| Excluded layers | lm_head, conv1d, shared_expert_gate |
Quickstart
You can deploy this model efficiently using SGLang with the modelopt_fp4 quantization backend.
Serving with SGLang
Ensure you have SGLang installed. Launch the server using the following command:
bash
sglang serve \--model-path vrfai/Qwen3.6-35B-A3B-NVFP4 \--reasoning-parser qwen3 \--tensor-parallel-size 1 \--tool-call-parser qwen3_coder \--trust-remote-code \--quantization modelopt_fp4
Quantization Script
The recipes and scripts used to quantize this model can be found in the following repository:
Model provider
vrfai
Model tree
Base
Qwen/Qwen3.6-35B-A3B
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information