vrfai

Qwen3.6-35B-A3B-NVFP4

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Quantization Details

This model was quantized using NVIDIA ModelOpt (v0.39.0) with the NVFP4 algorithm. The configuration applies 4-bit float quantization to both weights and activations using a block size of 16.

Table
PropertyValue
Base modelQwen/Qwen3.6-35B-A3B
Quant methodNVIDIA ModelOpt (NVFP4)
Weights4-bit float (group_size: 16)
Input activation4-bit float (group_size: 16)
Excluded layerslm_head, conv1d, shared_expert_gate

Quickstart

You can deploy this model efficiently using SGLang with the modelopt_fp4 quantization backend.

Serving with SGLang

Ensure you have SGLang installed. Launch the server using the following command:

bash

sglang serve \
--model-path vrfai/Qwen3.6-35B-A3B-NVFP4 \
--reasoning-parser qwen3 \
--tensor-parallel-size 1 \
--tool-call-parser qwen3_coder \
--trust-remote-code \
--quantization modelopt_fp4

Quantization Script

The recipes and scripts used to quantize this model can be found in the following repository:

Model provider

vrfai

Model tree

Base

Qwen/Qwen3.6-35B-A3B

Quantized

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today