Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

License: apache-2.0

Inference

As of 2/27/2026, this model is supported in vLLM nightly. To serve the model:

bash

vllm serve Kbenkhaled/Qwen3.5-35B-A3B-NVFP4 \
--reasoning-parser qwen3 \
--enable-prefix-caching

Evaluation

Evaluated with lm-evaluation-harness, 0-shot, thinking mode ON.

BenchmarkQwen3.5-35B-A3BQwen3.5-35B-A3B-NVFP4 (this model)Recovery
GPQA Diamond81.31%80.81%99.4%
IFEval95.56%92.93%97.2%
MMLU-Redux92.51%92.31%99.8%
Average89.79%88.68%98.8%

Model provider

aastalll

Model tree

Base

Qwen/Qwen3.5-35B-A3B

Quantized

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today