Qwen3.5-35B-EXL3-4bpw API & Inference Endpoint

At a glance

Base model	Qwen/Qwen3.5-35B-A3B-Base
Format	EXL3-4bpw
Total params	35B
Active / token	3B
Experts / layer	—
Layers	—
Hidden size	—
Context	—
On-disk size	21 GB
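
As a rough sanity check on the figures above, the on-disk size can be compared with what a flat 4 bits per weight would imply. This is a sketch only: it assumes "35B" means 35e9 parameters and "21 GB" means decimal gigabytes, neither of which the card states, and the helper names are hypothetical.

```python
# Back-of-the-envelope size check for the "At a glance" figures.
# Assumptions (not stated in the card): 35B = 35e9 parameters, GB = 1e9 bytes.

TOTAL_PARAMS = 35e9   # "Total params: 35B"
NOMINAL_BPW = 4.0     # "Format: EXL3-4bpw"
ON_DISK_GB = 21.0     # "On-disk size: 21 GB"


def nominal_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Size in decimal GB if every parameter used exactly bits_per_weight bits."""
    return total_params * bits_per_weight / 8 / 1e9


def effective_bpw(on_disk_gb: float, total_params: float) -> float:
    """Average bits per parameter implied by the reported on-disk size."""
    return on_disk_gb * 1e9 * 8 / total_params


if __name__ == "__main__":
    print(f"nominal size at 4 bpw: {nominal_size_gb(TOTAL_PARAMS, NOMINAL_BPW):.1f} GB")
    print(f"effective bits/param:  {effective_bpw(ON_DISK_GB, TOTAL_PARAMS):.2f}")
```

Under those assumptions the files average somewhat more than 4 bits per parameter. The card does not say where the extra space goes, so treat the gap as an observation rather than a breakdown.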

Which variant should I pick?

Variant	Format	Link
`Qwen3.5-264B`	BF16	link
`Qwen3.5-264B-FP8`	FP8	link
`Qwen3.5-264B-W4A16`	W4A16	link

This card covers only the EXL3-4bpw build. For architecture, benchmarks, and general usage, see the base model's documentation upstream.

License & citation

License inherited from the base model.

bibtex
@misc{lasby2025reap,
  title  = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  year   = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
}

