Qwen3.6-35B-A3B-REAM-160-ru-agent API & Inference Endpoint

Build

Base: Qwen/Qwen3.6-35B-A3B
REAM commit: 84a3030716a0059589e9d10e2ea049e32b76cfa6
Transformers: 5.12.1
Target main MoE experts: 160
Calibration mix: code+math+agent+ru
Mix ratio: 0.20,0.10,0.35,0.35
Batch/sequence/seed: 3072, 512, 42
REAM merge: grouping=ream, saliency=reap, merging=logits+weights, group_size=32
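
As a small illustration of how the calibration mix and its ratio line up, the sketch below pairs each source name with its weight and checks that the weights sum to 1. It assumes the ratios are listed in the same order as the names in the mix string; `parse_mix` is a hypothetical helper, not part of REAM.

```python
# Values copied from the Build block above.
# Assumption: ratios are listed in the same order as the mix names.
MIX_NAMES = "code+math+agent+ru"
MIX_RATIO = "0.20,0.10,0.35,0.35"


def parse_mix(names: str, ratio: str) -> dict[str, float]:
    """Pair each calibration source with its weight (hypothetical helper)."""
    sources = names.split("+")
    weights = [float(w) for w in ratio.split(",")]
    if len(sources) != len(weights):
        raise ValueError("mix names and ratios differ in length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mix ratios do not sum to 1")
    return dict(zip(sources, weights))


mix = parse_mix(MIX_NAMES, MIX_RATIO)
print(mix)
```

Under that ordering assumption, agent and ru data together account for 70% of the mix, with code and math as smaller anchors.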

MTP

The main language MoE layers were REAM-merged to 160 experts. Qwen3.6 packs its MTP tensors differently from what the upstream REAM MTP helper assumes, so the packed MTP tensors are reduced with a saved REAM group map instead of the upstream --mtp_safe_tensors option.
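
To make the idea of reducing packed tensors with a group map concrete, here is a toy sketch. It is not the REAM implementation: the layout (one flat weight row per expert), the group map format (one merged-expert index per original expert), and the plain unweighted average are all assumptions chosen for illustration, and `reduce_packed_experts` is a hypothetical name. The actual merge listed in the Build block uses merging=logits+weights, which this sketch does not reproduce.

```python
def reduce_packed_experts(packed, group_map, num_groups):
    """Average packed expert rows that share a group id (toy illustration).

    packed: one flat list of weights per original expert, standing in for
            a packed tensor whose leading dimension indexes experts.
    group_map: group_map[i] is the merged-expert index for original expert i.
    """
    if len(packed) != len(group_map):
        raise ValueError("group map must have one entry per expert")
    width = len(packed[0])
    sums = [[0.0] * width for _ in range(num_groups)]
    counts = [0] * num_groups
    for row, group in zip(packed, group_map):
        counts[group] += 1
        for j, value in enumerate(row):
            sums[group][j] += value
    if 0 in counts:
        raise ValueError("every merged expert needs at least one source expert")
    # Assumption: unweighted mean per group, not REAM's saliency-aware merge.
    return [[s / c for s in srow] for srow, c in zip(sums, counts)]


# Four toy experts reduced to two merged experts.
packed = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
group_map = [0, 1, 0, 1]
merged = reduce_packed_experts(packed, group_map, num_groups=2)
print(merged)
```

The point of saving the group map is that the same expert-to-group assignment used for the main layers can be replayed on tensors the upstream helper cannot parse.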

Structural audit:

Main language MoE layers: 160 experts
MTP MoE: 160 experts
Final model.safetensors.index.json includes mtp.* keys
vLLM qwen3_next_mtp drafter load must pass before the checkpoint is accepted
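
The index check in the audit can be sketched as follows. This assumes the standard sharded safetensors index layout, where a top-level `weight_map` maps tensor names to shard files. The helper names and the toy tensor names are hypothetical, and the sketch covers only the mtp.* key check, not the vLLM drafter load.

```python
import json
from pathlib import Path


def mtp_keys(index: dict) -> list[str]:
    """Return tensor names in the index that start with 'mtp.'."""
    return sorted(k for k in index.get("weight_map", {}) if k.startswith("mtp."))


def check_index_file(path: str) -> list[str]:
    """Load an index file and fail loudly if no mtp.* tensors are listed."""
    keys = mtp_keys(json.loads(Path(path).read_text()))
    if not keys:
        raise SystemExit(f"{path}: no mtp.* keys found")
    return keys


# Toy index; tensor and shard names are made up for illustration.
toy_index = {
    "metadata": {},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001.safetensors",
        "mtp.fc.weight": "model-00002.safetensors",
    },
}
print(mtp_keys(toy_index))
```

Against a real checkout, one would call `check_index_file("model.safetensors.index.json")` from the repository root.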

Calibration Focus

The recipe favors:

agentic/tool-use behavior via zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
Cyrillic/Russian chat retention via ZeroAgency/ru-big-russian-dataset
code and math anchors from the original REAM calibration paths

Agent calibration had 2477 unique accepted samples, which were deterministically repeated to fill the 3072-sample batch. RU calibration had 3072 unique accepted samples, so it needed no repetition.
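
One simple way to repeat samples deterministically is to cycle through them in their original order until the batch is full. The sketch below assumes that scheme. The source does not say which repetition order was actually used, and `fill_batch` is a hypothetical helper.

```python
from itertools import cycle, islice


def fill_batch(samples, batch_size):
    """Repeat samples in their original order until batch_size is reached."""
    if not samples:
        raise ValueError("need at least one sample")
    return list(islice(cycle(samples), batch_size))


# Integers stand in for the 2477 unique accepted agent samples.
unique = list(range(2477))
batch = fill_batch(unique, 3072)
print(len(batch), len(batch) - len(unique))
```

With these numbers, 3072 - 2477 = 595 slots are filled by repeats, so under this scheme the first 595 samples would appear twice.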

Verification

Artifacts in this repo include:

stats/structural_audit.json
stats/smoke_outputs.json
stats/agent_calibration_stats.json
stats/ru_calibration_stats.json
logs/main_merge.log
manifests/hf_final_audit.json
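
A quick way to confirm that a local checkout contains the artifacts listed above is sketched below. The paths come from the list; `missing_artifacts` is a hypothetical helper, and the check covers file presence only, not file contents.

```python
from pathlib import Path

# Paths copied from the artifact list above.
EXPECTED_ARTIFACTS = [
    "stats/structural_audit.json",
    "stats/smoke_outputs.json",
    "stats/agent_calibration_stats.json",
    "stats/ru_calibration_stats.json",
    "logs/main_merge.log",
    "manifests/hf_final_audit.json",
]


def missing_artifacts(repo_root: str) -> list[str]:
    """Return the expected artifact paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


# Run against the current directory; an empty list means nothing is missing.
print(missing_artifacts("."))
```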

Smoke prompts covered Russian Cyrillic chat, coding-agent debugging, and XML-style tool calling.

