shisa-ai/Qwen3.6-35B-A3B-PARO-full4096-rbparams-e5-packed

License: apache-2.0

Headline comparison vs previous packed PARO checkpoint

The canonical tx4/quality3 evaluation was run on the equivalent legacy/original-format exports, with the original BF16 HF model as the reference. The packed and legacy exports contain the same quantized tensors, so their quality metrics are expected to be identical.

| Model | Calibration | Optimizer recipe | Packed BPW ↓ | PPL ↓ | Δ PPL vs prev packed | KL nats ↓ | Δ KL vs prev packed | ΔNLL ↓ | Top-1 % ↑ |
|---|---|---|---|---|---|---|---|---|---|
| PARO full4096-e5-packed | 4096×2048 | previous recipe | 4.6799 | 6.6216 | baseline | 0.034684 | baseline | +0.009506 | 92.000 |
| PARO full4096-rbparams-e5-packed | 4096×2048 | runbook params | 4.6799 | 6.6116 | -0.0100 / -0.15% | 0.028336 | -0.006348 / -18.3% | +0.007996 | 92.816 |
| PARO full8192-oldfresh-rbparams-e5-packed | 8192×2048 | runbook params | pending | pending | pending | pending | pending | pending | pending |

The new runbook-parameter checkpoint is the best PARO result so far on the canonical held-out validation protocol: lower PPL, lower ΔNLL, lower KL divergence, lower RMS true-token probability drift, and higher top-1 agreement than the previous packed 4096/e5 release.
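The Δ columns in the table above can be reproduced from the absolute values. A minimal sketch, assuming the relative change is (new − baseline) / baseline, and assuming the usual definitions PPL = exp(mean NLL) and ΔNLL = NLL(model) − NLL(reference); the helper names are hypothetical. The second check shows that both rows imply essentially the same BF16 reference PPL, which is what you would expect if they were scored against the same reference.

```python
import math

def delta(new: float, baseline: float) -> tuple[float, float]:
    """Absolute change and relative change (percent) versus a baseline."""
    absolute = new - baseline
    return absolute, 100.0 * absolute / baseline

def implied_ref_ppl(ppl: float, delta_nll: float) -> float:
    """Reference PPL implied by PPL = exp(mean NLL) and dNLL = NLL_model - NLL_ref."""
    return ppl * math.exp(-delta_nll)

# Values copied from the headline table.
prev = {"ppl": 6.6216, "kl": 0.034684, "dnll": 0.009506}
rb = {"ppl": 6.6116, "kl": 0.028336, "dnll": 0.007996}

for metric in ("ppl", "kl"):
    abs_d, rel_d = delta(rb[metric], prev[metric])
    print(f"{metric}: {abs_d:+.6f} / {rel_d:+.2f}%")
# ppl: -0.010000 / -0.15%
# kl: -0.006348 / -18.30%

# Both rows should imply (nearly) the same BF16 reference PPL.
print(round(implied_ref_ppl(prev["ppl"], prev["dnll"]), 3),
      round(implied_ref_ppl(rb["ppl"], rb["dnll"]), 3))
# 6.559 6.559
```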

When the 8192-sample run finishes, this headline section should be expanded into a calibration-scaling table covering the first 2048-sample run, this 4096-sample run, and the 8192-sample run. The older 2048-sample row should first be re-evaluated under the same canonical protocol before it is added to this table.

Full canonical quality table

Evaluation protocol:

Reference: original BF16 HF model
Validation source: held-out tx4/quality3 calibration validation mix
Context/window length: 2048 tokens
Stride: 1023 tokens
Scored target positions/window: 1025..2047 inclusive
Windows: 127
Scored tokens/model: 129,921
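The window settings above are internally consistent: each window scores 1,023 positions (1025..2047 inclusive), and 127 × 1,023 = 129,921. A minimal sketch of the schedule, assuming every window is cut from one concatenated token stream starting at index 0 (the function name is hypothetical):

```python
def window_schedule(num_windows: int, window: int = 2048, stride: int = 1023,
                    first_scored: int = 1025):
    """Yield (window_start, first_target, last_target) as absolute token indices.

    Within each window, target positions first_scored..window-1 (inclusive)
    are scored, matching the evaluation protocol above.
    """
    for i in range(num_windows):
        start = i * stride
        yield start, start + first_scored, start + window - 1

windows = list(window_schedule(127))
scored = sum(last - first + 1 for _, first, last in windows)
print(scored)  # 129921

# Consecutive windows tile the scored tokens with no gaps or overlaps.
contiguous = all(windows[i + 1][1] == windows[i][2] + 1
                 for i in range(len(windows) - 1))
print(contiguous)  # True
```

Because the stride equals the number of scored positions per window, each scored token is counted exactly once. If "target position" means the index of the token being predicted, every scored token is also conditioned on at least 1,025 preceding tokens.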

| Model | Kind | Reference | Artifact BPW ↓ | Packed BPW est. ↓ | PPL ↓ | Ref PPL | Mean NLL ↓ | Ref NLL | ΔNLL ↓ | KL nats ↓ | Max KL ↓ | RMS Δp % ↓ | Top-1 % ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Original BF16 HF | HF/Transformers | self | 16.435 | 16.435 |  |  |  |  |  |  |  |  |  |
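The metric columns in these tables are not formally defined in this card. As a hedged sketch, assuming common definitions (KL(P_ref ‖ P_model) in nats per scored position, ΔNLL = model NLL − reference NLL on the true next token, top-1 = agreement of the two argmax predictions, RMS Δp = root-mean-square change in the true token's probability), per-position values could be computed and aggregated as below. The function names are hypothetical, and the KL direction in particular is an assumption.

```python
import math

def token_metrics(ref_probs, model_probs, target):
    """Compare two next-token distributions (lists summing to 1) at one position."""
    # KL(P_ref || P_model) in nats; terms with p == 0 contribute nothing.
    kl = sum(p * math.log(p / q) for p, q in zip(ref_probs, model_probs) if p > 0)
    # Difference in negative log-likelihood of the true next token.
    dnll = -math.log(model_probs[target]) + math.log(ref_probs[target])
    # Do both models put their highest probability on the same token?
    top1_match = (max(range(len(ref_probs)), key=ref_probs.__getitem__)
                  == max(range(len(model_probs)), key=model_probs.__getitem__))
    # Change in probability assigned to the true token.
    dp = model_probs[target] - ref_probs[target]
    return kl, dnll, top1_match, dp

def aggregate(rows):
    """Average per-position results over all scored tokens."""
    n = len(rows)
    return {
        "kl": sum(r[0] for r in rows) / n,
        "max_kl": max(r[0] for r in rows),
        "dnll": sum(r[1] for r in rows) / n,
        "top1_pct": 100.0 * sum(r[2] for r in rows) / n,
        "rms_dp_pct": 100.0 * math.sqrt(sum(r[3] ** 2 for r in rows) / n),
    }

ref = [0.7, 0.2, 0.1]
quant = [0.5, 0.3, 0.2]
rows = [token_metrics(ref, ref, 0), token_metrics(ref, quant, 0)]
print(aggregate(rows))
```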

Packed artifact details

The packed artifact was produced from the legacy/original export with:

```bash
python3 scripts/strip_paro_safetensors.py \
  --input-dir /models/qwen36-quant/Qwen3.6-35B-A3B-PARO-full4096-rbparams-e5 \
  --output-dir /models/qwen36-quant/Qwen3.6-35B-A3B-PARO-full4096-rbparams-e5-packed \
  --mode packed \
  --overwrite
```

Packed changes:

Removed every duplicate fp16 .weight fallback tensor belonging to a module that also has a .qweight tensor
Removed tensors: 250
Removed tensor bytes: 2,810,183,680
model.safetensors: 20,474,495,512 bytes
Actual packed BPW: 4.6799 using a 35B denominator
Verified duplicate shared-expert fallback count after stripping: 0
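The packed BPW figure can be reproduced from the file size. A minimal sketch, assuming BPW here means the model.safetensors size in bits divided by 35 × 10⁹ parameters (this reproduces the 4.6799 above; the helper name is hypothetical):

```python
def bits_per_weight(num_bytes: int, num_params: float = 35e9) -> float:
    """Average storage bits per parameter: size in bits / parameter count."""
    return num_bytes * 8 / num_params

packed_bytes = 20_474_495_512   # model.safetensors after stripping
removed_bytes = 2_810_183_680   # bytes of the removed fp16 fallback tensors

print(f"{bits_per_weight(packed_bytes):.4f}")   # 4.6799
print(f"{bits_per_weight(removed_bytes):.4f}")  # 0.6423
```

Under the same denominator, the stripped fallback tensors account for roughly 0.64 bits per weight.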

Related checkpoints:

Previous packed 4096/e5 release: shisa-ai/Qwen3.6-35B-A3B-PARO-full4096-e5-packed
Legacy/original-format rbparams export: Qwen3.6-35B-A3B-PARO-full4096-rbparams-e5

Training/calibration notes

Quantization: W4A16 ParoQuant, bits=4, group_size=128, krot=8
Calibration size: 4096 samples × 2048 tokens
Validation size: 64 samples × 2048 tokens
Batch size: 8
Gradient accumulation: 2
Skipped modules: mlp.gate, mlp.shared_expert_gate, linear_attn.in_proj_a,
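For readers unfamiliar with the bits=4 and group_size=128 settings: W4A16 stores weights as 4-bit codes while activations stay in 16-bit, and each group of 128 consecutive weights gets its own quantization parameters. The sketch below shows plain round-to-nearest, min/max group-wise quantization only to illustrate that grouping. It deliberately omits ParoQuant's rotations (krot) and its optimized quantization parameters, so it is not the ParoQuant algorithm; the function names are hypothetical.

```python
def quantize_groups(weights, bits=4, group_size=128):
    """Asymmetric min/max round-to-nearest quantization, one (scale, offset) per group."""
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # A constant group needs no range; any positive scale works.
        scale = (hi - lo) / qmax if hi > lo else 1.0
        # (w - lo) / scale lies in [0, qmax], so codes always fit in `bits` bits.
        codes = [round((w - lo) / scale) for w in group]
        # Scale and offset are kept as Python floats here for simplicity.
        groups.append((scale, lo, codes))
    return groups

def dequantize_groups(groups):
    """Map 4-bit codes back to approximate weights."""
    return [q * scale + lo for scale, lo, codes in groups for q in codes]
```

With this scheme, each reconstructed weight is within half a quantization step (scale / 2) of the original value.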

Notes

This artifact requires a packed-aware ParoQuant-compatible loader/runtime; legacy loaders that expect duplicate fp16 fallback .weight tensors will not load this format.
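The difference between the two formats, and the stripping rule listed under Packed changes, can be expressed as a check over tensor names. A minimal sketch, assuming only the naming convention stated in this card (a quantized module has <module>.qweight, and a legacy fallback is <module>.weight). The function names and example tensor names are hypothetical, and this is not the implementation of scripts/strip_paro_safetensors.py. In practice, the names could be read from a checkpoint with the keys() method of a safetensors safe_open handle.

```python
def duplicate_fallbacks(tensor_names) -> list[str]:
    """Return '<module>.weight' names whose module also has '<module>.qweight'."""
    names = set(tensor_names)
    return sorted(
        n for n in names
        if n.endswith(".weight") and n[: -len(".weight")] + ".qweight" in names
    )

def is_packed(tensor_names) -> bool:
    """Packed layout: no module carries both a .qweight and an fp16 .weight fallback."""
    return not duplicate_fallbacks(tensor_names)

# Hypothetical tensor names for illustration only.
legacy = [
    "model.layers.0.mlp.shared_expert.down_proj.qweight",
    "model.layers.0.mlp.shared_expert.down_proj.weight",  # duplicate fallback
    "model.layers.0.mlp.gate.weight",                     # unquantized, kept
]
dups = set(duplicate_fallbacks(legacy))
packed = [n for n in legacy if n not in dups]
print(is_packed(legacy), is_packed(packed))  # False True
```

Running the same check on the stripped output corresponds to the "duplicate fallback count after stripping: 0" verification above.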

See strip_paro_safetensors_report.json for the exact stripping report.

Model Details

Model Provider: shisa-ai

Model Tree: base model Qwen/Qwen3.6-35B-A3B; this model is a quantized derivative of it

Input Modalities: Text, Image, Video

Output Modalities