README

License: apache-2.0

Headline comparison vs previous packed PARO checkpoint

The canonical tx4/quality3 evaluation was run on the equivalent legacy/original exports, with the original BF16 HF model as the reference. Packed and legacy exports contain the same quantized tensors, so quality metrics are expected to be identical.

| Model | Calibration | Optimizer recipe | Packed BPW ↓ | PPL ↓ | Δ PPL vs prev packed | KL nats ↓ | Δ KL vs prev packed | ΔNLL ↓ | Top-1 % ↑ |
|---|---|---|---|---|---|---|---|---|---|
| PARO full4096-e5-packed | 4096×2048 | previous recipe | 4.6799 | 6.6216 | baseline | 0.034684 | baseline | +0.009506 | 92.000 |
| PARO full4096-rbparams-e5-packed | 4096×2048 | runbook params | 4.6799 | 6.6116 | -0.0100 / -0.15% | 0.028336 | -0.006348 / -18.3% | +0.007996 | 92.816 |
| PARO full8192-oldfresh-rbparams-e5-packed | 8192×2048 | runbook params | pending | pending | pending | pending | pending | pending | pending |

The new runbook-parameter checkpoint is the best PARO result so far on the canonical held-out validation protocol: lower PPL, lower ΔNLL, lower KL divergence, lower RMS true-token probability drift, and higher top-1 agreement than the previous packed 4096/e5 release.
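The delta columns in the headline table can be reproduced from the PPL and KL values of the two finished rows. The following is a minimal sketch; the helper name `delta_vs_baseline` is illustrative and not part of any evaluation tooling.

```python
def delta_vs_baseline(new, baseline):
    """Return (absolute change, relative change in percent) against a baseline."""
    absolute = new - baseline
    relative_pct = 100.0 * absolute / baseline
    return absolute, relative_pct


# Values copied from the headline table (previous packed row vs runbook-params row).
ppl_abs, ppl_rel = delta_vs_baseline(new=6.6116, baseline=6.6216)
kl_abs, kl_rel = delta_vs_baseline(new=0.028336, baseline=0.034684)

print(f"PPL: {ppl_abs:+.4f} / {ppl_rel:+.2f}%")
print(f"KL:  {kl_abs:+.6f} / {kl_rel:+.1f}%")
```

Negative values mean the new checkpoint improved on the previous one, since both metrics are lower-is-better.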

When the 8192-sample run finishes, this headline section should be updated into a calibration-scaling table covering the first 2048-sample run, this 4096-sample run, and the 8192-sample run. The older 2048 row should be re-evaluated under the same canonical protocol before mixing it into this table.

Full canonical quality table

Evaluation protocol:

  • Reference: original BF16 HF model
  • Validation source: held-out tx4/quality3 calibration validation mix
  • Context/window length: 2048 tokens
  • Stride: 1023 tokens
  • Scored target positions/window: 1025..2047 inclusive
  • Windows: 127
  • Scored tokens/model: 129,921
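The protocol numbers above are internally consistent, which a few lines of arithmetic can confirm. This is a sketch only: the variable names are illustrative, and the stream-length estimate assumes that windows start at multiples of the stride, which the protocol does not state explicitly.

```python
CONTEXT_LEN = 2048                        # tokens per window
STRIDE = 1023                             # tokens between window starts
FIRST_SCORED, LAST_SCORED = 1025, 2047    # inclusive scored target positions
WINDOWS = 127

# Inclusive range, so add 1.
scored_per_window = LAST_SCORED - FIRST_SCORED + 1
scored_total = WINDOWS * scored_per_window

# Assumption: window i starts at token i * STRIDE.
stream_tokens_spanned = (WINDOWS - 1) * STRIDE + CONTEXT_LEN

print(scored_per_window)       # scored positions per window
print(scored_total)            # scored tokens per model
print(stream_tokens_spanned)   # tokens covered under the stated assumption
```

The number of scored positions per window equals the stride, which is consistent with each target token being scored exactly once, with at least 1,025 tokens of preceding context.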

| Model | Kind | Reference | Artifact BPW ↓ | Packed BPW est. ↓ | PPL ↓ | Ref PPL | Mean NLL ↓ | Ref NLL | ΔNLL ↓ | KL nats ↓ | Max KL ↓ | RMS Δp % ↓ | Top-1 % ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Original BF16 HF | HF/Transformers | self | 16.435 | 16.435 | 6.5590 | 6.5590 | 1.880836 | 1.880836 | +0.000000 | 0.000000 | 0.000000 | 0.000 | 100.000 |
| PARO full4096-e1 | HF/ParoQuant | Original BF16 HF | 5.322 | 4.677 | 6.6569 | 6.5590 | 1.895660 | 1.880836 | +0.014824 | 0.034055 | 6.379075 | 5.098 | 92.036 |
| PARO full4096-e5 | HF/ParoQuant | Original BF16 HF | 5.322 | 4.677 | 6.6216 | 6.5590 | 1.890342 | 1.880836 | +0.009506 | 0.034684 | 11.042196 | 5.170 | 92.000 |
| PARO full4096-rbparams-e5 | HF/ParoQuant | Original BF16 HF | 5.322 | 4.677 | 6.6116 | 6.5590 | 1.888832 | 1.880836 | +0.007996 | 0.028336 | 9.788753 | 4.730 | 92.816 |
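
The PPL, Mean NLL, and ΔNLL columns are related by the standard definitions PPL = exp(mean NLL) and ΔNLL = mean NLL minus reference NLL. The sketch below checks the table rows against those relations; the dictionary layout is illustrative only.

```python
import math

REF_NLL = 1.880836  # mean NLL of the original BF16 HF model, in nats

# model name -> (mean NLL, reported PPL, reported ΔNLL), copied from the table
rows = {
    "Original BF16 HF":          (1.880836, 6.5590, 0.000000),
    "PARO full4096-e1":          (1.895660, 6.6569, 0.014824),
    "PARO full4096-e5":          (1.890342, 6.6216, 0.009506),
    "PARO full4096-rbparams-e5": (1.888832, 6.6116, 0.007996),
}

for name, (nll, reported_ppl, reported_delta) in rows.items():
    ppl = math.exp(nll)        # perplexity from mean NLL in nats
    delta = nll - REF_NLL      # NLL increase over the reference
    print(f"{name}: PPL={ppl:.4f} (reported {reported_ppl}), "
          f"dNLL={delta:+.6f} (reported {reported_delta:+.6f})")
```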

Packed artifact details

The packed artifact was produced from the legacy/original export with:

```bash
python3 scripts/strip_paro_safetensors.py \
  --input-dir /models/qwen36-quant/Qwen3.6-35B-A3B-PARO-full4096-rbparams-e5 \
  --output-dir /models/qwen36-quant/Qwen3.6-35B-A3B-PARO-full4096-rbparams-e5-packed \
  --mode packed \
  --overwrite
```

Packed changes:

  • Removed every duplicate fp16 .weight fallback tensor where the same module has .qweight
  • Removed tensors: 250
  • Removed tensor bytes: 2,810,183,680
  • model.safetensors: 20,474,495,512 bytes
  • Actual packed BPW: 4.6799 (model.safetensors size in bits divided by a 35B parameter denominator)
  • Verified duplicate shared-expert fallback count after stripping: 0
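
The bits-per-weight figures follow directly from the byte counts listed above. A minimal sketch, assuming the "35B denominator" means exactly 35 × 10^9 parameters and that the pre-strip size is the packed file size plus the removed tensor bytes:

```python
PACKED_BYTES = 20_474_495_512     # model.safetensors after stripping
REMOVED_BYTES = 2_810_183_680     # duplicate fp16 fallback tensors removed
PARAM_DENOMINATOR = 35e9          # assumption: "35B" taken as 35 * 10**9


def bits_per_weight(num_bytes, num_params=PARAM_DENOMINATOR):
    """File size in bits divided by the parameter-count denominator."""
    return num_bytes * 8 / num_params


packed_bpw = bits_per_weight(PACKED_BYTES)
unpacked_bpw = bits_per_weight(PACKED_BYTES + REMOVED_BYTES)

print(f"packed BPW:   {packed_bpw:.4f}")
print(f"unpacked BPW: {unpacked_bpw:.3f}")
```

Under these assumptions the packed value matches the 4.6799 reported here, and the pre-strip value matches the 5.322 artifact BPW in the quality table.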

Related checkpoints:

Training/calibration notes

  • Quantization: W4A16 ParoQuant, bits=4, group_size=128, krot=8
  • Calibration size: 4096 samples × 2048 tokens
  • Validation size: 64 samples × 2048 tokens
  • Batch size: 8
  • Gradient accumulation: 2
  • Skipped modules: mlp.gate, mlp.shared_expert_gate, linear_attn.in_proj_a, linear_attn.in_proj_b
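
As a rough illustration of what these settings imply, the sketch below derives the effective batch size and shows one way a skip list like the one above could be applied. It assumes the batch size is not multiplied by data parallelism and that skipped modules are matched by name suffix; both are assumptions, and the module names and the `is_skipped` helper are hypothetical rather than taken from the ParoQuant tooling.

```python
CALIBRATION_SAMPLES = 4096
BATCH_SIZE = 8
GRAD_ACCUMULATION = 2

# Samples contributing to one optimizer step (assumes a single data-parallel rank).
effective_batch = BATCH_SIZE * GRAD_ACCUMULATION
steps_per_pass = CALIBRATION_SAMPLES // effective_batch

SKIPPED_SUFFIXES = (
    "mlp.gate",
    "mlp.shared_expert_gate",
    "linear_attn.in_proj_a",
    "linear_attn.in_proj_b",
)


def is_skipped(module_name):
    """True if the dotted module name ends with one of the skipped suffixes."""
    return any(
        module_name == s or module_name.endswith("." + s)
        for s in SKIPPED_SUFFIXES
    )


print(effective_batch, steps_per_pass)
print(is_skipped("model.layers.0.mlp.gate"))                # hypothetical name
print(is_skipped("model.layers.0.mlp.experts.3.up_proj"))   # hypothetical name
```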

Notes

This artifact requires a packed-aware ParoQuant-compatible loader/runtime; legacy loaders that expect duplicate fp16 fallback .weight tensors will not load this format.
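
To tell whether a checkpoint is in the packed or the legacy layout, one option is to look for modules that carry both a `.qweight` and a fallback `.weight` tensor. The sketch below works on a plain list of tensor names; the function name and the example names are hypothetical. For a real file, the names could be read with the safetensors library (`safe_open(path, framework="pt")` and its `keys()` method).

```python
def find_duplicate_fallbacks(tensor_names):
    """Return modules that have both a .qweight and a fallback .weight tensor."""
    names = set(tensor_names)
    duplicates = []
    for name in names:
        if name.endswith(".qweight"):
            module = name[: -len(".qweight")]
            if module + ".weight" in names:
                duplicates.append(module)
    return sorted(duplicates)


# Hypothetical tensor names for illustration only.
legacy_names = [
    "model.layers.0.mlp.shared_expert.up_proj.qweight",
    "model.layers.0.mlp.shared_expert.up_proj.weight",
    "model.layers.0.mlp.gate.weight",
]
packed_names = [
    "model.layers.0.mlp.shared_expert.up_proj.qweight",
    "model.layers.0.mlp.gate.weight",
]

print(find_duplicate_fallbacks(legacy_names))   # one duplicated module
print(find_duplicate_fallbacks(packed_names))   # empty list for a packed layout
```

An empty result corresponds to the "duplicate fallback count after stripping: 0" check reported for this artifact.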

See strip_paro_safetensors_report.json for the exact stripping report.

Model provider: shisa-ai

Model tree

  • Base: Qwen/Qwen3.6-35B-A3B
  • Quantized: this model

Modalities

  • Input: Video, Text, Image
  • Output: Text
