QwenPaw-Flash-9B-heretic API & Inference Endpoint

MTP acceptance rate: ~43%
Speedup: ~1.5-1.9x decode throughput

For MTP-enabled GGUF inference, see the MTP GGUF repo below.

Standard GGUF (no MTP): SC117/QwenPaw-Flash-9B-heretic-GGUF
MTP GGUF (with Multi-Token Prediction head): SC117/QwenPaw-Flash-9B-heretic-MTP-GGUF

model = AutoModelForCausalLM.from_pretrained( "SC117/QwenPaw-Flash-9B-heretic", torch_dtype=torch.float32, device_map="auto", ) tokenizer = AutoTokenizer.from_pretrained("SC117/QwenPaw-Flash-9B-heretic")

markdown
</div>
</div>
<div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02); margin-bottom: 20px;">
<div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📄</span> License</div>
<div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
Same as base model (Qwen3.5-9B).
</div>
</div>

MTP acceptance rate: ~43%
Speedup: ~1.5-1.9x decode throughput

For MTP-enabled GGUF inference, see the MTP GGUF repo below.

Standard GGUF (no MTP): SC117/QwenPaw-Flash-9B-heretic-GGUF
MTP GGUF (with Multi-Token Prediction head): SC117/QwenPaw-Flash-9B-heretic-MTP-GGUF

markdown
</div>
</div>
<div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02); margin-bottom: 20px;">
<div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📄</span> License</div>
<div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
Same as base model (Qwen3.5-9B).
</div>
</div>

QwenPaw-Flash-9B-heretic

README

Explore FriendliAI today

README