Dedicated Endpoints

Run inference for this model on single-tenant GPUs with unmatched speed and reliability at scale.

Get help setting up custom Dedicated Endpoints.

Talk with our engineers to get a quote for discounted reserved GPU instances.

README

License: apache-2.0
  • MTP acceptance rate: ~43%
  • Speedup: ~1.5-1.9x decode throughput
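
To give a rough feel for how an acceptance rate relates to a decode speedup, here is a minimal sketch of the standard idealized speculative-decoding estimate. It is not taken from this model card: it assumes each drafted token is accepted independently with the same probability, that drafting is free, and that the quoted ~43% is a per-token acceptance probability. The function name `expected_tokens_per_step` and the draft lengths tried are hypothetical, since the card does not say how many tokens the MTP head drafts.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_tokens: int) -> float:
    """Idealized number of tokens emitted per verification step.

    Assumption (not from the model card): each drafted token is accepted
    independently with probability `acceptance_rate`, and drafting costs nothing.
    """
    if not 0.0 <= acceptance_rate < 1.0:
        raise ValueError("acceptance_rate must be in [0, 1)")
    if draft_tokens < 0:
        raise ValueError("draft_tokens must be non-negative")
    # One token is always produced by the verifier; draft i survives with
    # probability acceptance_rate**i, giving the geometric series
    # 1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a).
    return (1.0 - acceptance_rate ** (draft_tokens + 1)) / (1.0 - acceptance_rate)


if __name__ == "__main__":
    # Hypothetical draft lengths; the card does not state the real value.
    for k in (1, 2, 3):
        print(k, round(expected_tokens_per_step(0.43, k), 2))
```

Under these assumptions, a 0.43 acceptance rate yields about 1.43 to 1.69 tokens per step for one to three drafted tokens, which is in the same range as the lower end of the quoted speedup. The idealized figure cannot exceed 1 / (1 - 0.43) ≈ 1.75, so the upper end of the quoted range suggests the acceptance rate is measured differently than this sketch assumes.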

For MTP-enabled GGUF inference, see the MTP GGUF repo below.

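import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in float32 and let device_map="auto" place layers on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "SC117/QwenPaw-Flash-9B-heretic",
    torch_dtype=torch.float32,
    device_map="auto",
)
# Load the matching tokenizer from the same repository.
tokenizer = AutoTokenizer.from_pretrained("SC117/QwenPaw-Flash-9B-heretic")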

</div>
</div>
<div style="border: 1px solid #cbd5e1; border-radius: 12px; overflow: hidden; background: #ffffff; box-shadow: 0 2px 4px rgba(0,0,0,0.02); margin-bottom: 20px;">
<div style="background: linear-gradient(135deg, #ff6b35 0%, #f7931e 100%); padding: 12px 16px; color: white; font-weight: 700; font-size: 14px; display: flex; align-items: center; gap: 8px;"><span>📄</span> License</div>
<div style="padding: 16px; font-size: 13px; color: #334155; line-height: 1.7;">
Same as base model (Qwen3.5-9B).
</div>
</div>

Model provider

SC117

Model tree

Base

Qwen/Qwen3.5-9B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container
