Dedicated Endpoints
Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Highlights
- AWQ Marlin kernel supported (auto-converted by vLLM at runtime)
- MTP speculative decoding supported out of the box
- 110+ tok/s on a single A800 80GB (vLLM 0.21.0, MTP enabled, fp8 KV cache)
vLLM
markdown
vllm serve shawnw3i/Qwen3.6-27B-AWQ-MTP \--max-model-len 65536 \--reasoning-parser qwen3 \--speculative-config '{"method":"mtp","num_speculative_tokens":3}'
Model provider
shawnw3i
Model tree
Base
Qwen/Qwen3.6-27B
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information