Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Highlights
- AWQ Marlin kernel supported (auto-converted by vLLM at runtime)
- MTP speculative decoding supported out of the box
- 110+ tok/s on a single A800 80GB (vLLM 0.21.0, MTP enabled, fp8 KV cache)
vLLM
markdown
vllm serve shawnw3i/Huihui-Qwen3.6-27B-abliterated-AWQ-MTP \--max-model-len 65536 \--reasoning-parser qwen3 \--speculative-config '{"method":"mtp","num_speculative_tokens":3}'
Usage Warnings
-
Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
-
Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
-
Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
-
Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
-
Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
-
No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
Model provider
shawnw3i
Model tree
Base
Qwen/Qwen3.6-27B
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information