README
Recipe
- AWQ activation smoothing + int4 weight quantization, group_size=32, symmetric, MSE observer
- Quantized: self_attn + mlp Linear layers
- Kept higher precision (ignored): linear_attn, vision tower, MTP head, embed_tokens, lm_head
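To make the recipe settings concrete, here is a minimal pure-Python sketch of symmetric, group-wise int4 weight quantization with group_size=32. It is an illustration only, not the tooling used to produce this model: the function names (`quantize_group`, `quantize_row`, `dequantize_row`) are hypothetical, the AWQ activation-smoothing step is omitted, and the "MSE observer" is approximated by a small grid search over clipping ratios, which is an assumption about how such an observer typically behaves.

```python
def quantize_group(values, shrink_steps=20):
    """Quantize one group of floats to symmetric int4 codes in [-8, 7].

    Returns (codes, scale). The scale is chosen by a small grid search over
    clipping ratios that minimises mean squared reconstruction error
    (a stand-in for an MSE observer; hypothetical, not the real tooling).
    """
    max_abs = max(abs(v) for v in values) or 1e-8  # avoid a zero scale
    best = None
    for step in range(shrink_steps):
        # Clipping ratio goes from 1.0 down towards 0.5.
        ratio = 1.0 - 0.5 * step / shrink_steps
        scale = ratio * max_abs / 7.0
        # Symmetric: zero-point is fixed at 0, only a scale is stored.
        codes = [max(-8, min(7, round(v / scale))) for v in values]
        err = sum((c * scale - v) ** 2 for c, v in zip(codes, values)) / len(values)
        if best is None or err < best[0]:
            best = (err, codes, scale)
    return best[1], best[2]


def quantize_row(row, group_size=32):
    """Quantize one weight row; each run of `group_size` values gets its own scale."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    codes, scales = [], []
    for start in range(0, len(row), group_size):
        group_codes, scale = quantize_group(row[start:start + group_size])
        codes.extend(group_codes)
        scales.append(scale)
    return codes, scales


def dequantize_row(codes, scales, group_size=32):
    """Reconstruct approximate float weights from int4 codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]
```

A smaller group size such as 32 stores more scales per row than a larger one, trading a little extra memory for lower quantization error within each group.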
Evaluation
| Benchmark | Score |
|---|---|
| HumanEval pass@1 | 95.1% |
| GSM8K | 86.0% |
| MMLU-Pro | 83.2% |
| Refusal rate | 8% |
MTP head
The Multi-Token Prediction (MTP) head is included for speculative decoding. Its residual-write matrices (self_attn.o_proj, mlp.down_proj) are abliterated with the same refusal direction as the main layers.
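As a rough illustration of what abliterating a residual-write matrix usually means, the sketch below removes the component along a given direction from a matrix's output, computing W' = W - r rᵀ W for a unit vector r. This assumes the common formulation of abliteration as orthogonalizing weights against a refusal direction; the function name `ablate_direction` and the row-major list layout are hypothetical, and the card does not state the exact procedure used.

```python
import math


def ablate_direction(weight, direction):
    """Return W - r r^T W, where r is `direction` normalised to unit length.

    `weight` is a list of rows with shape (d_model, d_in), so each column is a
    vector written into the residual stream. After the update, no column has
    any component along r. (Hypothetical helper for illustration.)
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    d_model, d_in = len(weight), len(weight[0])
    # proj[j] = r . W[:, j], the size of column j's component along r.
    proj = [sum(r[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    # Subtract that component from every column.
    return [[weight[i][j] - r[i] * proj[j] for j in range(d_in)]
            for i in range(d_model)]
```

Applying the same direction to the MTP head as to the main layers would keep draft tokens proposed during speculative decoding consistent with the main model's modified behavior.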
Model provider
Avesed
Model tree
Base
Avesed/Qwopus3.6-27B-v2-abliterated
Quantized
this model
Modalities
Input
Video, Text, Image
Output
Text
Pricing
Dedicated Endpoints
Supported Functionality
Model APIs
Dedicated Endpoints
Container