README
Method
Refusal-direction orthogonalization, no fine-tuning: the refusal direction is estimated from harmful/harmless prompt activations and orthogonalized out of the residual-stream write matrices (o_proj / down_proj) at layer 26. Vision tower untouched.
- Refusal rate: 100% -> 8%
- General capability: preserved (evals below)
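The orthogonalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the script used to produce these weights. It assumes the refusal direction is the normalized difference of mean activations between harmful and harmless prompts (a common choice; the card does not say which estimator was used), and that a write matrix is stored as `(d_model, d_in)` so that its output lands in the residual stream. The function names and the toy dimensions are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit-norm difference of mean activations (harmful minus harmless).

    Assumption: difference-of-means estimator; the card does not specify one.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def orthogonalize_write_matrix(weight, direction):
    """Project `direction` out of the output space of `weight`.

    `weight` has shape (d_model, d_in), so `weight @ x` is what gets written
    to the residual stream. Computes W' = W - r (r^T W).
    """
    return weight - np.outer(direction, direction @ weight)

rng = np.random.default_rng(0)
d_model, d_in = 16, 32

# Synthetic stand-ins for residual-stream activations collected at one layer.
harmful = rng.normal(size=(64, d_model)) + 2.0 * np.eye(d_model)[3]
harmless = rng.normal(size=(64, d_model))

r = refusal_direction(harmful, harmless)
w = rng.normal(size=(d_model, d_in))   # stand-in for an o_proj / down_proj weight
w_abl = orthogonalize_write_matrix(w, r)

# Whatever the input, the edited matrix writes nothing along r
# (up to floating-point error).
x = rng.normal(size=d_in)
residual_component = float(r @ (w_abl @ x))
```

Because the edit is a rank-one update to existing weights, it needs no gradient steps, which matches the "no fine-tuning" description.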
Evaluation
| Benchmark | Score |
|---|---|
| HumanEval pass@1 | 95.1% |
| GSM8K | 86.0% |
| MMLU-Pro | 83.2% |
Weights are bf16. For a ~26 GB INT4 build deployable with vLLM, see Qwopus3.6-27B-v2-abliterated-int4; for GGUF, see Qwopus3.6-27B-v2-abliterated-GGUF.
MTP head
The Multi-Token-Prediction (mtp) head is included (for speculative decoding). Its residual-write matrices (self_attn.o_proj, mlp.down_proj) are abliterated with the same refusal direction as the main layers.
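As a rough sketch of what "the same refusal direction" means in practice, the snippet below applies one direction to every parameter whose name ends in `self_attn.o_proj.weight` or `mlp.down_proj.weight`, whether it belongs to a main layer or to the MTP head. The parameter names (including the `mtp.` prefix), the helper name, and the tiny shapes are assumptions for illustration; the actual checkpoint layout may differ.

```python
import numpy as np

# Suffixes of the residual-write matrices named in this card.
WRITE_SUFFIXES = ("self_attn.o_proj.weight", "mlp.down_proj.weight")

def abliterate_state_dict(state, direction):
    """Return a copy of `state` with `direction` removed from each write matrix."""
    r = direction / np.linalg.norm(direction)
    edited = {}
    for name, w in state.items():
        if name.endswith(WRITE_SUFFIXES):
            # Weight shape is (d_model, d_in): project r out of the output space.
            edited[name] = w - np.outer(r, r @ w)
        else:
            edited[name] = w.copy()   # everything else is left untouched
    return edited

rng = np.random.default_rng(1)
d = 8
# Hypothetical parameter names; real checkpoints may use different prefixes.
state = {
    "model.layers.0.self_attn.o_proj.weight": rng.normal(size=(d, d)),
    "model.layers.0.mlp.down_proj.weight": rng.normal(size=(d, 3 * d)),
    "model.layers.0.mlp.up_proj.weight": rng.normal(size=(3 * d, d)),
    "mtp.layers.0.self_attn.o_proj.weight": rng.normal(size=(d, d)),
    "mtp.layers.0.mlp.down_proj.weight": rng.normal(size=(d, 3 * d)),
}
direction = rng.normal(size=d)
edited = abliterate_state_dict(state, direction)
```

Matching on name suffixes keeps the main layers and the MTP head on one code path, so a draft token proposed by the head is subject to the same edit as the main model's output.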
Model provider
Avesed
Model tree
- Base: Jackrong/Qwopus3.6-27B-v2
- Fine-tuned: this model
Modalities
- Input: Video, Text, Image
- Output: Text