Avesed

Qwopus3.6-27B-v2-abliterated-int4

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

Recipe

AWQ activation smoothing + int4 weight quantization, group_size=32, symmetric, MSE observer
Quantized: self_attn + mlp Linear layers
Kept higher precision (ignored): linear_attn, vision tower, MTP head, embed_tokens, lm_head

Evaluation

Benchmark	Score
HumanEval pass@1	95.1%
GSM8K	86.0%
MMLU-Pro	83.2%
Refusal rate	8%

MTP head

The Multi-Token-Prediction (mtp) head is included (for speculative decoding). Its residual-write matrices (self_attn.o_proj, mlp.down_proj) are abliterated with the same refusal direction as the main layers.

Model provider

Avesed

Model tree

Base

Avesed/Qwopus3.6-27B-v2-abliterated

Quantized

this model

Modalities

Input

Video, Text, Image

Output

Text