Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

Method

Refusal-direction orthogonalization, no fine-tuning: the refusal direction is estimated from harmful/harmless prompt activations and orthogonalized out of the residual-stream write matrices (o_proj / down_proj) at layer 26. Vision tower untouched.

  • Refusal rate: 100% -> 8%
  • General capability: preserved (evals below)

Evaluation

BenchmarkScore
HumanEval pass@195.1%
GSM8K86.0%
MMLU-Pro83.2%

bf16 weights. For a ~26 GB INT4 vLLM-deployable build see Qwopus3.6-27B-v2-abliterated-int4. And GGUF Qwopus3.6-27B-v2-abliterated-GGUF.

MTP head

The Multi-Token-Prediction (mtp) head is included (for speculative decoding). Its residual-write matrices (self_attn.o_proj, mlp.down_proj) are abliterated with the same refusal direction as the main layers.

Model provider

Avesed

Model tree

Base

Jackrong/Qwopus3.6-27B-v2

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today