Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

README

Recipe

  • AWQ activation smoothing + int4 weight quantization, group_size=32, symmetric, MSE observer
  • Quantized: self_attn + mlp Linear layers
  • Kept higher precision (ignored): linear_attn, vision tower, MTP head, embed_tokens, lm_head

Evaluation

BenchmarkScore
HumanEval pass@195.1%
GSM8K86.0%
MMLU-Pro83.2%
Refusal rate8%

MTP head

The Multi-Token-Prediction (mtp) head is included (for speculative decoding). Its residual-write matrices (self_attn.o_proj, mlp.down_proj) are abliterated with the same refusal direction as the main layers.

Model provider

Avesed

Model tree

Base

Avesed/Qwopus3.6-27B-v2-abliterated

Quantized

this model

Modalities

Input

Video, Text, Image

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Explore FriendliAI today