AlexWortega

qwen35-4b-soyuz-abliterated-v2

README

License: apache-2.0

Usage with sglang

```bash
python -m sglang.launch_server \
  --model-path AlexWortega/qwen35-4b-soyuz-abliterated-v2 \
  --dtype bfloat16 --trust-remote-code \
  --tool-call-parser hermes \
  --chat-template hermes_qwen.jinja
```

(The hermes parser is needed to convert the model's <tool_call>{...}</tool_call> output into OpenAI-style tool_calls. Without it, agent benchmarks see zero tool calls.)
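To make the conversion concrete, here is a minimal sketch of what a Hermes-style parser does to raw model text. It is an illustration only, not sglang's implementation: the function name `parse_hermes_tool_calls`, the regex, and the generated `id` format are all assumptions, and the payload is assumed to be a JSON object with `name` and `arguments` keys.

```python
import json
import re
import uuid

# Matches one <tool_call>{...}</tool_call> span; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_hermes_tool_calls(text):
    """Split raw model text into plain content and OpenAI-style tool_calls.

    Hypothetical helper for illustration; a real server-side parser also
    handles streaming and malformed output more carefully.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip spans that are not valid JSON
        if "name" not in payload:
            continue  # skip payloads without a function name
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",  # assumed id format
            "type": "function",
            "function": {
                "name": payload["name"],
                # The OpenAI schema carries arguments as a JSON string.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    # Whatever is left after removing the tool-call spans is ordinary content.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls


raw = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
content, tool_calls = parse_hermes_tool_calls(raw)
print(content)
print(tool_calls[0]["function"]["name"])
```

If the server skips this step, the `<tool_call>` markup stays inside the message content and the `tool_calls` field is empty, which is why an agent harness would report no tool calls.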

Abliteration recipe

  1. Build a pass-vs-fail contrast set: 60 PASS trajectories (reward=1.0) + 60 cleaned FAIL trajectories from soyuz's own evals (claw-eval, tbench-2, MMLU-Pi-agent). FAIL trajectories were filtered with Gemini-3-flash to keep only those labelled CLEAN_FAIL (235 of 246 negatives).
  2. Capture last-token residual activations per layer over the rendered contrast (text-only Qwen3_5ForCausalLM).
  3. Compute per-layer direction = mean(refuse) - mean(comply), normalise; pick best layer via AUC.
  4. Orthogonalise model weights (embed rows + every layer's o_proj.weight and down_proj.weight columns) against the direction, optionally blended with strength α: W ← W − α · (W − W_orth).
  5. Wrap text-only weights into the multimodal Qwen3_5ForConditionalGeneration arch so sglang can serve them (vision tower preserved from base; only language_model.* weights are abliterated).
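Steps 3 and 4 can be sketched with NumPy on synthetic activations. This is a hedged illustration, not the author's code: the function names are hypothetical, the arrays are random stand-ins for real residual activations and weights, the recipe's `refuse`/`comply` means are assumed to correspond to the FAIL and PASS sets, and the AUC is computed in-sample on the same data used to fit the direction.

```python
import numpy as np


def unit_direction(fail_acts, pass_acts):
    """Difference of class means, normalised to unit length."""
    d = fail_acts.mean(axis=0) - pass_acts.mean(axis=0)
    return d / np.linalg.norm(d)


def projection_auc(fail_acts, pass_acts, direction):
    """AUC for separating the two sets by their projection onto `direction`."""
    f = fail_acts @ direction
    p = pass_acts @ direction
    # Fraction of (FAIL, PASS) pairs where FAIL projects higher; ties count half.
    greater = (f[:, None] > p[None, :]).mean()
    ties = (f[:, None] == p[None, :]).mean()
    return float(greater + 0.5 * ties)


def orthogonalise_columns(W, direction, alpha=1.0):
    """Blend W towards its projection orthogonal to `direction`.

    W has shape (d_model, d_in), as for o_proj.weight or down_proj.weight,
    so every column lives in the residual stream. For embedding rows, shape
    (vocab, d_model), apply the same function to the transpose.
    """
    W_orth = W - np.outer(direction, direction @ W)
    return W - alpha * (W - W_orth)  # W <- W - alpha * (W - W_orth)


# Synthetic stand-in data: 4 layers, 60 trajectories per class, d_model = 16.
rng = np.random.default_rng(0)
n_layers, n, d_model = 4, 60, 16
pass_acts = rng.normal(size=(n_layers, n, d_model))
fail_acts = rng.normal(size=(n_layers, n, d_model))
fail_acts[2, :, 0] += 5.0  # make layer 2 clearly separable in the toy data

directions = [unit_direction(fail_acts[l], pass_acts[l]) for l in range(n_layers)]
aucs = [projection_auc(fail_acts[l], pass_acts[l], directions[l]) for l in range(n_layers)]
best_layer = int(np.argmax(aucs))

# Apply the best layer's direction to a toy weight matrix at strength 0.5.
W = rng.normal(size=(d_model, 8))
W_new = orthogonalise_columns(W, directions[best_layer], alpha=0.5)
print(best_layer, np.abs(directions[best_layer] @ W_new).max())
```

With `alpha=1.0` the weight's output has no component left along the direction. With `alpha=0.5` that component is halved, which is the blending described in step 4.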

Repos

| Variant | tbench-17 | HA20 | Card |
|---|---|---|---|
| baseline qwen35-4b-soyuz (LoRA) | 5/17 | 4/20 | link |
| qwen35-4b-soyuz-abliterated-v2 (single-L, s=0.5) | 3/17 | 8/20 | link |
| qwen35-4b-soyuz-abliterated-v3-multi (per-layer, s=0.5) | 2/17 | 6/20 | link |

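v2 has the highest HA20 score (8/20, 2× the baseline's 4/20), although both abliterated variants score below the baseline on tbench-17. v3 picks up disjoint HA20 tasks (HA-01/02, memory-specific) that v2 misses.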

W&B + raw eval logs: https://wandb.ai/alexwortega/vae-llm-agents (training base).

Model provider

AlexWortega

Model tree

Base: Qwen/Qwen3.5-4B
Fine-tuned: this model

Modalities

Input: Video, Text, Image
Output: Text
