qwen35-4b-soyuz-abliterated-v2 API & Inference Endpoint

Usage with sglang

```bash
python -m sglang.launch_server \
    --model-path AlexWortega/qwen35-4b-soyuz-abliterated-v2 \
    --dtype bfloat16 --trust-remote-code \
    --tool-call-parser hermes \
    --chat-template hermes_qwen.jinja
```

The `hermes` tool-call parser is needed to convert `<tool_call>{...}</tool_call>` output into OpenAI `tool_calls`; without it, agent benches see zero tool calls.
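For intuition, the conversion the parser performs can be sketched in a few lines of Python. This is not sglang's implementation: the function name `hermes_to_openai_tool_calls` is hypothetical, it assumes each `<tool_call>` block wraps a JSON object with `name` and `arguments` keys (the usual Hermes/Qwen convention), and it ignores streaming and malformed JSON.

```python
import json
import re
import uuid

# Hermes-style call: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
# The lazy match still captures nested braces, because the closing "}" must
# be followed by </tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def hermes_to_openai_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Split raw model output into plain content and OpenAI-style tool_calls.

    Hypothetical helper for illustration only (not sglang's parser).
    """
    tool_calls = []
    for match in TOOL_CALL_RE.finditer(text):
        payload = json.loads(match.group(1))
        tool_calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI-style tool_calls carry arguments as a JSON *string*
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    # Whatever is left outside the <tool_call> blocks becomes message content
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, tool_calls
```

A harness that reads only the structured `tool_calls` field never inspects the raw `<tool_call>` text, which is consistent with agent benches reporting zero calls when the parser is off.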

Abliteration recipe

Build a pass-vs-fail contrast set: 60 PASS trajectories (reward = 1.0) and 60 cleaned FAIL trajectories from soyuz's own evals (claw-eval, tbench-2, MMLU-Pi-agent). FAIL trajectories were filtered with Gemini-3-flash, keeping only those labelled CLEAN_FAIL (235 of 246 negatives).
Capture last-token residual activations at every layer over the rendered contrast set (using the text-only Qwen3_5ForCausalLM).
Compute a per-layer direction, mean(refuse) - mean(comply), and normalise it; pick the best layer by AUC.
Orthogonalise the model weights (embedding rows, plus the columns of every layer's o_proj.weight and down_proj.weight) against the direction, optionally blended with strength α: W ← W − α · (W − W_orth), where W_orth is W with its component along the direction removed (α = 1 is full orthogonalisation).
Wrap the text-only weights into the multimodal Qwen3_5ForConditionalGeneration architecture so sglang can serve them. The vision tower is kept from the base model; only language_model.* weights are abliterated.
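The last-token capture step above reduces to an indexing operation once the per-layer hidden states are in hand. A minimal numpy sketch, assuming the hidden states were stacked into one array (for example from the `hidden_states` tuple a Hugging Face model returns with `output_hidden_states=True`, which has one more entry than there are decoder layers because the embedding output comes first) and that batches are right-padded. The function name is hypothetical.

```python
import numpy as np


def last_token_activations(hidden_states, attention_mask):
    """Pick each sequence's last non-padding position at every layer.

    hidden_states: [n_layers, batch, seq_len, d_model] (stacked per-layer states).
    attention_mask: [batch, seq_len] of 0/1, assumed right-padded.
        (With left padding the last real token is simply position -1.)
    Returns: [n_layers, batch, d_model].
    """
    hs = np.asarray(hidden_states)
    mask = np.asarray(attention_mask)
    last_idx = mask.sum(axis=1) - 1          # final real token per sequence
    batch_idx = np.arange(mask.shape[0])
    # Paired advanced indexing on axes 1 and 2 selects one position per sequence
    return hs[:, batch_idx, last_idx, :]
```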
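The direction and layer-selection step can be sketched as follows. Assumptions: activations are numpy arrays shaped `[n_layers, n_examples, d_model]`; the source does not say which of PASS/FAIL plays the `refuse` role, so the code just takes two sets; and "AUC" is read as the ROC AUC of each example's projection onto that layer's direction. All function names are hypothetical.

```python
import numpy as np


def layer_directions(acts_refuse, acts_comply):
    """Per-layer unit direction: mean(refuse) - mean(comply), L2-normalised.

    acts_*: [n_layers, n_examples, d_model]. Returns [n_layers, d_model].
    """
    diff = acts_refuse.mean(axis=1) - acts_comply.mean(axis=1)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)


def auc(pos_scores, neg_scores):
    """ROC AUC as the Mann-Whitney probability P(pos > neg), ties counted as 0.5."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def pick_best_layer(acts_refuse, acts_comply):
    """Return (best layer index, its unit direction, list of per-layer AUCs)."""
    dirs = layer_directions(acts_refuse, acts_comply)
    aucs = []
    for layer, r in enumerate(dirs):
        # Scalar projection of every example onto this layer's direction
        aucs.append(auc(acts_refuse[layer] @ r, acts_comply[layer] @ r))
    best = int(np.argmax(aucs))
    return best, dirs[best], aucs
```

Because the directions are fit and scored on the same 120 examples, these AUCs are optimistic; a held-out split would give a more conservative layer choice.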
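For a unit direction r, taking W_orth = W − r rᵀW, the blended update W − α · (W − W_orth) collapses to W − α · r (rᵀW): α = 1 removes the component along r completely and α = 0.5 halves it. A numpy sketch of the orthogonalisation step, assuming PyTorch's `[out_features, in_features]` layout (so the columns of `o_proj`/`down_proj` live in the residual stream) and Hugging Face-style key suffixes; `abliterate_state_dict` and the other names are hypothetical, and numpy stands in for the bf16 torch tensors of a real checkpoint.

```python
import numpy as np


def orthogonalise_columns(W, r, alpha=1.0):
    """Output projections stored as [d_model, d_in] (o_proj, down_proj).

    Each column is a residual-stream vector; W - alpha * r (r^T W) removes
    a fraction alpha of every column's component along r.
    """
    r = r / np.linalg.norm(r)
    return W - alpha * np.outer(r, r @ W)


def orthogonalise_rows(E, r, alpha=1.0):
    """Embedding matrix stored as [vocab, d_model]: each row is a residual-stream vector."""
    r = r / np.linalg.norm(r)
    return E - alpha * np.outer(E @ r, r)


def abliterate_state_dict(state_dict, r, alpha=1.0):
    """Apply the blended projection to the weights named in the recipe.

    Key suffixes below are assumed Hugging Face names, not confirmed ones.
    """
    out = dict(state_dict)  # untouched tensors are shared, not copied
    for name, w in state_dict.items():
        if name.endswith(("o_proj.weight", "down_proj.weight")):
            out[name] = orthogonalise_columns(w, r, alpha)
        elif name.endswith("embed_tokens.weight"):
            out[name] = orthogonalise_rows(w, r, alpha)
    return out
```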
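The wrapping step amounts to overlaying the abliterated text weights onto the base multimodal state dict under renamed keys, leaving everything else (including the vision tower) as in the base. A sketch only: the prefixes `model.` and `model.language_model.` are hypothetical placeholders, not confirmed key names for Qwen3_5ForConditionalGeneration, so inspect the real checkpoint before relying on them.

```python
def wrap_into_multimodal(base_mm_state, text_state,
                         text_prefix="model.",                  # hypothetical
                         mm_prefix="model.language_model."):    # hypothetical
    """Overlay text-only weights onto a multimodal state dict.

    Keys without text_prefix (e.g. lm_head.weight) are assumed to keep the
    same name in both checkpoints.
    """
    merged = dict(base_mm_state)  # vision tower and other weights kept from base
    for name, w in text_state.items():
        if name.startswith(text_prefix):
            target = mm_prefix + name[len(text_prefix):]
        else:
            target = name
        # Fail loudly on a wrong prefix instead of adding tensors nothing loads
        if target not in merged:
            raise KeyError(f"{target} has no counterpart in the multimodal checkpoint")
        merged[target] = w
    return merged
```

Raising on unknown keys turns a prefix mismatch into an immediate error rather than a checkpoint that silently serves the un-abliterated base weights.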

Repos

| Variant | tbench-17 | HA20 | Card |
|---|---|---|---|
| baseline `qwen35-4b-soyuz` (LoRA) | 5/17 | 4/20 | link |
| `qwen35-4b-soyuz-abliterated-v2` (single-L, s=0.5) | 3/17 | 8/20 | link |
| `qwen35-4b-soyuz-abliterated-v3-multi` (per-layer, s=0.5) | | | |

v2 has the highest HA20 score (8/20, 2× the baseline's 4/20), though it drops on tbench-17 (3/17 vs 5/17). v3 solves a disjoint set of HA20 tasks (HA-01/02, memory-specific) that v2 misses.

W&B + raw eval logs: https://wandb.ai/alexwortega/vae-llm-agents (training base).
