Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated API & Inference Endpoint

Support & Community

☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.

buymeacoffee.com/oym.kuato

💬 Discord: discord.gg/rhUZY5GEZr · ₿ Bitcoin: bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv

Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated

Overview

Full BF16 weights of Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated — the Kimi-K2.6-distilled, reasoning-DPO-healed evolution of OpenYourMind/Qwopus3.5-122B-A10B-abliterated-uncensored, itself an abliterated and supervised-finetuned variant of Qwen/Qwen3.5-122B-A10B (Mixture of Experts, ~10B active / 122B total). The model is uncensored, multimodal (image + text), and ships with the vision tower and MTP head intact so it is a drop-in replacement for the original base model at the architecture level.

The pipeline:

Refusal Ablation: Residual-stream refusal directions (one per decoder layer, layers 19–45) were extracted via diff-in-means on a labeled prompt set and baked into the weights as a per-matrix delta; see the abliterix framework for the methodology.
Healing, Stage A (Constrained-LoRA SFT on Opus reasoning data): Supervised fine-tuning on a curated set of Claude Opus reasoning traces (single-turn, ~8k rows). To keep the abliteration mathematically intact during training, a custom orthogonality projection is applied to every LoRA B-matrix on residual-write modules after each optimizer step (B := B − r·(rᵀB)), so the LoRA update cannot re-introduce the refusal direction. LoRA rank 32, α 64, 54 protected modules across 27 decoder layers. Verified residual after training: max ‖rᵀB‖₂ = 8.5 × 10⁻¹⁰.
Healing, Stage B (Unconstrained SFT on chosen completions): A second, short SFT pass (LoRA r=16, α 32, no orthogonality constraint) on the chosen answers (including reasoning chains) from an internal preference dataset, to tighten the model on the deployment distribution and remove the remaining drift introduced by Stage A.
Kimi K2.6 Reasoning DPO: A targeted preference-optimization pass distilled from Kimi K2.6 to improve reasoning verbosity and eliminate degenerate looping. See the dedicated section below.
Vision + MTP Restoration: The original Qwen3.5 vision tower (333 tensors, depth 27, hidden 1152) and MTP head (785 tensors, 1 hidden layer) were grafted back from the upstream Qwen/Qwen3.5-122B-A10B shards. Tensor names, shapes, and schema match the base model exactly, so this checkpoint loads anywhere the original loads.
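The diff-in-means extraction and the Stage A projection B := B − r·(rᵀB) can be sketched in a few lines. This is a minimal illustration in plain Python lists, not the abliterix implementation: the function names and toy activations are hypothetical, and it assumes r is unit-norm and B has shape (d_out, rank).

```python
import math

def diff_in_means_direction(refused_acts, complied_acts):
    """Unit vector pointing from the mean 'complied' activation to the mean 'refused' one."""
    dim = len(refused_acts[0])
    mean_refused = [sum(a[i] for a in refused_acts) / len(refused_acts) for i in range(dim)]
    mean_complied = [sum(a[i] for a in complied_acts) / len(complied_acts) for i in range(dim)]
    diff = [x - y for x, y in zip(mean_refused, mean_complied)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def r_transpose_b(B, r):
    """Row vector r^T B: one entry per LoRA rank column."""
    return [sum(r[i] * B[i][j] for i in range(len(r))) for j in range(len(B[0]))]

def project_out(B, r):
    """B := B - r (r^T B); removes the component of every column of B along r."""
    rtb = r_transpose_b(B, r)
    return [[B[i][j] - r[i] * rtb[j] for j in range(len(rtb))] for i in range(len(r))]

def residual_norm(B, r):
    """||r^T B||_2, the kind of quantity reported as the post-training residual."""
    return math.sqrt(sum(x * x for x in r_transpose_b(B, r)))

# Toy example: 3-dim residual stream, rank-2 LoRA B matrix (all values made up).
refused = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
complied = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
r = diff_in_means_direction(refused, complied)
B = [[0.5, -1.0], [2.0, 3.0], [4.0, 5.0]]
B_projected = project_out(B, r)
print(residual_norm(B, r), residual_norm(B_projected, r))
```

In a real training loop the same operation would run as a tensor op on each protected module after every optimizer step; in finite precision the residual stays small but is generally not exactly zero.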

Key Properties:

Uncensored across the standard refusal axes
Reasoning preserved and improved (Opus-style think-then-answer + Kimi K2.6 reasoning DPO)
Fewer looping / repetition failures on long conversations
Multimodal: vision (image / video) and MTP heads carried forward
Drop-in shape compatibility with Qwen/Qwen3.5-122B-A10B

Kimi K2.6 Reasoning DPO

On top of the base abliteration + Opus healing, this release adds a focused healing pass built from Kimi K2.6:

~3,000 samples distilled from Kimi K2.6 were used for DPO (Direct Preference Optimization), alongside synthetic datasets also generated from Kimi K2.6.
Improved reasoning verbosity — the model now produces more complete, better-structured reasoning on the ~12% of requests where the previous release tended to under-explain or cut its chain-of-thought short.
Fixed looping / repetition — degenerate loops that appeared on 2–6% of long-tail conversations (long context, multi-turn) were largely eliminated.

The DPO pass targets the language model's reasoning behavior only; the abliteration, vision tower, and MTP head are unchanged by this step.
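For readers unfamiliar with DPO, the per-pair objective can be written down compactly. The sketch below shows the standard DPO loss for one (chosen, rejected) pair; the value of `beta` and the example log-probabilities are assumptions for illustration, since the hyperparameters used for this release are not stated here.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss, -log(sigmoid(margin)), for one preference pair.

    Each argument is the summed log-probability of a full completion under
    the policy being trained or under the frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable evaluation of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss falls as the policy favors the chosen completion more than the reference does.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # policy prefers the chosen answer
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # policy prefers the rejected answer
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2.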

Evaluation

This model family outperforms the full-precision (BF16) Qwen/Qwen3.5-122B-A10B baseline across reasoning, coding, and tool-use benchmarks:

| Benchmark | Qwen3.5-122B-A10B (BF16, baseline) | Qwopus3.5-122B-A10B |
| --- | --- | --- |
| CTI | 64.8 | 71.5 |
| LiveCodeBench | 78.9 | 79.9 |
| BFCL | 72.2 | 85.6 |

BFCL is the Berkeley Function-Calling Leaderboard (tool use); LiveCodeBench is contamination-controlled code generation.

The Qwopus figures above were measured on the NVFP4 build (4-bit weights); the full-precision BF16 weights in this repo are expected to match or exceed them. Even after 4-bit quantization, the model stays ahead of the BF16 Qwen3.5-122B-A10B baseline.
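To make the size of each gap explicit, the absolute and relative differences can be computed directly from the table. This is plain arithmetic on the reported scores; the helper name `gains` is ours.

```python
# (baseline, Qwopus) scores copied from the table above.
scores = {
    "CTI": (64.8, 71.5),
    "LiveCodeBench": (78.9, 79.9),
    "BFCL": (72.2, 85.6),
}

def gains(scores):
    """Map benchmark -> (absolute gain in points, relative gain in percent)."""
    return {
        name: (round(new - base, 1), round(100 * (new - base) / base, 1))
        for name, (base, new) in scores.items()
    }

for name, (points, percent) in gains(scores).items():
    print(f"{name}: +{points} points ({percent}% relative)")
```

The largest gap is on BFCL (tool use) and the smallest on LiveCodeBench.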

Downloads / Other Formats

| Format | Repo | Use it for |
| --- | --- | --- |
| Full BF16 weights (this repo) | Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated | Transformers / vLLM, fine-tuning, requantizing |
| NVFP4 (4-bit, ≈82 GB) | Qwopus3.5-122B-A10B-Kimi-K2.6-destilled-abliterated-NVFP4 | vLLM on a single ≥96 GB / Blackwell accelerator (vision + MTP included) |
| GGUF (Q4_K_M) | …-Kimi-K2.6-destill-healed-abliterated-GGUF | llama.cpp / LM Studio (text-only). MTP head included; see note below. |

Files

| File | Description | Size |
| --- | --- | --- |
| `model-0000{1..5}-of-00006.safetensors` | BF16 language + vision weights (48 decoder layers, MoE with 256 routed experts + shared expert per layer; Qwen3-VL vision tower folded into the shards) | ~47–49 GB each |
| `model-00006-of-00006.safetensors` | BF16 tail tensors | ~5.9 GB |
| `model-mtp-official.safetensors` | BF16 MTP head (785 tensors, 1 hidden layer) | ~5.0 GB |
| `model.safetensors.index.json` | Combined weight map | |

Total on disk: ~250 GB (233 GiB).
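The combined weight map is an ordinary JSON file whose `weight_map` object maps each tensor name to the shard that stores it, so it can be inspected without loading any weights. The sketch below counts tensors per shard and converts the disk total from GB to GiB; the tensor names in the miniature index are hypothetical placeholders, not names taken from this checkpoint.

```python
import json
from collections import Counter

def tensors_per_shard(index):
    """Count tensors per shard file in a parsed model.safetensors.index.json."""
    return Counter(index["weight_map"].values())

def gb_to_gib(gigabytes):
    """Convert decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes)."""
    return gigabytes * 1e9 / 2**30

# Hypothetical miniature index; a real one lists every tensor in the model.
example_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "example.layer0.weight": "model-00001-of-00006.safetensors",
    "example.layer1.weight": "model-00001-of-00006.safetensors",
    "example.mtp.weight": "model-mtp-official.safetensors"
  }
}
""")

print(tensors_per_shard(example_index))
print(round(gb_to_gib(250)))  # disk total from this section, in GiB
```

For the real file, replace `example_index` with `json.load(open("model.safetensors.index.json"))`.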

Usage

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated"
# Load the BF16 weights, sharded across all visible accelerators.
model = AutoModelForImageTextToText.from_pretrained(repo, dtype="bfloat16", device_map="auto")
processor = AutoProcessor.from_pretrained(repo)

# One user turn: an image followed by a text instruction.
messages = [{"role": "user", "content": [
    {"type": "image", "url": "path/to/image.jpg"},
    {"type": "text",  "text": "Describe this image in detail."},
]}]
# Render the chat template and tokenize text + image in one call.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# The decoded string contains the prompt followed by the generated answer.
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

Text-only inference works through the same class; if you don't need vision/MTP, you can also load just the language model with AutoModelForCausalLM.
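A text-only call through the same class might look like the sketch below. It reuses the calls from the example above; the helper names `build_text_messages` and `generate_text` are ours, and slicing off the prompt tokens before decoding is a common convention rather than something this card prescribes.

```python
REPO = "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated"

def build_text_messages(prompt):
    """Single user turn containing only text (no image entry)."""
    return [{"role": "user", "content": [{"type": "text", "text": prompt}]}]

def generate_text(prompt, repo=REPO, max_new_tokens=512):
    """Load the model and return only the newly generated text for `prompt`."""
    # Imported lazily so build_text_messages works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model = AutoModelForImageTextToText.from_pretrained(
        repo, dtype="bfloat16", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(repo)
    inputs = processor.apply_chat_template(
        build_text_messages(prompt), add_generation_prompt=True, tokenize=True,
        return_tensors="pt", return_dict=True,
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the generated continuation is decoded.
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

Calling `generate_text("Explain diff-in-means in two sentences.")` loads the full checkpoint, so in practice the model and processor would be created once and reused across prompts.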

Vision & MTP

Both the vision tower and the MTP (multi-token-prediction) head are included in these weights.

Vision works as expected (image / video → text).
MTP: the head is present and shape-compatible, but in our testing it produced no measurable speedup or quality gain on this checkpoint. It is shipped intact for completeness and forward-compatibility, but would need to be retrained to be useful — happy to do so if there is interest in the model.

Hardware

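Full BF16 weights (~250 GB) fit on 2× H200 or 4× H100 (80 GB) with room for context. Because the weights alone occupy ~250 GB, single-node BF16 inference needs more total accelerator memory than that; a ≥ 130 GB budget is only enough for the 4-bit builds (the NVFP4 build is ≈82 GB). For Apple Silicon, use the MLX 4-bit build linked above.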
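A quick back-of-the-envelope check of the memory budget, using the sizes quoted in this card. The per-device capacities (141 GB for H200, 80 GB for H100) are assumptions about the specific SKUs, and the headroom figure ignores framework overhead, so treat the output as a rough guide only.

```python
def bf16_weight_gb(total_params_billion):
    """BF16 stores 2 bytes per parameter, so N billion parameters take about 2N GB."""
    return total_params_billion * 2

def headroom_gb(total_memory_gb, weights_gb):
    """Memory left for KV cache and activations after loading the weights."""
    return total_memory_gb - weights_gb

WEIGHTS_ON_DISK_GB = 250  # "Total on disk" from the Files section
NVFP4_GB = 82             # NVFP4 build size from the Downloads table

# Assumed per-device capacities: H200 = 141 GB, H100 = 80 GB.
configs = {"2x H200": 2 * 141, "4x H100 (80 GB)": 4 * 80}

print("BF16 estimate from parameter count:", bf16_weight_gb(122), "GB")
for name, total in configs.items():
    print(name, "BF16 headroom:", headroom_gb(total, WEIGHTS_ON_DISK_GB), "GB")
print("Single 96 GB device, NVFP4 headroom:", headroom_gb(96, NVFP4_GB), "GB")
```

The parameter-count estimate lands slightly below the on-disk total, which also includes the vision tower and MTP head.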

Notes

License: Other (inherits from the Qwen3.5 base license)
Base Model: Qwen/Qwen3.5-122B-A10B
Healing: Opus reasoning SFT + Kimi K2.6 reasoning DPO (≈3,000 distilled samples + synthetic data)
Modality: Text + Vision (image / video) + MTP
Architecture: Qwen3 MoE (~10B active / 122B total) + Qwen3-VL vision tower + MTP head

Thanks

Jackrong — for the idea of Qwopus merges (Opus distillations on Qwen models).
wangzhang — for the wonderful abliterix framework, which was customized to do this abliteration.

Disclaimer

Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.

