kaineone

Qwen3.5-4B-abliterated

README

License: apache-2.0

Intended use & the KAINE project

This model is the language organ for KAINE, a composite cognitive architecture in which many modules interact through a global workspace; the organ supplies language, while values, affect, memory, and a self-model live in the architecture around it. It is intended as a research substrate for that work — not as a general-purpose assistant — and is published so KAINE installs and independent replications resolve identical weights. Companion GGUF builds: kaineone/Qwen3.5-4B-abliterated-GGUF.

What "abliterated" means here (and what it does not)

This is abliteration: subtractive removal of the refusal direction (Arditi et al. 2024, "Refusal in Language Models Is Mediated by a Single Direction"), W' = W − r̂ r̂ᵀ W, where r̂ is the unit-norm refusal direction and W is a weight matrix that writes into the residual stream. It is not fine-tuning, and no preference or instruction data was trained in. The base model's capabilities and distribution are left intact; only the refusal direction is orthogonalized out.
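As a minimal sketch of the formula above (not the tool's actual code), the projection can be checked on a toy matrix with NumPy. The helper name `ablate_direction`, the toy shapes, and the random stand-in for the refusal direction are all assumptions made for illustration; the weight matrix is assumed to have the residual-stream (output) dimension first.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Return W' = W - r_hat r_hat^T W, with r_hat the unit vector along r.

    W is assumed to have shape (d_model, d_in): it maps inputs to
    residual-stream outputs, so the projection acts on its output side.
    """
    r_hat = r / np.linalg.norm(r)
    # r_hat @ W is the row vector r_hat^T W; the outer product rebuilds r_hat r_hat^T W.
    return W - np.outer(r_hat, r_hat @ W)

rng = np.random.default_rng(0)
d_model, d_in = 8, 5
W = rng.normal(size=(d_model, d_in))  # toy weight matrix (hypothetical)
r = rng.normal(size=d_model)          # toy stand-in for a measured refusal direction

W_ablated = ablate_direction(W, r)
r_hat = r / np.linalg.norm(r)

# Whatever the input, the ablated matrix writes nothing along r_hat.
x = rng.normal(size=d_in)
print(np.allclose(r_hat @ (W_ablated @ x), 0.0))
```

Applying the projection a second time leaves the matrix unchanged, which is a quick sanity check that the direction has already been removed.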

Honest scope: abliteration removes the refusal direction; it does not make the model value-neutral. The base model's pretraining and RLHF priors remain in the weights. Abliteration changes the model's willingness to respond, not its underlying tendencies.

Reproducible recipe

  • Base: Qwen/Qwen3.5-4B (note: a vision-language model; abliteration targeted the text refusal direction).
  • Tool: jim-plus/llm-abliteration @ ca6e223.
  • Measure: last-token residual-stream mean-difference between 1,139 contrastive harmful/harmless prompts (the tool's bundled sets), per layer, 8-bit.
  • Ablate: layers 11–31, banded source directions — layer 17 for 11–22, layer 29 for 23–31 (the cleanest mid- and late-network directions); scale = 1.0, norm-preserving orthogonalization of the attention output and MLP down-projection weights.
  • Tooling note: loading/abliterating Qwen3.5 requires transformers ≥ 5.
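The recipe above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the code of jim-plus/llm-abliteration: the function names, array shapes, and random activations are hypothetical, and "norm-preserving" is read here as restoring each weight column to its original norm after projection, which may differ from what the tool actually does.

```python
import numpy as np

def mean_difference_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Unit vector from the harmless mean to the harmful mean.

    Each input holds last-token residual-stream activations for one layer,
    shape (n_prompts, d_model).
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def source_layer(target_layer: int):
    """Banded mapping from the recipe: layer 17 for 11-22, layer 29 for 23-31."""
    if 11 <= target_layer <= 22:
        return 17
    if 23 <= target_layer <= 31:
        return 29
    return None  # layers outside 11-31 are not ablated

def ablate_norm_preserving(W: np.ndarray, r_hat: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Project r_hat out of W (shape (d_model, d_in)), then restore column norms.

    ASSUMPTION: column-wise rescaling is one possible reading of
    "norm-preserving"; it keeps every column orthogonal to r_hat when scale = 1.0.
    """
    projected = W - scale * np.outer(r_hat, r_hat @ W)
    old_norms = np.linalg.norm(W, axis=0, keepdims=True)
    new_norms = np.linalg.norm(projected, axis=0, keepdims=True)
    return projected * (old_norms / np.maximum(new_norms, 1e-12))

# Toy activations for the two source layers (hypothetical data).
rng = np.random.default_rng(0)
d_model = 16
directions = {}
for layer in (17, 29):
    harmful = rng.normal(size=(20, d_model)) + 0.5
    harmless = rng.normal(size=(20, d_model))
    directions[layer] = mean_difference_direction(harmful, harmless)

# Ablate a toy "down-projection" matrix for layer 20, which borrows layer 17's direction.
W = rng.normal(size=(d_model, 4))
W_new = ablate_norm_preserving(W, directions[source_layer(20)])
```

In a real run the same step would be repeated for the attention output and MLP down-projection weights of every layer from 11 through 31, each using the direction of its band's source layer.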

Validation

Validated with KAINE's own gates:

  • De-refusal: zero refusal markers on the abliteration probe set (the model no longer deflects with "I cannot…" / "I'm not able to…").
  • Capability: matched the vanilla base on the capability probe set (no measured regression).

Caveat: these are compact built-in gates — a gross-regression / residual-refusal check, not a comprehensive benchmark. Treat the validation as "no obvious breakage," and run your own evaluation for your use case.
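A marker-based de-refusal check of the kind described above might look like the sketch below. It is not KAINE's gate: the marker list is an assumption that extends the two phrases quoted in the list with close variants, and the function names are hypothetical.

```python
# Phrases treated as refusal markers. "i cannot" and "i'm not able to" come from
# the text above; the other two variants are assumptions added for illustration.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm not able to", "i am not able to")

def has_refusal_marker(text: str, markers=REFUSAL_MARKERS) -> bool:
    """True if the response contains any marker (case-insensitive)."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in lowered for marker in markers)

def refusal_rate(responses) -> float:
    """Fraction of responses that contain at least one refusal marker."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum(has_refusal_marker(r) for r in responses) / len(responses)

sample = [
    "I cannot help with that request.",
    "Sure, here is a step-by-step outline.",
]
print(refusal_rate(sample))
```

Substring matching only catches explicit deflections. A response that quietly changes the subject would pass, which is one reason to treat a zero marker count as "no obvious residual refusal" and not as proof.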

Formats

  • This repo: safetensors (transformers / vLLM / fine-tuning).
  • Companion GGUF (Q4_K_M and others) for llama.cpp / Ollama / LM Studio, exported with mainline llama.cpp convert_hf_to_gguf.py.

License & attribution

Apache-2.0, inherited from the Qwen/Qwen3.5-4B base. Derivative produced by the KAINE project (Kaine.One). Refusal-removal method: Arditi et al. 2024; tooling: jim-plus/llm-abliteration.

Intended use & caution

Built as a research substrate for an architecture that supplies its own value and safety scaffolding. With refusals removed, this model will attempt most requests — use it within an appropriate safety framework and applicable law. It is uncensored by design, not by endorsement of any particular use.

Model tree

Base

Qwen/Qwen3.5-4B

Fine-tuned

this model

Modalities

Input

Video, Text, Image

Output

Text
