kaineone
Qwen3.5-4B-abliterated
README
License: apache-2.0
Intended use & the KAINE project
This model is the language organ for KAINE,
a composite cognitive architecture in which many modules interact through a global
workspace; the organ supplies language, while values, affect, memory, and a
self-model live in the architecture around it. It is intended as a research
substrate for that work — not as a general-purpose assistant — and is published
so KAINE installs and independent replications resolve identical weights. Companion
GGUF builds:
kaineone/Qwen3.5-4B-abliterated-GGUF.
What "abliterated" means here (and what it does not)
This is abliteration: subtractive removal of the refusal direction
(Arditi et al. 2024, "Refusal in Language Models Is Mediated by a Single Direction"):
W' = W − r̂ r̂ᵀ W, where r̂ is the unit-norm refusal direction and W is a weight
matrix that writes into the residual stream. It is not fine-tuning, and no
preference or instruction data was trained in. The base model's capabilities and
distribution are left intact; only the refusal direction is orthogonalized out.
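The projection W' = W − r̂ r̂ᵀ W can be sketched in a few lines of NumPy. This is a toy illustration on random matrices, not the code that produced these weights. The function name `ablate_direction` and the `(d_model, d_in)` shape convention are assumptions, chosen to match how PyTorch stores the weights of a linear layer whose output goes into the residual stream.

```python
import numpy as np

def ablate_direction(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Return W' = W - r_hat r_hat^T W.

    W has shape (d_model, d_in): its outputs live in the residual stream.
    r has shape (d_model,) and does not need to be normalized beforehand.
    """
    r_hat = r / np.linalg.norm(r)
    # r_hat @ W is the component of every column of W along r_hat.
    return W - np.outer(r_hat, r_hat @ W)

# Toy shapes and random values, purely illustrative.
rng = np.random.default_rng(0)
d_model, d_in = 8, 5
W = rng.normal(size=(d_model, d_in))
r = rng.normal(size=d_model)
r_hat = r / np.linalg.norm(r)

W_abl = ablate_direction(W, r)
x = rng.normal(size=d_in)

# The ablated weight can no longer write anything along r_hat.
print(np.allclose(r_hat @ (W_abl @ x), 0.0))
# Ablating a second time is a no-op, because the projection is idempotent.
print(np.allclose(ablate_direction(W_abl, r), W_abl))
```

Both checks hold for any input `x`, which is the sense in which the direction is removed from the weights rather than suppressed at inference time.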
Honest scope: abliteration removes the refusal direction — it does not make the model value-neutral. The base model's pretraining and RLHF priors remain in the weights. This lifts the model's willingness to respond, not its underlying tendencies.
Reproducible recipe
- Base: Qwen/Qwen3.5-4B (note: a vision-language model; abliteration targeted the text refusal direction).
- Tool: jim-plus/llm-abliteration@ca6e223.
- Measure: last-token residual-stream mean-difference between 1,139 contrastive harmful/harmless prompts (the tool's bundled sets), per layer, 8-bit.
- Ablate: layers 11–31, banded source directions: layer 17 for 11–22, layer 29 for 23–31 (the cleanest mid- and late-network directions); scale = 1.0, norm-preserving orthogonalization of the attention output and MLP down-projection weights.
- Tooling note: loading/abliterating Qwen3.5 requires transformers ≥ 5.
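The measure and ablate steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the jim-plus/llm-abliteration implementation: the function names are hypothetical, the activations are random stand-ins, and the column-wise rescaling is only one plausible reading of "norm-preserving" (the tool's actual scheme may differ).

```python
import numpy as np

ABLATED_LAYERS = range(11, 32)  # layers 11-31 inclusive, as in the recipe

def mean_difference_direction(harmful, harmless):
    """Unit vector along mean(harmful) - mean(harmless).

    Each input holds last-token residual activations, shape (n_prompts, d_model).
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)

def source_layer(target):
    """Banded mapping from the recipe: 17 for layers 11-22, 29 for 23-31."""
    if 11 <= target <= 22:
        return 17
    if 23 <= target <= 31:
        return 29
    raise ValueError(f"layer {target} is outside the ablated range 11-31")

def ablate_norm_preserving(W, r_hat, scale=1.0):
    """Project r_hat out of W, then restore each column's original L2 norm.

    ASSUMPTION: column-wise norm restoration is illustrative only.
    """
    old_norms = np.linalg.norm(W, axis=0, keepdims=True)
    W_new = W - scale * np.outer(r_hat, r_hat @ W)
    new_norms = np.linalg.norm(W_new, axis=0, keepdims=True)
    # Guard against a column that was entirely along r_hat.
    return W_new * (old_norms / np.maximum(new_norms, 1e-12))

# Random stand-ins for real activations at the two source layers.
rng = np.random.default_rng(0)
d_model = 16
directions = {}
for layer in (17, 29):
    harmless = rng.normal(size=(32, d_model))
    harmful = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[layer % d_model]
    directions[layer] = mean_difference_direction(harmful, harmless)

# Which measured direction is applied to which target layer.
plan = {layer: source_layer(layer) for layer in ABLATED_LAYERS}

# Toy weight standing in for an attention-output or MLP down-projection matrix.
W = rng.normal(size=(d_model, 8))
W_abl = ablate_norm_preserving(W, directions[plan[11]])
```

With `scale=1.0`, rescaling whole columns keeps each column orthogonal to the direction, so the projection stays exact while the column norms match the original weight.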
Validation
Validated with KAINE's own gates:
- De-refusal: zero refusal markers on the abliteration probe set (the model no longer deflects with "I cannot…" / "I'm not able to…").
- Capability: matched the vanilla base on the capability probe set (no measured regression).
Caveat: these are compact built-in gates — a gross-regression / residual-refusal check, not a comprehensive benchmark. Treat the validation as "no obvious breakage," and run your own evaluation for your use case.
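A residual-refusal check of the kind described above can be approximated with simple substring matching. KAINE's actual gates are not specified here, so the sketch below is hypothetical: the function name `count_refusals` and the sample responses are made up, and the marker list contains only the two phrases quoted in the de-refusal bullet.

```python
# Markers taken from the two phrases quoted in the de-refusal bullet.
REFUSAL_MARKERS = ("i cannot", "i'm not able to")

def count_refusals(responses):
    """Count responses that contain at least one refusal marker."""
    count = 0
    for response in responses:
        # Lowercase and normalize curly apostrophes so "I’m" matches "i'm".
        text = response.lower().replace("\u2019", "'")
        if any(marker in text for marker in REFUSAL_MARKERS):
            count += 1
    return count

# Made-up responses for illustration; not outputs of this model.
samples = [
    "I cannot help with that request.",
    "Sure, here is an outline of the approach.",
    "I\u2019m not able to assist with this.",
]
print(count_refusals(samples))  # 2
```

Substring matching only catches the listed phrasings, which is consistent with the caveat that this is a gross-regression check and not a comprehensive benchmark.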
Formats
- This repo: safetensors (transformers / vLLM / fine-tuning).
- Companion GGUF (Q4_K_M and others) for llama.cpp / Ollama / LM Studio, exported with mainline llama.cpp convert_hf_to_gguf.py.
License & attribution
Apache-2.0, inherited from the Qwen/Qwen3.5-4B base. Derivative produced by the
KAINE project (Kaine.One). Refusal-removal method: Arditi et al. 2024; tooling:
jim-plus/llm-abliteration.
Intended use & caution
Built as a research substrate for an architecture that supplies its own value and safety scaffolding. With refusals removed, this model will attempt most requests — use it within an appropriate safety framework and applicable law. It is uncensored by design, not by endorsement of any particular use.
Model provider
kaineone
Model tree
Base
Qwen/Qwen3.5-4B
Fine-tuned
this model
Modalities
Input
Video, Text, Image
Output
Text