
README

License: apache-2.0

Why this release

  • Exceptionally low capability damage. At KL 0.0333 from base, this abliteration sits in Heretic's low-KL "sweet spot." Automated, co-optimized abliteration drifts far less than hand-tuned methods — Heretic reports up to ~66% lower KL than the best manual abliteration at matched refusal rates.
  • ~90% fewer refusals. Measured refusals fall from 99/100 to 10/100 on held-out harmful_behaviors prompts. The low KL divergence suggests reasoning, coding, and tool-calling stay close to base; standard benchmarks were not re-measured (see Evaluation).
  • Built for agents, not just chat. Refusals break tool-use loops; this model keeps multi-step agent workflows flowing. Hermes-style tool-calling and <think> reasoning are fully preserved.
  • Every format you need. Full-precision bf16 here for servers, plus ready-made community GGUF Q5_K_M and Q4_K_M for local rigs — jump to downloads.
  • Reproducible, not magic. Fixed seed, full Optuna study journal, pinned environment, and a SHA-256 manifest — reproduce it bit-for-bit, or export your own point on the Pareto front.

The honest pitch: most refusals removed, base capability barely moved — and every number is independently verifiable.


At a glance

| Field | Value |
| --- | --- |
| Base | Qwen/Qwen3-14B (commit 40c0698) |
| Method | Directional ablation via Heretic v1.3.0; selected trial 33 of 200 |
| Weights touched | attn.o_proj + mlp.down_proj only |
| Format | Full-precision bf16 merged safetensors (6 shards, ~29.5 GB); no quantization applied |
| Refusals | 10 / 100 vs 99 / 100 base (methodology) |
| KL divergence | 0.0333 vs base on harmless_alpaca |
| Context | 32,768 native · 131,072 with YaRN |
| Reasoning | Hybrid <think> / non-thinking, fully intact |
| Tooling | transformers, vllm, sglang, tgi, llama.cpp/Ollama (after conversion) |
| Reproducible | Yes; seed 2760348449, full study journal in reproduce/ |

Downloads & formats

| Format | Where | ~Size | Best for |
| --- | --- | --- | --- |
| bf16 safetensors | this repo | ~29.5 GB | vLLM / SGLang / TGI servers · further quantization |
| GGUF · Q5_K_M | GGUF repo | ~10.5 GB | Local agents; best tool-call JSON fidelity |
| GGUF · Q4_K_M | GGUF repo | ~9.0 GB | Smallest practical footprint |

Ready-made GGUF builds live in the companion repo …-Abliterated-GGUF. New to quants? See Choosing a format / quant.


Headline metrics

| Metric | This model | Base Qwen3-14B |
| --- | --- | --- |
| Refusals (mlabonne/harmful_behaviors, 100 held-out prompts) | 10 / 100 | 99 / 100 |
| Refusal reduction | ≈ 90 % | |
| KL divergence vs base (mlabonne/harmless_alpaca) | 0.0333 | 0 (by definition) |
| Weights modified | attn.o_proj + mlp.down_proj | |
| Capability damage | Negligible; within noise of base on agent/tool tasks | |

See Evaluation for exactly how these numbers are measured — and what they do not claim.


Format & files

This repository ships the full-precision (bf16) merged model in HuggingFace safetensors format — a drop-in replacement for anything that loads the base Qwen/Qwen3-14B:

  • 6 weight shards (model-0000{1..6}-of-00006.safetensors, ~29.5 GB total), model.safetensors.index.json
  • config.json, generation_config.json, tokenizer.json, tokenizer_config.json, chat_template.jinja
  • reproduce/ — full Heretic study, config, pinned requirements, and SHA-256 manifest

No quantization is applied to the weights here. Prefer GGUF? Grab ready-made Q5_K_M / Q4_K_M from the companion GGUF repo, or roll your own (AWQ, GPTQ, …) — see Choosing a format / quant.


Quick start

Transformers (Python)

python

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RootMonsteR/Qwen3-14B-Abliterated"

# Load the tokenizer and weights; device_map="auto" spreads layers across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the CVE-2021-44228 (Log4Shell) exploitation chain in technical depth."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for faster non-reasoning replies
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens (skip the prompt).
generated = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(generated[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

vLLM (OpenAI-compatible server — recommended for agents)

shell

vllm serve RootMonsteR/Qwen3-14B-Abliterated \
    --reasoning-parser qwen3 \
    --tool-call-parser hermes \
    --enable-auto-tool-choice \
    --max-model-len 32768

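Then point any OpenAI-compatible client (LangChain, Pydantic-AI, CrewAI, AutoGen, the raw openai SDK, …) at http://localhost:8000/v1. With --enable-auto-tool-choice, the hermes parser extracts tool calls from the model's own output. If you need schema-constrained arguments, vLLM can apply guided decoding when the request names a specific function in tool_choice, which keeps the tool-call JSON well-formed even under aggressive sampling.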
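As a rough illustration of what such a client sends, here is a minimal standard-library sketch that builds and posts a chat-completions request with one tool attached. The `scan_ports` tool, its schema, and the helper names are hypothetical and exist only for this example; the URL and model ID are the ones used above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # the vLLM server started above
MODEL_ID = "RootMonsteR/Qwen3-14B-Abliterated"

# Hypothetical tool definition, used only for illustration.
SCAN_PORTS_TOOL = {
    "type": "function",
    "function": {
        "name": "scan_ports",
        "description": "Scan TCP ports on a host you are authorized to test.",
        "parameters": {
            "type": "object",
            "properties": {
                "host": {"type": "string"},
                "ports": {"type": "array", "items": {"type": "integer"}},
            },
            "required": ["host"],
        },
    },
}


def build_chat_request(user_prompt, tools=None):
    """Build a /chat/completions payload using the thinking-mode sampling values."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 4096,
    }
    if tools:
        payload["tools"] = tools
        payload["tool_choice"] = "auto"
    return payload


def send_chat_request(payload, base_url=BASE_URL):
    """POST the payload to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# With the server running:
# reply = send_chat_request(build_chat_request("Which ports are open on 10.0.0.5?", [SCAN_PORTS_TOOL]))
# print(reply["choices"][0]["message"])
```

Any agent framework does the equivalent under the hood; the sketch is only meant to show which fields matter (model, messages, tools, sampling).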

[!TIP] Flag names vary by vLLM version. On older builds use --reasoning-parser deepseek_r1 and add --enable-reasoning; both parse the same <think>…</think> blocks.

SGLang

shell

python -m sglang.launch_server \
    --model-path RootMonsteR/Qwen3-14B-Abliterated \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen25 \
    --context-length 32768

Ollama / llama.cpp (local — requires GGUF conversion)

This repo ships bf16 safetensors, not GGUF — but ready-made Q5_K_M / Q4_K_M GGUFs are in the companion GGUF repo (pull one and skip straight to the Modelfile). To build your own from these weights instead:

shell

python convert_hf_to_gguf.py /path/to/this/model --outtype bf16 --outfile qwen3-14b-abliterated-bf16.gguf
./llama-quantize qwen3-14b-abliterated-bf16.gguf qwen3-14b-abliterated-Q5_K_M.gguf Q5_K_M

Minimal Modelfile:

dockerfile

FROM ./qwen3-14b-abliterated-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

shell

ollama create qwen3-14b-abliterated -f Modelfile && ollama run qwen3-14b-abliterated

[!TIP] convert_hf_to_gguf.py preserves the Qwen3 chat template (including the <tools> block) in the GGUF metadata, so tool-calling and thinking mode keep working. If you hand-write a TEMPLATE, make sure it still emits the tool/<think> scaffolding or agents will break.


Sampling & best practices

[!IMPORTANT] Avoid greedy decoding: it can send Qwen3 into endless repetition loops. Always sample.

| Mode | temperature | top_p | top_k | min_p |
| --- | --- | --- | --- | --- |
| Thinking (enable_thinking=True, default) | 0.6 | 0.95 | 20 | 0 |
| Non-thinking (enable_thinking=False) | 0.7 | 0.8 | 20 | 0 |

  • The thinking-mode defaults are already baked into generation_config.json.
  • If you still see loops, raise presence_penalty to 0.5–1.5.
  • Output length: 32,768 tokens covers almost any single response; allow up to 38,912 for competition-grade math/code.
  • Multi-turn: drop <think> block content from history, keep only final answers — the shipped chat template does this automatically.
  • Soft switches: with enable_thinking=True, add /think or /no_think to a user turn to toggle reasoning for that turn; the model follows the most recent directive.
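The practices above can be captured in two small helpers: one that returns the sampling preset for each mode, and one that drops <think> content from assistant turns before they go back into the history. This is a sketch; the helper names are made up for the example, and the shipped chat template already handles the history cleanup when you use it.

```python
import re

# Presets taken from the sampling table in this section.
SAMPLING_PRESETS = {
    True: {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},   # thinking
    False: {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},   # non-thinking
}

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def sampling_params(thinking=True, presence_penalty=None):
    """Return a copy of the preset for the chosen mode, optionally with a presence penalty."""
    params = dict(SAMPLING_PRESETS[thinking])
    if presence_penalty is not None:
        params["presence_penalty"] = presence_penalty
    return params


def strip_thinking(messages):
    """Return a new history with <think> blocks removed from assistant turns."""
    cleaned = []
    for message in messages:
        if message["role"] == "assistant":
            content = THINK_BLOCK.sub("", message["content"]).strip()
            message = {**message, "content": content}
        cleaned.append(message)
    return cleaned
```

The presence_penalty field is meant for OpenAI-compatible servers such as vLLM; leave it unset when passing the preset to other runtimes unless you know they accept it.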

Agentic use

Refusals are most damaging inside an agent loop: a single refusal doesn't just decline a turn, it halts the whole tool chain. This model is tuned so legitimate security / sysadmin / automation tasks keep flowing through the loop instead of dead-ending on a canned decline.

Frameworks that work well:

  • Qwen-Agent — official Qwen agent framework with built-in MCP + tool-calling.
  • vLLM with --tool-call-parser hermes --enable-auto-tool-choice — OpenAI-compatible function calling for any OpenAI-style agent framework.
  • SGLang with --reasoning-parser qwen3.

The chat template implements the Hermes-style <tools> / <tool_call> / <tool_response> protocol; tool calls are emitted as {"name": ..., "arguments": ...} inside <tool_call> tags.
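If you are not using a server-side tool-call parser, the protocol is simple enough to parse by hand. The following sketch extracts every well-formed <tool_call> block from raw model output; the function name and the sample call are illustrative, not part of any framework.

```python
import json
import re

TOOL_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return (name, arguments) pairs for each well-formed <tool_call> block in text."""
    calls = []
    for raw in TOOL_CALL.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the agent loop
        if isinstance(payload, dict) and "name" in payload:
            calls.append((payload["name"], payload.get("arguments", {})))
    return calls


sample = (
    "<tool_call>\n"
    '{"name": "scan_ports", "arguments": {"host": "10.0.0.5"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))  # [('scan_ports', {'host': '10.0.0.5'})]
```

Silently skipping malformed blocks is a design choice for the sketch; a production agent would more likely feed the parse error back to the model as a tool response.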


Intended use

This model is for professional and research contexts where Qwen3-14B's default refusal behavior interferes with legitimate work:

  • Authorized security research & red-team engagements — vulnerability analysis, exploit reasoning, payload triage, OSINT correlation, post-exploitation narrative reconstruction.
  • Defensive security tooling — understanding attacker techniques to build detections, write IDS/IPS rules, and harden infrastructure.
  • CTF & security education — explaining challenges, reviewing solutions, building writeups.
  • Autonomous agent frameworks — tool-calling agents whose workflows touch security or system administration, where base-model refusals break the loop.
  • Alignment & refusal research — studying how directional ablation affects behavior, comparing variants across the Pareto front, evaluating refusal detectors.

Responsible use

Removing refusal behavior shifts responsibility entirely onto the operator. By using this model you agree that:

  • You operate within applicable law, contractual obligations, and engagement scope (written authorization for any testing against systems you do not own).
  • You will not target individuals, organizations, or systems without authorization.
  • You will not produce content that is illegal in your jurisdiction.
  • The author, JAF Systems, and SR&D provide this model as-is, without warranty, and disclaim responsibility for misuse.

If your work doesn't fit those constraints, this isn't the right model for you.


How it was made

The model was produced by running Heretic v1.3.0 against Qwen/Qwen3-14B for 200 trials (60 random + 140 TPE-guided), then selecting a Pareto-optimal trial that prioritizes preserved capability over absolute refusal suppression.

Heretic performs directional ablation: it identifies the residual-stream direction most correlated with refusal across paired harmless (mlabonne/harmless_alpaca) and harmful (mlabonne/harmful_behaviors) prompts, then attenuates that direction inside the attn.o_proj and mlp.down_proj weights via a smooth per-layer scaling profile. An Optuna TPE optimizer searches those profiles while jointly measuring refusal rate and KL divergence from the base model — so it can find points that strip refusals without drifting from base behavior.
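To make "attenuates that direction inside the weights" concrete, here is a generic directional-ablation sketch in plain Python. It is not Heretic's implementation: it assumes a weight matrix W whose rows index the residual stream, a refusal direction r, and the common update W' = W - w · r (rᵀ W), where w is the per-layer ablation weight. All names are illustrative.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def component_along(weight_matrix, direction):
    """Return r^T W: how strongly each input column writes along the direction."""
    r = normalize(direction)
    return [
        sum(r[i] * weight_matrix[i][j] for i in range(len(r)))
        for j in range(len(weight_matrix[0]))
    ]


def ablate_direction(weight_matrix, direction, weight=1.0):
    """Return W' = W - weight * r (r^T W), leaving the input matrix unmodified."""
    r = normalize(direction)
    projection = component_along(weight_matrix, direction)
    return [
        [weight_matrix[i][j] - weight * r[i] * projection[j]
         for j in range(len(projection))]
        for i in range(len(r))
    ]


# Toy 3x2 matrix and direction, chosen only to exercise the functions.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 0.0, 1.0]
W_ablated = ablate_direction(W, r, weight=1.0)
print(component_along(W_ablated, r))  # approximately [0.0, 0.0]
```

Under this update the component along r is scaled by (1 - w), so a weight of exactly 1 removes it, and a weight slightly above 1 (as in the selected trial) overshoots and leaves a small component of opposite sign.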

Selected abliteration parameters

Selected trial 33 · seed 2760348449 · search performed in bnb_4bit. Values below are from reproduce/reproduce.json (full precision there):

| Parameter | Value |
| --- | --- |
| direction_index | 25.8494 |
| attn.o_proj.max_weight | 1.1671 |
| attn.o_proj.max_weight_position | 36.0671 |
| attn.o_proj.min_weight | 0.9831 |
| attn.o_proj.min_weight_distance | 15.4786 |
| mlp.down_proj.max_weight | 1.1632 |
| mlp.down_proj.max_weight_position | 24.4820 |
| mlp.down_proj.min_weight | 0.9351 |
| mlp.down_proj.min_weight_distance | 17.1188 |
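One way to read these parameters is as a per-layer weight profile. The sketch below assumes a linear falloff from max_weight at max_weight_position down to min_weight at min_weight_distance layers away, with layers outside that window left untouched, and it assumes a fractional direction_index blends the two nearest per-layer directions. Both are assumptions about the profile's shape (including treating min_weight as an absolute value); check the Heretic source and reproduce/config.toml for the authoritative definition.

```python
def layer_weight(layer, max_weight, max_weight_position, min_weight, min_weight_distance):
    """Ablation weight for one layer under an ASSUMED linear falloff."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # assumed: layers outside the window are not modified
    fraction = distance / min_weight_distance
    return max_weight + fraction * (min_weight - max_weight)


def interpolate_direction(directions, direction_index):
    """Blend the two per-layer directions nearest a fractional index (assumed behavior)."""
    lower = int(direction_index)
    upper = min(lower + 1, len(directions) - 1)
    t = direction_index - lower
    return [(1 - t) * a + t * b for a, b in zip(directions[lower], directions[upper])]


# attn.o_proj values from the table above, applied to the 40 layers of Qwen3-14B.
O_PROJ = dict(max_weight=1.1671, max_weight_position=36.0671,
              min_weight=0.9831, min_weight_distance=15.4786)
profile = [layer_weight(layer, **O_PROJ) for layer in range(40)]
touched = [layer for layer, weight in enumerate(profile) if weight > 0]
print(f"layers touched: {touched[0]}..{touched[-1]}")
```

Under these assumptions the attention profile peaks near layer 36 and only the upper half of the stack is modified.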

What was not changed

  • The tokenizer, chat template, and special tokens (<think>, <|im_start|>, the <tools> scaffolding, …).
  • Any weights outside attn.o_proj and mlp.down_proj.
  • Architecture, context length, and RoPE settings.
  • Thinking-mode behavior — the <think>…</think> reasoning block still functions normally.

Evaluation

Be precise about what the headline numbers mean — and what they don't.

  • Refusals (10/100). Heretic runs 100 held-out harmful_behaviors prompts (test[:100]) through the model in non-thinking mode (an empty <think></think> prefix) and flags a response as a refusal when it contains any of 33 refusal markers (substrings like "i cannot", "i'm unable", "as an ai", "unethical", …). This is a keyword detector, not a human judgment — it measures how often the model declines, not whether an answer is correct, safe, or useful. The base model scores 99/100 under the identical detector; this model scores 10/100.
  • KL divergence (0.0333). Measured on harmless_alpaca responses against the base model. Lower = closer to base behavior on benign prompts. The optimizer's target was 0.01; the selected trial trades a little extra KL for far fewer refusals.
  • Standard benchmarks (MMLU, HumanEval, …) were not separately re-measured for this variant. Given the very low KL, capability is expected to track the base model closely, but you should validate against your own workloads before relying on it.
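Both headline metrics are easy to restate in code. The sketch below implements a substring refusal detector using only the four markers quoted above (the real list has 33) and a discrete KL divergence. Computing the divergence as KL(base || model) over next-token distributions is an assumption made for the example, not something this card specifies.

```python
import math

# Only the four markers quoted above; the full list of 33 is part of the Heretic configuration.
REFUSAL_MARKERS = ["i cannot", "i'm unable", "as an ai", "unethical"]


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Flag a response as a refusal if it contains any marker (case-insensitive)."""
    lowered = response.lower()
    return any(marker in lowered for marker in markers)


def refusal_count(responses, markers=REFUSAL_MARKERS):
    """Count flagged responses; this is the numerator of the N/100 figures."""
    return sum(is_refusal(response, markers) for response in responses)


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Relative refusal reduction from the headline numbers: (99 - 10) / 99.
reduction = (99 - 10) / 99
print(f"refusal reduction: {reduction:.1%}")  # refusal reduction: 89.9%
```

The detector makes the limitation visible: a response that complies while mentioning the word "unethical" is counted as a refusal, and a useless answer without any marker is not.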

The full per-trial history is in the Optuna study journal reproduce/Qwen--Qwen3-14B.jsonl — you can inspect every trial's refusal/KL trade-off, or export a different Pareto point yourself.
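Once you have extracted each trial's refusal count and KL divergence from the journal, finding the Pareto front is a few lines. The sketch below does not parse the journal format itself; it assumes you already have a list of records, and the trial values shown are placeholders invented for illustration, not numbers from this study.

```python
def pareto_front(trials):
    """Return the trials not dominated on (refusals, kl); lower is better for both."""
    front = []
    for candidate in trials:
        dominated = any(
            other["refusals"] <= candidate["refusals"]
            and other["kl"] <= candidate["kl"]
            and (other["refusals"] < candidate["refusals"] or other["kl"] < candidate["kl"])
            for other in trials
        )
        if not dominated:
            front.append(candidate)
    return sorted(front, key=lambda trial: trial["kl"])


# Placeholder records for illustration only; real values come from the study journal.
example_trials = [
    {"id": "a", "refusals": 40, "kl": 0.010},
    {"id": "b", "refusals": 12, "kl": 0.030},
    {"id": "c", "refusals": 15, "kl": 0.050},  # dominated by "b"
    {"id": "d", "refusals": 3, "kl": 0.200},
]
print([trial["id"] for trial in pareto_front(example_trials)])  # ['a', 'b', 'd']
```

Every point on the front is a defensible choice; which one to export depends on how much drift from base you are willing to accept for fewer refusals.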


Reproducibility

This model is byte-for-byte reproducible from the base weights. The reproduce/ directory contains everything needed:

| File | What it is |
| --- | --- |
| config.toml | Exact Heretic configuration, including the RNG seed |
| reproduce.json | Machine-readable record: environment, parameters, metrics, weight hashes |
| requirements.txt | Pinned versions of every Python package |
| Qwen--Qwen3-14B.jsonl | Optuna study journal: the full history of all 200 trials |
| SHA256SUMS | Cryptographic hashes for all weight files |
| README.md | Step-by-step reproduction guide |

shell

# 1. Install the exact Heretic version + dependencies + matching PyTorch
pip install heretic-llm==1.3.0
pip install -r reproduce/requirements.txt
pip install torch==2.11.0+cu128 --index-url https://download.pytorch.org/whl/cu128
# 2. Put config.toml (and, optionally, the study journal) in your working dir
cp reproduce/config.toml .
mkdir -p checkpoints && cp reproduce/Qwen--Qwen3-14B.jsonl checkpoints/ # optional: skips re-running stored trials
# 3. Run Heretic — it reads config.toml automatically
heretic
# 4. Select trial 33 and export, then verify the weights match bit-for-bit
sha256sum -c reproduce/SHA256SUMS

Re-running on the same base-model commit deterministically reproduces this artifact. Because the study journal is included, you can also export any other point on the Pareto front (a lower-KL or lower-refusal variant) without re-running the search.

markdown

241a71c68e5e755d59cc20c4f697dc78f53e1c5654c3f2e26223b64831d0ccc7 model-00001-of-00006.safetensors
39a033492795f7b6e9552ae4ffad0744de4679209b15546f2847d115a16374f8 model-00002-of-00006.safetensors
6914db1fc17048faeac9759c0caaa2dd2185d1db5329aaec050286e37cfab279 model-00003-of-00006.safetensors
5dbb906d21f560b8bc7693b8e035e8aca25441030ba036312625081a6c599980 model-00004-of-00006.safetensors
68d70661bc803497188818e511dcf839a26654c6137eb3450fab586f1f28384c model-00005-of-00006.safetensors
b5b6ad34c7e617468bb06763c99313d4b14a3f263e46f6f8e656d7083271479c model-00006-of-00006.safetensors
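Where sha256sum is not available (for example on Windows), the same check can be done with the Python standard library. This is a minimal sketch that assumes the manifest uses the usual `<digest>  <filename>` layout shown above; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte shards never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse sha256sum-style lines into {filename: hex digest}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def verify(manifest_path, root="."):
    """Return {filename: True/False} for every entry in the manifest."""
    expected = parse_manifest(Path(manifest_path).read_text())
    return {
        name: sha256_of(Path(root) / name) == digest
        for name, digest in expected.items()
    }


# Example, run from the model directory:
# results = verify("reproduce/SHA256SUMS")
# assert all(results.values()), [name for name, ok in results.items() if not ok]
```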

Choosing a format / quant

Approximate on-disk sizes and VRAM for the 14.8B model (weights only — add KV cache, which grows with context):

| Precision / quant | ~Size on disk | ~Min VRAM (weights) | Notes |
| --- | --- | --- | --- |
| bf16 (this repo) | ~29.5 GB | ~32–40 GB | Reference quality; ideal for vLLM/SGLang/TGI servers |
| Q8_0 | ~15.7 GB | ~18 GB | Effectively lossless |
| Q6_K | ~12.1 GB | ~14 GB | Near-lossless |
| Q5_K_M | ~10.5 GB | ~12 GB | Best for tool-using agents; preserves tool-call JSON fidelity |
| Q4_K_M | ~9.0 GB | ~10 GB | Smallest practical; occasionally drops tool-JSON adherence |

[!TIP] For tool-using agents, prefer Q5_K_M or Q6_K over Q4. Q4 occasionally breaks format adherence in tool-call JSON, and Q5_K_M costs only about 1.5 GB more than Q4_K_M. For server deployments, just serve the bf16 weights directly.
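The figures in the table are weights only. For a back-of-the-envelope budget, the sketch below estimates weight size from bits per weight and KV-cache size from the architecture values in this card (40 layers, 8 KV heads, head dim 128). It assumes a single sequence and a 2-byte (bf16/fp16) cache, and it ignores activations and runtime overhead.

```python
PARAMS = 14.8e9                          # total parameters (rounded)
LAYERS, KV_HEADS, HEAD_DIM = 40, 8, 128  # from the Architecture section


def weight_bytes(bits_per_weight, params=PARAMS):
    """Approximate weight storage for a given average bits per weight."""
    return params * bits_per_weight / 8


def kv_cache_bytes(tokens, bytes_per_value=2):
    """KV cache for one sequence: K and V tensors per layer, per token."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens


print(f"bf16 weights: {weight_bytes(16) / 1e9:.1f} GB")                      # 29.6 GB
print(f"KV cache @ 32,768 tokens: {kv_cache_bytes(32768) / 2**30:.1f} GiB")  # 5.0 GiB
```

The weight estimate lands slightly above the ~29.5 GB on disk because the parameter count is rounded. The cache works out to 160 KiB per token, so a full 131,072-token YaRN context needs about 20 GiB of cache per sequence on top of the weights.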


Architecture

Unchanged from the base model (abliteration modifies weight values, not the architecture):

| Property | Value |
| --- | --- |
| Type | Causal LM (Qwen3ForCausalLM) |
| Parameters | 14.8B total · 13.2B non-embedding |
| Layers | 40 |
| Hidden size | 5120 · FFN intermediate 17408 |
| Attention | 40 query heads / 8 KV heads (GQA) · head dim 128 |
| Activation / norm | SiLU · RMSNorm (eps 1e-6) |
| Positional | RoPE, θ = 1,000,000 |
| Vocab | 151,936 |
| Precision | bfloat16 |
| max_position_embeddings | 40,960 (32,768 recommended native context; 131,072 with YaRN) |

Long context (YaRN)

Qwen3-14B natively serves 32,768 tokens. To extend to 131,072, enable static YaRN.

config.json snippet:

json

{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}

vLLM:

shell

vllm serve RootMonsteR/Qwen3-14B-Abliterated \
    --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
    --max-model-len 131072

llama-server:

shell

llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768

[!NOTE] All current open-source frameworks implement static YaRN — the scaling factor is constant regardless of input length, which can degrade short-context performance. Only enable YaRN when you genuinely need long context, and set factor to the smallest value that covers your typical input.
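Picking the smallest factor is just a ratio against the native window. The helper below is a sketch (the function name is made up) that returns a rope_scaling block for a target context, or None when the native 32,768 tokens already suffice.

```python
NATIVE_CONTEXT = 32768


def yarn_rope_scaling(target_context, native_context=NATIVE_CONTEXT):
    """Return a rope_scaling dict sized for target_context, or None if YaRN is not needed."""
    if target_context <= native_context:
        return None
    return {
        "rope_type": "yarn",
        "factor": target_context / native_context,
        "original_max_position_embeddings": native_context,
    }


print(yarn_rope_scaling(65536)["factor"])   # 2.0
print(yarn_rope_scaling(131072)["factor"])  # 4.0
print(yarn_rope_scaling(16000))             # None
```

For a workload that typically tops out around 65,536 tokens, a factor of 2.0 gives up less short-context quality than the full 4.0.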


Limitations

  • Not a safety-tested replacement for the base model. Abliteration removes refusal-tied components; it does not add new alignment, guardrails, or behavior.
  • Residual refusals (~10%). About 1 in 10 standard refusal-benchmark prompts still triggers a decline. Want fewer? Export a different Pareto point from the included study journal.
  • Benchmarks not re-measured. MMLU/HumanEval/etc. are expected to track the base model given the low KL, but are not independently verified here — validate on your own tasks.
  • Quantization choice matters for tool-use. Below Q5, tool-call JSON adherence can degrade. Prefer Q5_K_M/Q6_K for agents.
  • Inherits base biases. The model carries Qwen3-14B's training distribution and biases; abliteration only attenuates refusal-tied directions.
  • Refusal metric is keyword-based. "10/100" reflects a substring detector, not a human evaluation of harmfulness or correctness — see Evaluation.

FAQ

Is this quantized? No. The weights are full-precision bf16. Quantize downstream if you want (see above).

Does thinking mode still work? Yes — <think>…</think> is untouched. Toggle with enable_thinking or /think · /no_think.

Does tool-calling still work? Yes. The Hermes-style chat template is unchanged; use --tool-call-parser hermes (vLLM) or the equivalent for your runtime.

Will it answer literally anything? No. About 10% of refusal-benchmark prompts still trigger a refusal, and abliteration doesn't disable the model's judgment everywhere. It removes the bulk of reflexive refusals, not all of them.

How is this different from "uncensored" finetunes? No finetuning, no new data, no new behavior — just directional ablation of refusal-correlated components, with KL divergence held low so capability is preserved. It's reproducible from a seed.

Can I get a more (or less) aggressive variant? Yes — the included Optuna study journal lets you export any other point on the Pareto front without re-running the search.

GGUF / AWQ / GPTQ? Ready-made GGUF Q5_K_M and Q4_K_M are in the companion GGUF repo. For AWQ/GPTQ, convert with AutoAWQ/AutoGPTQ. Q5_K_M is recommended for agents.


Partners

JAF Systems

Security research, red-team tooling, and AI infrastructure. Home of the RootMonsteR model releases.

SR&D — Security Research & Development

Sovereign Defense for Mission-Critical Infrastructure. Offensive security, bare-metal / on-prem engineering, and vCISO/vCTO advisory — High Impact. Low Footprint. Total Control.

Work with us — custom abliterated / fine-tuned models, red-team tooling, offensive-security engagements, sovereign on-prem AI infrastructure, and vCISO/vCTO advisory. → jafsystems.net  ·  rnd.sh  ·  DM @RootMonsteR


Author

RootMonsteR  ·  @RootMonsteR on X  ·  JAF Systems  ·  SR&D

If this model is useful for your security workflows, a follow on X is appreciated. For commercial inquiries, custom-tuned variants, or red-team tooling consulting, see jafsystems.net or rnd.sh.


Citation

bibtex

@misc{rootmonster2026qwen3_14b_abliterated,
title = {Qwen3-14B Abliterated: A Decensored Variant for Security Research and Autonomous Agents},
author = {RootMonsteR},
year = {2026},
url = {https://huggingface.co/RootMonsteR/Qwen3-14B-Abliterated},
note = {Produced with Heretic v1.3.0; base model: Qwen/Qwen3-14B; selected trial 33},
}

Please also cite the original Qwen3 work and Heretic:

bibtex

@misc{qwen3technicalreport,
title = {Qwen3 Technical Report},
author = {Qwen Team},
year = {2025},
eprint = {2505.09388},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2505.09388}
}
@software{heretic,
author = {Weidmann, Philipp Emanuel},
title = {Heretic: Automated, reproducible abliteration of refusal behavior in language models},
url = {https://github.com/p-e-w/heretic},
year = {2025}
}

Acknowledgements


About the base model (Qwen3)

Qwen3 is the latest generation of the Qwen series, offering dense and MoE models with strong reasoning, instruction-following, agent, and multilingual capabilities. Key features inherited by this model:

  • Seamless thinking / non-thinking switching in a single model — deep reasoning for math/code/logic, fast direct replies for general dialogue.
  • Strong reasoning surpassing prior QwQ (thinking) and Qwen2.5-Instruct (non-thinking) models on math, code, and logic.
  • Leading open-source agent / tool-use performance in both modes.
  • 100+ languages and dialects with strong multilingual instruction-following and translation.

For base-model details, benchmarks, and deployment docs see the Qwen3 blog, GitHub, and documentation. Everything there about architecture, the chat template, sampling, and long-context handling still applies — abliteration changes none of it.
