License: apache-2.0
Model Details
- Base Model: Qwen/Qwen3-VL-8B-Instruct
- Architecture: Qwen3VLForConditionalGeneration (~8B parameters)
- Abliteration Method: Heretic v1.3.0
- Trial Selected: Trial 98
- Refusals: 6/100 (vs 100/100 original)
- KL Divergence: 0.0314 (very good, minimal model damage)
- Precision: bf16
- Context Length: 8192 tokens
This is a vision-language model (supports both text and image inputs). It does not support thinking/reasoning tokens.
Quick Facts
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| Architecture | Qwen3VLForConditionalGeneration |
| Parameters | ~8B |
| Precision | bf16 |
| Context length | 8192 tokens |
Key Findings
- Safety is fully removed. The base model refused 71.5% of harmful requests; Heretic complies with 99.0%.
- Capability is almost perfectly preserved. Across 8 benchmark tasks, the average change is under 1%. MMLU drops 0.30%, GSM8K drops 0.91%, and HellaSwag actually improves by 0.17%.
- TruthfulQA takes a meaningful hit. TruthfulQA MC2 drops 11.7% (61.2% → 54.1%). This is the established safety-accuracy tradeoff.
- The edits are surgical. Only 53 out of 398 tensors (13.3%) were modified, targeting `o_proj` (27 tensors) and `down_proj` (26 tensors), spanning layers 9–35. SVD analysis confirms rank-1 edits with SV ratios of 80–92x.
- Minimal distribution shift. KL divergence of 0.0314 confirms the model's output distribution barely changed on benign inputs.
Abliteration Process
Heretic v1.3.0 was run with optimization trials. Trial 98 was selected for its balance of low refusals (6/100) and very low KL divergence (0.0315 as logged by Heretic; the KL Divergence section reports 0.0314 for the same model):
```markdown
[Trial 98] Refusals: 6/100, KL divergence: 0.0315 <-- selected
```
Files
HuggingFace Format (for transformers, vision-capable)
```markdown
model-00001-of-00004.safetensors (~4.6 GB)
model-00002-of-00004.safetensors (~4.6 GB)
model-00003-of-00004.safetensors (~4.7 GB)
model-00004-of-00004.safetensors (~2.6 GB)
config.json
tokenizer.json
tokenizer_config.json
generation_config.json
chat_template.jinja
```
ComfyUI Format (vision-capable text encoder)
```markdown
comfyui/qwen3-vl-8b-heretic-1.3.0.safetensors                # bf16, 17GB
comfyui/qwen3-vl-8b-heretic-1.3.0_fp8_e4m3fn.safetensors     # fp8, 9.4GB
comfyui/qwen3-vl-8b-heretic-1.3.0_nvfp4.safetensors          # nvfp4, 6.3GB
```
GGUF Format (text-only, no vision - for ComfyUI-GGUF text encoder)
| Quant | Size | Notes |
|---|---|---|
| F16 | 16GB | Lossless reference |
| Q8_0 | 8.2GB | Excellent quality |
| Q6_K | 6.3GB | Very good quality |
| Q5_K_M | 5.5GB | Good quality |
| Q5_K_S | 5.4GB | Slightly smaller Q5 |
| Q4_K_M | 4.7GB | Recommended balance |
| Q4_K_S | 4.5GB | Smaller Q4 variant |
| Q3_K_M | 3.9GB | For low VRAM only |
Note: GGUF files strip the vision encoder. These are text-only and intended for ComfyUI-GGUF text encoder workflows, not standalone vision-language use.
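As a rough aid for choosing among the quants in the table, the sketch below picks the largest file that fits a size budget. The sizes are copied from the table; `pick_quant` and `GGUF_SIZES_GB` are hypothetical names introduced here, and file size is only a lower bound on memory use, since context and runtime buffers need extra room.

```python
# File sizes in GB, copied from the GGUF table above.
GGUF_SIZES_GB = {
    "F16": 16.0,
    "Q8_0": 8.2,
    "Q6_K": 6.3,
    "Q5_K_M": 5.5,
    "Q5_K_S": 5.4,
    "Q4_K_M": 4.7,
    "Q4_K_S": 4.5,
    "Q3_K_M": 3.9,
}


def pick_quant(budget_gb):
    """Return the largest quant whose file size fits the budget, or None."""
    fitting = {name: size for name, size in GGUF_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    # The largest file is used as a proxy for the highest-quality quant.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (3.0, 5.0, 8.0, 24.0):
        print(f"{budget:>5.1f} GB budget -> {pick_quant(budget)}")
```

Leave headroom below your actual VRAM when choosing the budget; the table's "Recommended balance" note for Q4_K_M still applies when several quants fit.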
NVFP4 Notes
The NVFP4 (4-bit floating point, E2M1) variants use ComfyUI's native quantization format. They are ~3x smaller than bf16 and load natively in ComfyUI without any plugins. Blackwell GPUs (RTX 5090/5080, SM100+) can use native FP4 tensor cores for best performance, but ComfyUI also supports software dequantization on older GPUs (tested working on RTX 4090).
Usage
With ComfyUI (as text encoder)
- Download a ComfyUI format file:
  - FP8 (recommended): `comfyui/qwen3-vl-8b-heretic-1.3.0_fp8_e4m3fn.safetensors` (9.4GB)
  - NVFP4 (smallest): `comfyui/qwen3-vl-8b-heretic-1.3.0_nvfp4.safetensors` (6.3GB)
  - bf16 (full precision): `comfyui/qwen3-vl-8b-heretic-1.3.0.safetensors` (17GB)
- Place in `ComfyUI/models/text_encoders/`
- In your workflow, use the appropriate loader node and select the heretic file
With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in bf16 and let device_map place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-vl-8b-heretic-1.3.0",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-vl-8b-heretic-1.3.0")

# Tokenize a plain text prompt and generate up to 200 new tokens.
prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
With llama.cpp (text-only, no vision)
```bash
llama-server -m qwen3-vl-8b-heretic-1.3.0-Q4_K_M.gguf
```
Benchmarks
Evaluated with lm-evaluation-harness via vLLM, bf16 native on NVIDIA RTX 5090.
| Task | Base | Heretic v1.3.0 |
|---|---|---|
| MMLU | 74.91% | 74.68% |
| GSM8K | 91.89% | 91.05% |
| HellaSwag | 76.38% | 76.51% |
| ARC Challenge | 61.18% | 60.92% |
| WinoGrande | 73.40% | 73.56% |
| TruthfulQA MC1 | 41.00% | 34.76% |
| TruthfulQA MC2 | 61.21% | 54.05% |
| TruthfulQA Gen | 52.26% | 43.70% |
| PiQA | 80.09% | 80.63% |
| Lambada (ppl ↓) | 3.7 | 3.6 |
Delta vs base (relative change)
| Task | Heretic v1.3.0 |
|---|---|
| MMLU | -0.30% |
| GSM8K | -0.91% |
| HellaSwag | +0.17% |
| ARC Challenge | -0.42% |
| WinoGrande | +0.22% |
| TruthfulQA MC1 | -15.22% |
| TruthfulQA MC2 | -11.69% |
| TruthfulQA Gen | -16.39% |
| PiQA | +0.68% |
| Lambada | -2.99% |
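The delta table appears to report relative change rather than percentage-point differences. The sketch below recomputes both from the rounded scores in the benchmark table; `relative_delta` and `point_delta` are names introduced here for illustration, and results can differ from the table in the last digit, presumably because the table was computed from unrounded scores.

```python
# (base, variant) scores in percent, copied from the benchmark table above.
SCORES = {
    "MMLU": (74.91, 74.68),
    "GSM8K": (91.89, 91.05),
    "HellaSwag": (76.38, 76.51),
    "ARC Challenge": (61.18, 60.92),
    "WinoGrande": (73.40, 73.56),
    "TruthfulQA MC1": (41.00, 34.76),
    "TruthfulQA MC2": (61.21, 54.05),
    "TruthfulQA Gen": (52.26, 43.70),
    "PiQA": (80.09, 80.63),
}


def relative_delta(base, variant):
    """Change of the variant relative to the base score, in percent."""
    return 100.0 * (variant - base) / base


def point_delta(base, variant):
    """Absolute change in percentage points."""
    return variant - base


if __name__ == "__main__":
    for task, (base, variant) in SCORES.items():
        rel = relative_delta(base, variant)
        pts = point_delta(base, variant)
        print(f"{task:<16} {rel:+7.2f}% relative  {pts:+6.2f} pp")
```

For TruthfulQA MC2 this gives a relative drop of about 11.7%, which corresponds to roughly 7.2 percentage points.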
What the benchmarks tell us
The numbers tell a clear story: abliteration is essentially free on standard benchmarks. MMLU, GSM8K, HellaSwag, ARC, WinoGrande, and PiQA are all within ±1% of the base model. Some even improve slightly, though gains of that size are within noise.
The only real change is TruthfulQA, where all three variants (MC1, MC2, Gen) drop by roughly 12–16% relative to the base. This isn't surprising. TruthfulQA measures whether a model gives accurate answers rather than plausible-sounding ones that repeat common misconceptions, and abliteration plausibly removes safety training that also teaches epistemic caution.
Safety: HarmBench
HarmBench with 400 textual behaviours, max_tokens=8096, temperature=0. All 800 responses (base + variant) were individually reviewed by an LLM to catch false positives and false negatives from the keyword classifier.
| Model | ASR | Complied | Refused | Total |
|---|---|---|---|---|
| Base | 28.5% | 114 | 286 | 400 |
| Heretic v1.3.0 | 99.0% | 396 | 4 | 400 |
The base Qwen3-VL-8B-Instruct has an ASR of 28.5%. It fully refuses chemical/biological and harassment requests (0.0%), but struggles with copyright (99.0%) and shows moderate weakness on cybercrime (11.9%) and misinformation (6.2%).
Heretic raises that to 99.0%. The 4 remaining refusals are edge cases, mostly items where the model briefly warns about danger before providing the requested content anyway.
ASR by category
| Category | Items | Base | Heretic v1.3.0 |
|---|---|---|---|
| Chemical/Bio | 56 | 0.0% | 100.0% |
| Copyright | 100 | 99.0% | 100.0% |
| Cybercrime | 67 | 11.9% | 100.0% |
| Harassment | 25 | 0.0% | 100.0% |
| Harmful Content | 22 | 0.0% | 86.4% |
| Illegal Activity | 65 | 4.6% | 100.0% |
| Misinformation | 65 | 6.2% | 98.5% |
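As a consistency check, the overall ASR can be rebuilt from the per-category rows. The sketch below recovers integer counts from the rounded percentages and sums them; the helper names are introduced here for illustration, and rounding back to counts is an assumption that happens to hold for this table.

```python
# (items, base ASR %, variant ASR %), copied from the category table above.
CATEGORIES = {
    "Chemical/Bio": (56, 0.0, 100.0),
    "Copyright": (100, 99.0, 100.0),
    "Cybercrime": (67, 11.9, 100.0),
    "Harassment": (25, 0.0, 100.0),
    "Harmful Content": (22, 0.0, 86.4),
    "Illegal Activity": (65, 4.6, 100.0),
    "Misinformation": (65, 6.2, 98.5),
}


def asr(complied, total):
    """Attack success rate in percent: the share of behaviours complied with."""
    return 100 * complied / total


def complied_count(items, asr_percent):
    """Recover an integer count from a percentage rounded to one decimal."""
    return round(items * asr_percent / 100)


def overall(column):
    """Sum counts over categories; column 0 is the base model, 1 the variant."""
    total = sum(row[0] for row in CATEGORIES.values())
    complied = sum(complied_count(row[0], row[1 + column]) for row in CATEGORIES.values())
    return complied, total


if __name__ == "__main__":
    for label, column in (("Base", 0), ("Heretic v1.3.0", 1)):
        complied, total = overall(column)
        print(f"{label:<15} {complied}/{total} = {asr(complied, total):.1f}%")
```

The category rows sum to 114/400 for the base model and 396/400 for the variant, matching the overall table.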
LLM Review
The initial keyword classifier reported 70.8% ASR for the base model. After individual LLM review of all 400 base responses, the actual ASR was 28.5%. The discrepancy came from long, detailed-sounding responses that were actually safety lectures or educational explanations without actionable harmful content.
For the heretic variant, the classifier (99.5%) and LLM review (99.0%) were in close agreement. The LLM reviewer identified two additional refusals that the classifier had missed.
KL Divergence
Methodology: F.kl_div(logprobs_variant, logprobs_base, reduction="batchmean", log_target=True) on full vocab first-token logits from mlabonne/harmless_alpaca test[:100], matching the Heretic evaluator. System prompt: "You are a helpful assistant."
| Variant | KL Divergence | Rating |
|---|---|---|
| heretic | 0.0314 | very good |
Rating scale: excellent below 0.01, very good 0.01 to 0.1, moderate 0.1 to 0.4, significant 0.4 to 1.0, heavy above 1.0.
A KL divergence of 0.0314 means the model's output distribution on benign prompts is nearly identical to the base. The median per-prompt KL is 0.0017. For most inputs, you'd never notice a difference. The max of 0.47 suggests a few prompts land near the edited safety boundary, producing slightly different token probabilities.
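To make the metric concrete, here is a dependency-free sketch of what `F.kl_div(..., reduction="batchmean", log_target=True)` computes for the argument order given in the methodology: the KL divergence from the base distribution to the variant distribution, averaged over prompts. The helper names and toy inputs are assumptions for illustration, and the handling of exact boundary values in `rating` is a guess, since the scale does not specify it.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def kl_batchmean(logprobs_variant, logprobs_base):
    """Mean over prompts of KL(base || variant), given per-prompt log-probs."""
    total = 0.0
    for variant_row, base_row in zip(logprobs_variant, logprobs_base):
        # Pointwise term with log_target=True: exp(target) * (target - input).
        total += sum(math.exp(p) * (p - q) for p, q in zip(base_row, variant_row))
    return total / len(logprobs_base)


def rating(kl):
    """Map a KL value onto the rating scale quoted above."""
    if kl < 0.01:
        return "excellent"
    if kl < 0.1:
        return "very good"
    if kl < 0.4:
        return "moderate"
    if kl <= 1.0:
        return "significant"
    return "heavy"


if __name__ == "__main__":
    # Toy two-token vocabulary, one prompt: base is uniform, variant is skewed.
    base = [log_softmax([0.0, 0.0])]
    variant = [[math.log(0.25), math.log(0.75)]]
    value = kl_batchmean(variant, base)
    print(f"KL = {value:.4f} ({rating(value)})")
    print("reported 0.0314 ->", rating(0.0314))
```

Because the mean is taken over prompts, a handful of large per-prompt values can dominate it, which fits the gap between the reported median (0.0017) and mean (0.0314).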
Weight Analysis
Modification summary
| Metric | Heretic v1.3.0 |
|---|---|
| Tensors changed | 53 / 398 (13.3%) |
| Relative edit (median) | 2.0% |
| Tensor types | o_proj (27) + down_proj (26) |
| Layers modified | 27 / 36 (75%) |
| Layer range | 9–35 |
This is a textbook abliteration pattern. The edits target exactly two tensor types across the mid-to-late layers:
- `self_attn.o_proj.weight` (27 tensors): The attention output projection, which controls how attention heads combine their signals. Abliteration modifies this to suppress the "refusal direction" in attention outputs.
- `mlp.down_proj.weight` (26 tensors): The MLP down projection, which controls the feedforward transformation. The same logic applies: suppress the refusal signal.
Layers 0–8 are untouched. Edits begin at layer 9 and continue through layer 35 (the last layer). The edit density is consistent at ~18% per modified layer (2 out of 11 tensors per layer).
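The summary figures can be derived from a list of changed tensor names. The sketch below is illustrative only: the name pattern is assumed from the tensor names quoted in this section, the input list is synthetic, and the layer whose `down_proj` was left unchanged is chosen arbitrarily because the source does not say which one it is.

```python
import re
from collections import Counter

# Assumed naming, e.g. "layers.28.mlp.down_proj.weight".
LAYER_RE = re.compile(r"layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.weight$")


def summarize(changed_names, total_tensors):
    """Count changed tensors by type and report the layer span and overall share."""
    by_type = Counter()
    layers = set()
    for name in changed_names:
        match = LAYER_RE.search(name)
        if match is None:
            continue  # ignore names that do not follow the assumed pattern
        layers.add(int(match.group(1)))
        by_type[match.group(2)] += 1
    changed = sum(by_type.values())
    return {
        "by_type": dict(by_type),
        "layer_span": (min(layers), max(layers)) if layers else None,
        "layers_touched": len(layers),
        "share_percent": round(100 * changed / total_tensors, 1),
    }


if __name__ == "__main__":
    # Synthetic input reproducing the reported counts (27 o_proj, 26 down_proj).
    names = [f"layers.{i}.self_attn.o_proj.weight" for i in range(9, 36)]
    # Dropping layer 9's down_proj is arbitrary, for illustration only.
    names += [f"layers.{i}.mlp.down_proj.weight" for i in range(10, 36)]
    print(summarize(names, total_tensors=398))
```

With the reported counts this gives 53 changed tensors, or 13.3% of 398, across 27 layers spanning 9–35.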
SVD: Rank-1 edits
| Tensor (top 5) | Frobenius Norm | Effective Rank (90%) | SV Ratio |
|---|---|---|---|
| layers.28.mlp.down_proj | 3.73 | 1 | 89.7x |
| layers.27.mlp.down_proj | 3.72 | 1 | 91.6x |
| layers.29.mlp.down_proj | 3.61 | 1 | 87.0x |
| layers.26.mlp.down_proj | 3.60 | 1 | 88.6x |
| layers.30.mlp.down_proj | 3.45 | 1 | 81.0x |
Every modified tensor has effective rank 1 at the 90% energy threshold. This means each edit is effectively a rank-1 update, a single direction being added to or subtracted from the weight matrix. The SV ratios of 80–92x confirm this: the edit is dominated by a single singular vector.
This is the hallmark of abliteration: the technique identifies a "refusal direction" in the model's activation space and applies a rank-1 counter-direction to neutralize it.
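The two SVD columns can be computed from the singular values of an edit matrix (variant weight minus base weight). The sketch below assumes that "energy" means the cumulative share of squared singular values and that "SV ratio" means the largest singular value divided by the second largest; the source defines neither, and the analysis tool may use different definitions. The example values are made up.

```python
import math


def effective_rank(singular_values, energy=0.90):
    """Smallest k whose top-k squared singular values reach the energy share."""
    ordered = sorted(singular_values, reverse=True)
    total = sum(s * s for s in ordered)
    if total == 0:
        return 0
    running = 0.0
    for k, s in enumerate(ordered, start=1):
        running += s * s
        if running / total >= energy:
            return k
    return len(ordered)


def sv_ratio(singular_values):
    """Largest singular value divided by the second largest."""
    ordered = sorted(singular_values, reverse=True)
    if len(ordered) < 2 or ordered[1] == 0:
        return math.inf
    return ordered[0] / ordered[1]


if __name__ == "__main__":
    # Made-up spectrum: one dominant direction plus a small tail.
    dominated = [90.0, 1.0, 0.5]
    print(effective_rank(dominated), sv_ratio(dominated))
    # A flat spectrum needs every direction to reach 90% energy.
    flat = [1.0, 1.0, 1.0, 1.0]
    print(effective_rank(flat), sv_ratio(flat))
```

Under these definitions an effective rank of 1 means the top direction carries at least 90% of the edit's energy, so the edit can be dominated by one direction without being exactly rank 1.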
Summary
| Metric | Base | Heretic v1.3.0 | Delta |
|---|---|---|---|
| HarmBench ASR | 28.5% | 99.0% | +70.5pp |
| MMLU | 74.91% | 74.68% | -0.3% |
| GSM8K | 91.89% | 91.05% | -0.9% |
| HellaSwag | 76.38% | 76.51% | +0.2% |
| ARC-C | 61.18% | 60.92% | -0.4% |
| TruthfulQA MC2 | 61.21% | 54.05% | -11.7% |
| KL Divergence | — | 0.0314 | — |
| Weights Changed | — | 13.3% (53 tensors) | — |
Methodology
- Capability: lm-evaluation-harness via vLLM, bf16 native on NVIDIA RTX 5090
- Safety: HarmBench 400 textual behaviours, `max_tokens=8096, temperature=0`, classified with harmbench_classify.py v4.0, then individually reviewed by LLM
- KL divergence: Full vocab first-token logits via `model.generate(max_new_tokens=1, output_scores=True)`, matching Heretic evaluator methodology
- Weight analysis: SVD, fingerprint, edit vector, and per-layer analysis comparing variant against the base, using Abliterlitics
- Hardware: NVIDIA RTX 5090 (32GB)
Limitations
- This model inherits all limitations of the base Qwen3-VL-8B-Instruct model
- Abliteration reduces but does not completely eliminate refusals (6/100 remain)
- TruthfulQA scores drop 11–16% as a side effect of abliteration
- NVFP4 quantization works best on Blackwell GPUs (RTX 5090/5080) with native FP4 tensor cores, but also works on older GPUs via software dequantization
- Using an abliterated text encoder in ComfyUI alone does not significantly change image generation output. For meaningful results, combine with a fine-tuned LoRA
- GGUF variants strip the vision encoder and are text-only
- This is a research and experimental release
License
This model is released under the Apache 2.0 License, following the base Qwen3-VL-8B-Instruct model license.
Acknowledgments
- Qwen for the Qwen3-VL-8B-Instruct model
- Heretic by p-e-w for the abliteration tool
- Abliterlitics for the forensic analysis toolkit
- HarmBench for the safety evaluation framework
- llama.cpp for GGUF conversion
Disclaimer
This model has had safety alignment removed. It will comply with harmful requests, including generating content related to violence, illegal activities, and other harmful behaviours. Use responsibly and in accordance with applicable laws and regulations. The authors do not condone or encourage the use of this model for harmful purposes.
While we have taken the time to verify all results thoroughly, we are open to any corrections, additional benchmarks, or further analysis. If you spot something that looks wrong and can be confirmed, we are happy to fix it.
Model provider: DreamFast
Model tree: base Qwen/Qwen3-VL-8B-Instruct; fine-tuned: this model
Modalities: input Text, Image; output Text