skilledu

qwen3-4b-heretic

Deploy Dedicated

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Get help setting up a custom Dedicated Endpoints.

Talk with our engineer to get a quote for reserved GPU instances with discounts.

Model Details

Base Model: Qwen/Qwen3-4B
Abliteration Method: Heretic v1.2.0
Trials: 200
Trial Selected: Trial 96
Refusals: 3/100 (vs 100/100 original)
KL Divergence: 0.0000 (zero measurable model damage)

Files

HuggingFace Format (for transformers, llama.cpp conversion)

markdown
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
config.json
tokenizer.json
tokenizer_config.json

ComfyUI Format (for Z-Image / FLUX.2 Klein 4B text encoder)

markdown
comfyui/qwen3-4b-heretic.safetensors              # bf16, 7.5GB
comfyui/qwen3-4b-heretic_fp8_e4m3fn.safetensors   # fp8, 4.1GB
comfyui/qwen3-4b-heretic_nvfp4.safetensors        # nvfp4, 2.6GB

GGUF Format (for llama.cpp and ComfyUI-GGUF)

Table with columns: Quant, Size, Notes
Quant	Size	Notes
F16	~7.5GB	Lossless reference
Q8_0	~4GB	Excellent quality
Q6_K	~3GB	Very good quality
Q5_K_M	~2.7GB	Good quality
Q4_K_M	~2.3GB	Recommended balance
Q3_K_M	~1.9GB	For low VRAM only

NVFP4 Notes

The NVFP4 (4-bit floating point, E2M1) variants use ComfyUI's native quantization format. They are ~3x smaller than bf16 and load natively in ComfyUI without any plugins. Blackwell GPUs (RTX 5090/5080, SM100+) can use native FP4 tensor cores for best performance, but ComfyUI also supports software dequantization on older GPUs (tested working on RTX 4090).

Usage

With ComfyUI (Z-Image / FLUX.2 Klein 4B)

Download a ComfyUI format file:
- FP8 (recommended): comfyui/qwen3-4b-heretic_fp8_e4m3fn.safetensors (4.1GB)
- NVFP4 (smallest): comfyui/qwen3-4b-heretic_nvfp4.safetensors (2.6GB)
- bf16 (full precision): comfyui/qwen3-4b-heretic.safetensors (7.5GB)
Place in ComfyUI/models/text_encoders/
In your Z-Image workflow, use the ClipLoader node and select the heretic file

With Transformers

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-4b-heretic",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-4b-heretic")

prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With llama.cpp

bash
llama-server -m qwen3-4b-heretic-Q4_K_M.gguf

Abliteration Process

Created using Heretic v1.2.0 with 200 optimization trials:

markdown
? Which trial do you want to use?
> [Trial  96] Refusals:  3/100, KL divergence: 0.0000  <-- selected
  [Trial  90] Refusals:  5/100, KL divergence: 0.0000
  [Trial  95] Refusals:  9/100, KL divergence: 0.0000
  [Trial 122] Refusals: 90/100, KL divergence: 0.0000
  ...

Trial 96 was selected for having the fewest refusals (3/100) with zero measurable KL divergence, indicating the abliteration surgically removed the refusal mechanism with no damage to model capabilities.

Limitations

This model inherits all limitations of the base Qwen 3 4B model
Abliteration reduces but does not completely eliminate refusals (3/100 remain)

License

This model is released under the Apache 2.0 License, following the base Qwen 3 4B model license.

Acknowledgments

Qwen for the Qwen 3 4B model
Heretic by p-e-w for the abliteration tool
Tongyi-MAI Z-Image for Z-Image
Black Forest Labs for FLUX.2 Klein

Model provider

skilledu

Model tree

Base

Qwen/Qwen3-4B

Quantized

this model

Modalities

Input

Text

Output

Text

Pricing

Dedicated Endpoints

View details

Supported Functionality

Model APIs

Dedicated Endpoints

Container

More information

Model card

Explore FriendliAI today

Get started Talk to an engineer

Model Details

Base Model: Qwen/Qwen3-4B
Abliteration Method: Heretic v1.2.0
Trials: 200
Trial Selected: Trial 96
Refusals: 3/100 (vs 100/100 original)
KL Divergence: 0.0000 (zero measurable model damage)

Files

HuggingFace Format (for transformers, llama.cpp conversion)

markdown
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
config.json
tokenizer.json
tokenizer_config.json

ComfyUI Format (for Z-Image / FLUX.2 Klein 4B text encoder)

markdown
comfyui/qwen3-4b-heretic.safetensors              # bf16, 7.5GB
comfyui/qwen3-4b-heretic_fp8_e4m3fn.safetensors   # fp8, 4.1GB
comfyui/qwen3-4b-heretic_nvfp4.safetensors        # nvfp4, 2.6GB

GGUF Format (for llama.cpp and ComfyUI-GGUF)

Table with columns: Quant, Size, Notes
Quant	Size	Notes
F16	~7.5GB	Lossless reference
Q8_0	~4GB	Excellent quality
Q6_K	~3GB	Very good quality
Q5_K_M	~2.7GB	Good quality
Q4_K_M	~2.3GB	Recommended balance
Q3_K_M	~1.9GB	For low VRAM only

NVFP4 Notes

Usage

With ComfyUI (Z-Image / FLUX.2 Klein 4B)

Download a ComfyUI format file:
- FP8 (recommended): comfyui/qwen3-4b-heretic_fp8_e4m3fn.safetensors (4.1GB)
- NVFP4 (smallest): comfyui/qwen3-4b-heretic_nvfp4.safetensors (2.6GB)
- bf16 (full precision): comfyui/qwen3-4b-heretic.safetensors (7.5GB)
Place in ComfyUI/models/text_encoders/
In your Z-Image workflow, use the ClipLoader node and select the heretic file

With Transformers

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-4b-heretic",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-4b-heretic")

prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With llama.cpp

bash
llama-server -m qwen3-4b-heretic-Q4_K_M.gguf

Abliteration Process

Created using Heretic v1.2.0 with 200 optimization trials:

markdown
? Which trial do you want to use?
> [Trial  96] Refusals:  3/100, KL divergence: 0.0000  <-- selected
  [Trial  90] Refusals:  5/100, KL divergence: 0.0000
  [Trial  95] Refusals:  9/100, KL divergence: 0.0000
  [Trial 122] Refusals: 90/100, KL divergence: 0.0000
  ...

Limitations

This model inherits all limitations of the base Qwen 3 4B model
Abliteration reduces but does not completely eliminate refusals (3/100 remain)

License

This model is released under the Apache 2.0 License, following the base Qwen 3 4B model license.

Acknowledgments

Qwen for the Qwen 3 4B model
Heretic by p-e-w for the abliteration tool
Tongyi-MAI Z-Image for Z-Image
Black Forest Labs for FLUX.2 Klein

qwen3-4b-heretic

Get help setting up a custom Dedicated Endpoints.

README

Model Details

Files

HuggingFace Format (for transformers, llama.cpp conversion)

ComfyUI Format (for Z-Image / FLUX.2 Klein 4B text encoder)

GGUF Format (for llama.cpp and ComfyUI-GGUF)

NVFP4 Notes

Usage

With ComfyUI (Z-Image / FLUX.2 Klein 4B)

With Transformers

With llama.cpp

Abliteration Process

Limitations

License

Acknowledgments

Explore FriendliAI today

README

Model Details

Files

HuggingFace Format (for transformers, llama.cpp conversion)

ComfyUI Format (for Z-Image / FLUX.2 Klein 4B text encoder)

GGUF Format (for llama.cpp and ComfyUI-GGUF)

NVFP4 Notes

Usage

With ComfyUI (Z-Image / FLUX.2 Klein 4B)

With Transformers

With llama.cpp

Abliteration Process

Limitations

License

Acknowledgments