License: apache-2.0
v1.1: thinking-mode fix
Gemma 4 is a thinking model: its refusal decision forms inside the chain-of-thought. The first release was abliterated and evaluated with thinking disabled, so it still refused once thinking was turned on (the default). v1.1 is re-tuned to decensor the model with thinking enabled, which is how the model is actually used.
Abliteration parameters
| Parameter | Value |
|---|---|
| attn.o_proj.max_weight | 1.48 |
| attn.o_proj.max_weight_position | 36.55 |
| attn.o_proj.min_weight | 1.29 |
| attn.o_proj.min_weight_distance | 21.18 |
| mlp.down_proj.max_weight | 1.48 |
| mlp.down_proj.max_weight_position | 31.62 |
| mlp.down_proj.min_weight | 1.43 |
| mlp.down_proj.min_weight_distance | 12.60 |
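The four values per component can be read as a per-layer weight kernel. The sketch below assumes a linear-falloff interpretation of Heretic's kernel: the weight peaks at `max_weight` at layer position `max_weight_position`, falls linearly to `min_weight` at a distance of `min_weight_distance` layers, and layers beyond that distance are left unmodified. The function name `ablation_weight` and the layer count of 48 are hypothetical and used only for illustration; the Heretic source remains the authoritative reference for the exact formula.

```python
from typing import Optional


def ablation_weight(
    layer_index: int,
    max_weight: float,
    max_weight_position: float,
    min_weight: float,
    min_weight_distance: float,
) -> Optional[float]:
    """Return the ablation weight for one layer, or None if the layer is skipped.

    Assumed kernel: linear interpolation from max_weight (at the peak position)
    down to min_weight (min_weight_distance layers away from the peak).
    """
    distance = abs(layer_index - max_weight_position)
    if distance > min_weight_distance:
        return None  # outside the kernel: this layer is not modified
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# Values copied from the parameter table.
ATTN_O_PROJ = dict(max_weight=1.48, max_weight_position=36.55,
                   min_weight=1.29, min_weight_distance=21.18)
MLP_DOWN_PROJ = dict(max_weight=1.48, max_weight_position=31.62,
                     min_weight=1.43, min_weight_distance=12.60)

NUM_LAYERS = 48  # hypothetical layer count, for illustration only

if __name__ == "__main__":
    for layer in range(NUM_LAYERS):
        attn = ablation_weight(layer, **ATTN_O_PROJ)
        mlp = ablation_weight(layer, **MLP_DOWN_PROJ)
        print(layer, attn, mlp)
```

Under this reading, the attention kernel is wide (about 21 layers on either side of its peak) while the MLP kernel is narrower and nearly flat, since its minimum and maximum weights are close.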
Performance
| Metric | This model (v1.1) | Original |
|---|---|---|
| KL divergence | 0.32 | 0 (by definition) |
| Refusals, thinking on (adversarial harmful set) | ~22% | ~99% |
The KL divergence is high relative to a typical abliteration; that is the cost of suppressing refusal through the reasoning trajectory of a thinking model. The model complies with the vast majority of requests in normal use. A small fraction of extreme prompts may still be refused, and a stronger v2 is in progress.
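For readers unfamiliar with the metric, the KL divergence in the table presumably compares the original model's next-token distribution with the modified model's, which is why the original scores 0 against itself. A minimal sketch of the computation for a single pair of logit vectors follows. The function names and toy logits are illustrative assumptions, and the prompts and token positions behind the reported 0.32 are not specified in this card.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits: Sequence[float], q_logits: Sequence[float]) -> float:
    """KL(P || Q) in nats, where P and Q are the softmax of the given logits."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    original = [2.0, 0.5, -1.0]   # toy logits standing in for the original model
    modified = [1.5, 0.9, -0.5]   # toy logits standing in for the modified model
    print(kl_divergence(original, original))  # identical distributions
    print(kl_divergence(original, modified))  # nonzero once the outputs diverge
```

In practice the per-prompt values would be averaged over an evaluation set of harmless prompts, so that the number reflects drift on ordinary behavior.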
This is the QAT-Q4_0 "unquantized" checkpoint of Gemma 4 12B (plain bf16 weights with QAT calibration baked in), decensored with Heretic. GGUF quantizations are available at igorls/gemma-4-12B-it-qat-q4_0-unquantized-heretic-GGUF.
Disclaimer
This is an abliterated model: its built-in safety guardrails and refusal behavior have been deliberately removed. As a result it will attempt to answer essentially any prompt and can produce content that is offensive, inaccurate, explicit, or otherwise harmful — content the original Gemma 4 would have refused.
- No safety alignment. Do not rely on it to refuse unsafe requests or to self-moderate. Apply your own filtering, guardrails, and human review before any production or user-facing use.
- You are solely responsible for how you use this model and for complying with all applicable laws and with the base model's license (Gemma 4, Apache 2.0).
- Intended for adults (18+), for research, evaluation, and lawful creative use where permitted.
- Provided as-is, without warranty of any kind. The author accepts no liability for any output produced or any use of this model.
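One way to apply your own filtering is to wrap generation in an input and output check. The sketch below is only an outline under stated assumptions: `guarded_generate`, `GuardedResult`, and the keyword-based `toy_filter` are hypothetical names, the `generate` callable stands in for whatever inference call you use, and a keyword list is a placeholder for a real moderation model or service, not a sufficient safeguard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedResult:
    text: str
    blocked: bool
    reason: str = ""


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_allowed: Callable[[str], bool],
) -> GuardedResult:
    """Run generate(prompt) only if both the prompt and the output pass is_allowed."""
    if not is_allowed(prompt):
        return GuardedResult("", True, "prompt rejected by input filter")
    output = generate(prompt)
    if not is_allowed(output):
        return GuardedResult("", True, "output rejected by output filter")
    return GuardedResult(output, False)


# Placeholder policy: replace with a real moderation classifier or service.
BLOCKED_TERMS = ("example-banned-term",)


def toy_filter(text: str) -> bool:
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


if __name__ == "__main__":
    # Stub generator used in place of a real model call.
    result = guarded_generate("Hello", lambda p: p + " world", toy_filter)
    print(result)
```

Blocked results carry a reason string so they can be logged and routed to human review rather than silently dropped.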
Model provider: liskasYR
Model tree
- Base: google/gemma-4-12B-it-qat-q4_0-unquantized
- Fine-tuned: this model
Modalities
- Input: Video, Audio, Text, Image
- Output: Text