prithivMLmods
gemma-4-12B-it-heretic_decensored
License: apache-2.0
Key Highlights
- Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
- Reduced Refusal Behavior: Optimized to minimize internal refusal tendencies while maintaining instruction-following capabilities.
- Gemma 4 Backbone: Built directly on top of google/gemma-4-12B-it.
- Reasoning-Oriented Performance: Preserves multi-step reasoning and analytical capabilities after abliteration.
- Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
- 12B Scale Deployment: Suitable for local inference, research environments, and optimized deployment setups.
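As a rough guide to what the 12B scale implies for hardware, the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes a nominal 12 billion parameters (the card does not state an exact count) and ignores activations, KV cache, and quantization overhead; the function name is illustrative.

```python
# Rough weight-memory estimate.
# Assumption: a nominal 12e9 parameters; the exact count is not stated in the card.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(12e9, dtype):.1f} GiB")
```

Real memory use is higher than these figures, since inference also needs room for the KV cache and intermediate activations.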
Abliteration Parameters
| Parameter | Value |
|---|---|
| direction_index | 29.56 |
| attn.o_proj.max_weight | 1.18 |
| attn.o_proj.max_weight_position | 39.94 |
| attn.o_proj.min_weight | 0.81 |
| attn.o_proj.min_weight_distance | 25.73 |
| mlp.down_proj.max_weight | 1.37 |
| mlp.down_proj.max_weight_position | 46.27 |
| mlp.down_proj.min_weight | 0.97 |
| mlp.down_proj.min_weight_distance | 21.63 |
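To make the table easier to read, here is a minimal sketch of how parameters of this shape can define a per-layer ablation strength. It rests on assumptions that this card does not confirm: that the weight peaks at `max_weight_position`, falls off linearly to `min_weight` at a distance of `min_weight_distance` layers, and is zero beyond that, and that a fractional `direction_index` blends the refusal directions of two adjacent layers. The function names and the sampled layer indices are hypothetical.

```python
import math


def ablation_weight(layer: float, max_weight: float, max_weight_position: float,
                    min_weight: float, min_weight_distance: float) -> float:
    """Assumed kernel: linear falloff from max_weight at the peak to min_weight at the edge."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # assumption: layers outside the window are left untouched
    t = distance / min_weight_distance
    return max_weight + t * (min_weight - max_weight)


def split_direction_index(direction_index: float) -> tuple[int, int, float]:
    """Assumed reading: return (lower layer, upper layer, blend fraction toward upper)."""
    lower = math.floor(direction_index)
    return lower, lower + 1, direction_index - lower


if __name__ == "__main__":
    # attn.o_proj values from the table above; the layer indices are illustrative only.
    for layer in (10, 20, 30, 40, 47):
        w = ablation_weight(layer, 1.18, 39.94, 0.81, 25.73)
        print(f"layer {layer:2d}: weight {w:.3f}")
    print(split_direction_index(29.56))
```

Under these assumptions, the `attn.o_proj` settings leave layers more than about 26 positions away from layer 40 unmodified, while the `mlp.down_proj` settings apply a slightly stronger and narrower window centered near layer 46.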
Performance
| Metric | This model | Original model (google/gemma-4-12B-it) |
|---|---|---|
| KL divergence | 0.0366 | 0 (by definition) |
| Refusals | 34/100 | 99/100 |
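The two metrics can be illustrated with a small, self-contained sketch: the KL divergence between two discrete next-token distributions (zero when they are identical, which is why the original model scores 0), and the refusal rates implied by the counts in the table. The example distributions are made up, and the card does not say which direction of the divergence or which prompts were used, so this illustrates the definitions only.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats; q must be positive wherever p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def refusal_rate(refusals: int, total: int) -> float:
    """Fraction of prompts that were refused."""
    return refusals / total


if __name__ == "__main__":
    base = [0.70, 0.20, 0.10]      # made-up next-token probabilities
    modified = [0.65, 0.23, 0.12]  # made-up, slightly shifted distribution
    print(kl_divergence(base, base))      # identical distributions
    print(kl_divergence(modified, base))  # small positive value
    before, after = refusal_rate(99, 100), refusal_rate(34, 100)
    print(f"refusal rate: {before:.0%} -> {after:.0%}")
```

A low divergence alongside a large drop in refusals is the trade-off the table reports: the output distribution moves only slightly while refusal behavior changes substantially.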
Quick Start with Transformers
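```bash
pip install transformers
pip install accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "prithivMLmods/gemma-4-12B-it-heretic_decensored"

# Load the weights in their stored dtype and place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Explain how a transformer model processes text."}
]

# return_dict=True makes the return type explicit (input_ids plus attention_mask),
# so the same code works regardless of the library's default return type.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```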
GGUF Model Files
| Resource | Link |
|---|---|
| prithivMLmods/gemma-4-12B-it-heretic_decensored-GGUF | https://huggingface.co/prithivMLmods/gemma-4-12B-it-heretic_decensored-GGUF |
| Quick Start with llama.cpp (Docker) | https://huggingface.co/prithivMLmods/gemma-4-12B-it-heretic_decensored-GGUF#quick-start-with-llamacpp-docker |
Intended Use
- Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
- Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
- Red Teaming: Analyzing model responses under reduced-refusal conditions.
- Local Deployment: Running high-capacity Gemma 4 models in research and experimentation environments.
- Abliteration Studies: Exploring the effects of targeted weight-space modifications on model behavior.
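For the evaluation and red-teaming uses above, refusal counts are often approximated by scanning responses for refusal phrases. The sketch below shows that idea; the marker list and function names are hypothetical and are not the ones used to produce any measurement in this card.

```python
# Hypothetical refusal markers; not the list used for this card's measurements.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "i am unable")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def count_refusals(responses: list[str]) -> tuple[int, int]:
    """Return (number of refusals, number of responses)."""
    return sum(is_refusal(r) for r in responses), len(responses)


if __name__ == "__main__":
    sample = [
        "I'm sorry, but I can't help with that.",
        "Sure, here is an overview of the topic.",
    ]
    refused, total = count_refusals(sample)
    print(f"Refusals: {refused}/{total}")
```

Substring matching can miss refusals phrased in other ways and can flag answers that merely quote a marker, so counts produced this way deserve a manual spot check.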
Limitations & Risks
Important Note: This model intentionally reduces built-in refusal mechanisms.
- Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
- User Responsibility: Requires careful and ethical use.
- Experimental Modifications: Behavior may differ significantly from the original model.
- Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
- Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.
Acknowledgements
- Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.
- Model Trials & Evaluation: Experimental evaluations, refusal measurements, and optimization trials were conducted and documented at: https://huggingface.co/strangeropshf/demo-TERM-hf-job-01
Model provider
prithivMLmods
Model tree
- Base: google/gemma-4-12B-it
- Fine-tuned: this model
Modalities
- Input: Video, Audio, Text, Image
- Output: Text