prithivMLmods
gemma-4-E4B-it-qat-heretic_decensored
License: apache-2.0
Key Highlights
- Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
- Reduced Refusal Behavior: Optimized to minimize internal refusal tendencies while maintaining instruction-following capabilities.
- Gemma 4 QAT Backbone: Built directly on top of google/gemma-4-E4B-it-qat-q4_0-unquantized.
- Reasoning-Oriented Performance: Preserves multi-step reasoning and analytical capabilities after abliteration.
- Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
- Efficient E4B Deployment: Suitable for local inference, research environments, and optimized deployment setups.
Abliteration Parameters
| Parameter | Value |
|---|---|
| direction_index | 33.23 |
| attn.o_proj.max_weight | 1.37 |
| attn.o_proj.max_weight_position | 32.40 |
| attn.o_proj.min_weight | 1.16 |
| attn.o_proj.min_weight_distance | 15.15 |
| mlp.down_proj.max_weight | 1.23 |
| mlp.down_proj.max_weight_position | 35.47 |
| mlp.down_proj.min_weight | 0.89 |
| mlp.down_proj.min_weight_distance | 24.37 |
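The table lists the parameters but not how they combine. The sketch below shows one plausible reading, assuming (this is not stated in the card) that each component's ablation weight peaks at `max_weight` on the layer nearest `max_weight_position`, falls off linearly to `min_weight` at `min_weight_distance` layers away, and is zero beyond that. The names `ComponentParams` and `ablation_weight` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ComponentParams:
    """Per-component values copied from the Abliteration Parameters table."""
    max_weight: float
    max_weight_position: float
    min_weight: float
    min_weight_distance: float


def ablation_weight(layer_index: int, p: ComponentParams) -> float:
    """Assumed linear kernel: strongest at max_weight_position, zero outside the window."""
    distance = abs(layer_index - p.max_weight_position)
    if distance > p.min_weight_distance:
        return 0.0  # layer lies outside the ablated window
    t = distance / p.min_weight_distance  # 0 at the peak, 1 at the window edge
    return p.max_weight + t * (p.min_weight - p.max_weight)


O_PROJ = ComponentParams(1.37, 32.40, 1.16, 15.15)
DOWN_PROJ = ComponentParams(1.23, 35.47, 0.89, 24.37)

# Layer indices below are arbitrary sample points, not the model's actual depth.
for layer in (10, 20, 30, 40):
    print(layer, round(ablation_weight(layer, O_PROJ), 3),
          round(ablation_weight(layer, DOWN_PROJ), 3))
```

Under this reading, the `attn.o_proj` modification is concentrated in a window of about 15 layers around layer 32, while the `mlp.down_proj` window is wider (about 24 layers) and centered slightly later.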
Performance
| Metric | This model | Original model (google/gemma-4-E4B-it-qat-q4_0-unquantized) |
|---|---|---|
| KL divergence | 0.0140 | 0 (by definition) |
| Refusals | 40/100 | 99/100 |
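The card does not say how these two metrics were computed. As a rough illustration of what they measure, the sketch below gives the textbook KL divergence between two next-token probability distributions and a simple substring-based refusal counter. The marker list, the function names, and the direction of the divergence (original model as reference) are assumptions made for this example.

```python
import math
from typing import Iterable, Sequence

# Hypothetical refusal markers; a real evaluation would use its own list or a classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")


def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if it contains any marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_count(responses: Iterable[str]) -> int:
    """Count flagged responses, e.g. to report a figure such as 40/100."""
    return sum(is_refusal(r) for r in responses)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; p is the reference (original) distribution, q the modified one."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy distributions over a three-token vocabulary, purely illustrative.
original = [0.70, 0.20, 0.10]
modified = [0.65, 0.25, 0.10]
print(kl_divergence(original, modified))
print(refusal_count(["I'm sorry, but I can't help with that.", "Sure, here is an overview."]))
```

A divergence near zero means the modified model's output distribution stays close to the original's on the evaluated prompts, which is why the original model scores exactly 0 against itself.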
Quick Start with Transformers
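```bash
pip install transformers
pip install accelerate
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored"

# Load the weights, letting Transformers choose the dtype and device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Explain how a transformer model processes text."}
]

# return_dict=True makes the return type explicit (input_ids plus attention_mask),
# so the code does not depend on the library's default return type.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```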
GGUF Model Files
| Resource | Link |
|---|---|
| prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored-GGUF | https://huggingface.co/prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored-GGUF |
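The card does not list which quantization files the GGUF repository contains. One way to find out is to list the repository contents with `huggingface_hub` and pick a file by its quantization tag. This is a minimal sketch: the helper names are hypothetical, and tags such as `Q4_K_M` are common GGUF naming conventions, not files confirmed to exist in this repository.

```python
from typing import List, Optional

GGUF_REPO = "prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored-GGUF"


def list_gguf_files(repo_id: str) -> List[str]:
    """Return the .gguf filenames in a Hub repository (needs network access)."""
    from huggingface_hub import list_repo_files  # imported lazily; pip install huggingface_hub
    return [name for name in list_repo_files(repo_id) if name.endswith(".gguf")]


def pick_gguf(files: List[str], quant_tag: str) -> Optional[str]:
    """Return the first .gguf filename containing quant_tag (case-insensitive), else None."""
    tag = quant_tag.lower()
    for name in files:
        lowered = name.lower()
        if lowered.endswith(".gguf") and tag in lowered:
            return name
    return None
```

Calling `pick_gguf(list_gguf_files(GGUF_REPO), "Q4_K_M")` returns a matching filename if one exists, which can then be passed to `huggingface_hub.hf_hub_download` and loaded by a GGUF-compatible runtime.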
Intended Use
- Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
- Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
- Red Teaming: Analyzing model responses under reduced-refusal conditions.
- Local Deployment: Running efficient Gemma 4 QAT models in research and experimentation environments.
- Abliteration Studies: Exploring the effects of targeted weight-space modifications on model behavior.
Limitations & Risks
Important Note: This model intentionally reduces built-in refusal mechanisms.
- Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
- User Responsibility: Requires careful and ethical use.
- Experimental Modifications: Behavior may differ significantly from the original model.
- Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
- Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.
Acknowledgements
- Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.
Model provider
- prithivMLmods
Model tree
- Base: google/gemma-4-E4B-it-qat-q4_0-unquantized
- Fine-tuned: this model
Modalities
- Input: Text, Image
- Output: Text