prithivMLmods

gemma-4-E4B-it-qat-heretic_decensored

README

License: apache-2.0

Key Highlights

Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
Reduced Refusal Behavior: Optimized to minimize internal refusal tendencies while maintaining instruction-following capabilities.
Gemma 4 QAT Backbone: Built directly on top of google/gemma-4-E4B-it-qat-q4_0-unquantized.
Reasoning-Oriented Performance: Preserves multi-step reasoning and analytical capabilities after abliteration.
Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
Efficient E4B Deployment: Suitable for local inference, research environments, and optimized deployment setups.

Abliteration Parameters

Table with columns: Parameter, Value
Parameter	Value
direction_index	33.23
attn.o_proj.max_weight	1.37
attn.o_proj.max_weight_position	32.40
attn.o_proj.min_weight	1.16
attn.o_proj.min_weight_distance	15.15
mlp.down_proj.max_weight	1.23
mlp.down_proj.max_weight_position	35.47
mlp.down_proj.min_weight	0.89
mlp.down_proj.min_weight_distance	24.37

Performance

Table with columns: Metric, This model, Original model (google/gemma-4-E4B-it-qat-q4_0-unquantized)
Metric	This model	Original model (google/gemma-4-E4B-it-qat-q4_0-unquantized)
KL divergence	0.0140	0 (by definition)
Refusals	40/100	99/100

Quick Start with Transformers

bash
pip install transformers
pip install accelerate

python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored",
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored"
)

messages = [
    {
        "role": "user",
        "content": "Explain how a transformer model processes text."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512
)

print(
    tokenizer.decode(
        outputs[0][inputs.shape[-1]:],
        skip_special_tokens=True
    )
)

GGUF Model Files

Table with columns: Resource, Link
Resource	Link
`prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored-GGUF`	https://huggingface.co/prithivMLmods/gemma-4-E4B-it-qat-heretic_decensored-GGUF

Intended Use

Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
Red Teaming: Analyzing model responses under reduced-refusal conditions.
Local Deployment: Running efficient Gemma 4 QAT models in research and experimentation environments.
Abliteration Studies: Exploring the effects of targeted weight-space modifications on model behavior.

Limitations & Risks

Important Note: This model intentionally reduces built-in refusal mechanisms.

Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
User Responsibility: Requires careful and ethical use.
Experimental Modifications: Behavior may differ significantly from the original model.
Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.

Acknowledgements

Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.

Available on FriendliAI

Dedicated Endpoints

Run this model inference on single tenant GPU with unmatched speed and reliability at scale.

Learn more

Container

Run this model inference with full control and performance in your environment.

Learn more

Model Details

Model Provider

prithivMLmods

Model Tree

Base

google/gemma-4-E4B-it-qat-q4_0-unquantized

Fine-tuned

this model

Input Modalities

TextImage

Output Modalities

Text

Supported Functionality

Dedicated EndpointsContainer

Explore FriendliAI today

Get started Talk to an engineer