prithivMLmods

gemma-4-12B-it-heretic_decensored

License: apache-2.0

Key Highlights

  • Heretic-Based Abliteration: Modified using the Heretic toolkit to identify and alter refusal-related representations within the model.
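  • Reduced Refusal Behavior: Tuned to refuse less often (34/100 refusals versus 99/100 for the original model, see Performance) while maintaining instruction-following capabilities.
  • Gemma 4 Backbone: Built directly on top of google/gemma-4-12B-it.
  • Reasoning-Oriented Performance: Intended to preserve multi-step reasoning and analytical capabilities after abliteration; the low KL divergence from the original model (0.0366) suggests the output distribution stays close to the base model's.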
  • Research-Focused Release: Designed for alignment research, model behavior analysis, and evaluation of refusal-direction modifications.
  • 12B Scale Deployment: Suitable for local inference, research environments, and optimized deployment setups.
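
For readers new to abliteration, the following is a minimal sketch of generic directional ablation: removing the component of a weight matrix's output that lies along a single direction. It is not taken from Heretic's source. The function names, the toy matrix, and the "refusal direction" vector are all hypothetical, and the real procedure operates on the model's full-size projection matrices.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(W, r, weight=1.0):
    """Compute W' = W - weight * r (r^T W) for a unit vector r.

    W is a list of rows (d_out x d_in); r has length d_out.
    With weight=1.0 the outputs of W' have no component along r.
    """
    r = normalize(r)
    d_out, d_in = len(W), len(W[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [W[i][j] - weight * r[i] * proj[j] for j in range(d_in)]
        for i in range(d_out)
    ]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Toy example with a hypothetical 2-dimensional "refusal direction".
W = [[1.0, 2.0], [3.0, 4.0]]
r = [1.0, 0.0]
W_abl = ablate_direction(W, r, weight=1.0)
y = matvec(W_abl, [0.5, -1.0])
component_along_r = sum(a * b for a, b in zip(normalize(r), y))
```

A weight other than 1.0 scales how much of the direction is removed, which is one way to read the per-component weights listed under Abliteration Parameters.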

Abliteration Parameters

Parameter                              Value
direction_index                        29.56
attn.o_proj.max_weight                 1.18
attn.o_proj.max_weight_position        39.94
attn.o_proj.min_weight                 0.81
attn.o_proj.min_weight_distance        25.73
mlp.down_proj.max_weight               1.37
mlp.down_proj.max_weight_position      46.27
mlp.down_proj.min_weight               0.97
mlp.down_proj.min_weight_distance      21.63
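
The model card does not say how these four values per component turn into a per-layer ablation strength. One plausible reading, sketched below as an assumption rather than as Heretic's confirmed behavior, is a piecewise-linear profile: the weight peaks at `max_weight_position`, falls linearly to `min_weight` at a distance of `min_weight_distance` layers, and layers farther away are left untouched. The function name `ablation_weight` is hypothetical.

```python
def ablation_weight(layer_index, max_weight, max_weight_position,
                    min_weight, min_weight_distance):
    """Assumed piecewise-linear ablation weight for one layer.

    Returns 0.0 for layers farther than min_weight_distance from the peak,
    meaning the layer would not be modified under this reading.
    """
    distance = abs(layer_index - max_weight_position)
    if distance > min_weight_distance:
        return 0.0
    # Linear interpolation from max_weight (at the peak) to min_weight (at the edge).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# Values copied from the attn.o_proj rows of the parameter table.
ATTN_O_PROJ = dict(
    max_weight=1.18,
    max_weight_position=39.94,
    min_weight=0.81,
    min_weight_distance=25.73,
)

near_peak = ablation_weight(40, **ATTN_O_PROJ)  # close to max_weight
far_away = ablation_weight(10, **ATTN_O_PROJ)   # outside the window
```

Under this reading, fractional positions such as 39.94 simply place the peak between two integer layer indices.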

Performance

Metric            This model    Original model (google/gemma-4-12B-it)
KL divergence     0.0366        0 (by definition)
Refusals          34/100        99/100
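
The original model scores 0 "by definition" because the KL divergence of a distribution from itself is zero. The sketch below shows this for a single next-token distribution. The probabilities are hypothetical, and the model card does not state which prompts, which token positions, or which argument order were used for the reported 0.0366.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token probabilities over a three-token vocabulary.
original = [0.70, 0.20, 0.10]
modified = [0.65, 0.24, 0.11]

kl_same = kl_divergence(original, original)     # a model compared with itself
kl_shifted = kl_divergence(original, modified)  # small but positive
```

A value near zero means the modified model assigns probabilities close to the original's on the measured prompts; it does not by itself guarantee identical behavior elsewhere.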

Quick Start with Transformers

bash

pip install transformers
pip install accelerate

python

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "prithivMLmods/gemma-4-12B-it-heretic_decensored"

# Load the weights in their stored dtype and let accelerate place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

messages = [
    {
        "role": "user",
        "content": "Explain how a transformer model processes text.",
    }
]

# Passing return_dict=True explicitly returns input_ids and attention_mask together,
# regardless of the library's default return type.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))

GGUF Model Files

Intended Use

  • Alignment Research: Studying refusal-direction analysis and behavior modification techniques.
  • Model Evaluation: Benchmarking reasoning, instruction-following, and safety-related behaviors.
  • Red Teaming: Analyzing model responses under reduced-refusal conditions.
  • Local Deployment: Running high-capacity Gemma 4 models in research and experimentation environments.
  • Abliteration Studies: Exploring the effects of targeted weight-space modifications on model behavior.
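
For evaluation or red-teaming work, a refusal count like the "Refusals" figure under Performance can be approximated by scanning responses for refusal phrases. The sketch below is a simple substring heuristic. The marker list and function names are hypothetical, and the model card does not say how its own refusal counts were classified.

```python
# Hypothetical refusal phrases; a real evaluation would need a vetted list
# or a classifier, since substring matching misses paraphrased refusals.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i won't",
    "i'm sorry",
    "i am unable",
)


def is_refusal(response):
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rate(responses):
    """Return (number of refusals, number of responses)."""
    refusals = sum(is_refusal(r) for r in responses)
    return refusals, len(responses)


# Hand-written sample responses, for illustration only.
sample = [
    "I'm sorry, but I can't help with that.",
    "Sure. A transformer first splits the text into tokens.",
    "I cannot assist with this request.",
]
count, total = refusal_rate(sample)
```

Running the same prompt set through both the original and the modified model and comparing the two counts gives a figure in the same "n/100" form as the Performance table.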

Limitations & Risks

Important Note: This model intentionally reduces built-in refusal mechanisms.

  • Sensitive Content Risk: May generate unrestricted, controversial, or unsafe outputs.
  • User Responsibility: Requires careful and ethical use.
  • Experimental Modifications: Behavior may differ significantly from the original model.
  • Alignment Trade-offs: Reduced refusal behavior may impact safety filtering and response constraints.
  • Potential Artifacts: Certain prompts may expose unexpected outputs resulting from the abliteration process.

Acknowledgements

  • Heretic: Fully automatic censorship removal framework for language models. This project was used to perform the refusal-direction analysis and ablation procedures that form the foundation of this model.

  • Model Trials & Evaluation: Experimental evaluations, refusal measurements, and optimization trials were conducted and documented at: https://huggingface.co/strangeropshf/demo-TERM-hf-job-01

Model provider

prithivMLmods

Model tree

Base: google/gemma-4-12B-it
Fine-tuned: this model

Modalities

Input: Video, Audio, Text, Image
Output: Text
