Naphula

Goetia-26B-A4B-v1.3-Absolute-Heretic-ARA


License: apache-2.0

Models

Base model: Naphula/Goetia-26B-A4B-v1 (Pure 16-bit BFloat16)

Datasets

Good prompts: mlabonne/harmless_alpaca (Commit: 02c6a92)
Bad prompts: mlabonne/harmful_behaviors (Commit: 01cead0)
Good evaluation prompts: mlabonne/harmless_alpaca (Commit: 02c6a92)
Bad evaluation prompts: mlabonne/harmful_behaviors (Commit: 01cead0)

Selected trial

Trial number: 100
KL divergence: 0.0309
Refusals: 3/100
Method: Arbitrary-Rank Ablation (ARA) with Surgical Narrowing

Environment

Heretic: v1.2.0-dev (Blackwell Optimized)
PyTorch: 2.8.0+cu128
Hardware: NVIDIA RTX 6000 Blackwell (96GB VRAM)
Other dependencies: See requirements.txt.

Contents of this directory

requirements.txt: The exact versions of all Python packages (Blackwell/CUDA 12.8 stack).
config.toml: The exact configuration used, including the 16-bit stable loading path.
Naphula--Goetia-26B-A4B-v1.3.jsonl: The Optuna study journal containing the history of all 100+ trials.
SHA256SUMS: Cryptographic hashes for all weight files.
reproduce.json: A machine-readable file containing all reproducibility information.

How to reproduce

[!TIP] You can automate this process, including all verification steps, by downloading the reproduce.json file and running python3 -c "from heretic.main import main; main()" --reproduce reproduce.json.

Install the Blackwell-compatible version of PyTorch: pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
Install the packages listed in requirements.txt: pip install -r requirements.txt
Apply the Heretic source patches for 16-bit ARA and Surgical Narrowing.
Place the provided config.toml in your working directory.

Run the execution payload:

bash
export PYTHONPATH=/workspace/heretic/src
python3 -c "from heretic.main import main; main()" --model "/workspace/Naphula/Goetia-26B-A4B-v1" --use-ara

Wait for the run to finish, then select trial 100 and export the model.
Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in SHA256SUMS: sha256sum -c SHA256SUMS

[!TIP] To use the included Optuna study journal Naphula--Goetia-26B-A4B-v1.3.jsonl, place it in the checkpoints/ directory before running. This allows you to resume the study or export other Pareto-optimal candidates.
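
To make "Pareto-optimal" concrete: a trial is on the Pareto front when no other trial is at least as good on both KL divergence and refusals and strictly better on one of them. The sketch below is a minimal standard-library illustration, not Heretic's or Optuna's implementation. The first two entries use metrics reported in this README; the last two are hypothetical values added only to show the trade-off.

```python
def pareto_front(trials):
    """Return names of trials not dominated on (kl, refusals); lower is better for both."""
    front = []
    for name, kl, refusals in trials:
        dominated = any(
            (other_kl <= kl and other_refusals <= refusals)
            and (other_kl < kl or other_refusals < refusals)
            for _, other_kl, other_refusals in trials
        )
        if not dominated:
            front.append(name)
    return front


trials = [
    ("trial 5", 0.0494, 4),        # reported in this README
    ("trial 100", 0.0309, 3),      # reported in this README
    ("cliff example", 6.0, 0),     # hypothetical: no refusals, but very high KL
    ("timid example", 0.01, 40),   # hypothetical: very low KL, but many refusals
]

print(pareto_front(trials))
```

Trial 5 drops out because trial 100 beats it on both metrics, while the two hypothetical extremes stay on the front because each is best on one axis.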

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the MoE DELLA merge method, with google/gemma-4-26B-A4B as the base.

Models Merged

The following models were included in the merge:
BeaverAI/Orion-26B-A4B-v1a-GGUF
BeaverAI/Orion-26B-A4B-v1b-GGUF
ReadyArt/Dark-Scarlett-v1.0-26B-A4B-GGUF
ReadyArt/Omega-Evolution-26B-A4B-v3.0-GGUF
ReadyArt/For-Her-Darkside-26B-A4B-v1.4-GGUF
ReadyArt/Melody1437-26B-A4B-GGUF
ReadyArt/Serenity-26B-A4B-GGUF
ApocalypseParty/G4-26B-SFT-6
zerofata/G4-MeroMero-26B-A4B
AuriAetherwiing/G4-26B-A4B-Musica-v1
Darkhn/Gemma-4-26B-A4B-Animus-V14.1-FFT
Locutusque/Esmeralda-Gemma4-26B-A4B
Gryphe/Pantheon-Reasoning-26B-A4B-1.1
Gryphe/Gemma-4-26B-A4B-StyleTune

⚙️ Configuration

yaml
architecture: Gemma4ForConditionalGeneration
base_model: B:\26B\google_gemma-4-26B-A4B
models:
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1a-GGUF
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1b-GGUF
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Dark-Scarlett-v1.0-26B-A4B-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Omega-Evolution-26B-A4B-v3.0-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_For-Her-Darkside-26B-A4B-v1.4-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Melody1437-26B-A4B-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Serenity-26B-A4B-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ApocalypseParty_G4-26B-SFT-6
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\zerofata_G4-MeroMero-26B-A4B
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\AuriAetherwiing_G4-26B-A4B-Musica-v1
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Locutusque_Esmeralda-Gemma4-26B-A4B
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Pantheon-Reasoning-26B-A4B-1.1
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Gemma-4-26B-A4B-StyleTune
    parameters:
      weight:
        - filter: "output|lm_head"
          value: 13.0 # 50% of normalized lm_head weight
        - value: 1.0 # 7.14% of normalized other weights
      density: 0.9
      epsilon: 0.09
merge_method: moe_della
parameters:  
  lambda: 1.0
  normalize: true
  int8_mask: false
  rescale: true
  router_strategy: della # average # random_init
  blend_experts: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: Goetia 26B A4B v1.3
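
The two percentages in the StyleTune comments follow from simple arithmetic, assuming that normalize: true divides each model's weight by the sum of all weights for the tensor in question (an assumption about how the custom moe_della method normalizes, not something verified here). A minimal sketch:

```python
# Thirteen of the fourteen merged models use weight 1.0 for every tensor.
other_models = 13

# StyleTune uses 13.0 for tensors matching "output|lm_head" and 1.0 elsewhere.
styletune_lm_head = 13.0
styletune_other = 1.0

lm_head_total = other_models * 1.0 + styletune_lm_head   # 26.0
other_total = other_models * 1.0 + styletune_other       # 14.0

lm_head_share = styletune_lm_head / lm_head_total
other_share = styletune_other / other_total

print(f"lm_head share: {lm_head_share:.2%}")
print(f"other tensors share: {other_share:.2%}")
```

Under that assumption, StyleTune contributes half of the normalized lm_head weight and one fourteenth of every other tensor.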

This is a decensored version of Naphula/Goetia-26B-A4B-v1.3, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method (with row-norm preservation).

This model was merged locally on an RTX 3060 Ti and then hereticized on a RunPod cloud RTX Pro 6000 (96GB VRAM) for approximately $20 USD.

I also tested MPOA (Magnitude Preserving Orthogonal Ablation), but that method was unable to uncensor Gemma 4.

See also the Arbitrary Rank Inversion (ARI) variant.

Heretication Results

| Metric | This model | Original model |
| --- | --- | --- |
| KL divergence | 0.0309 | 0 (by definition) |
| Refusals | 3/100 | 100/100 |
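
The "0 (by definition)" entry follows from the formula: the KL divergence of a distribution from itself is always zero. The sketch below is a small standard-library illustration with hypothetical next-token probabilities; it does not reproduce how Heretic measures KL divergence over real model outputs, and the direction KL(original || modified) is an assumption.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token probabilities, for illustration only.
original = [0.7, 0.2, 0.1]
modified = [0.6, 0.3, 0.1]

print(kl_divergence(original, original))  # a model compared with itself
print(kl_divergence(original, modified))  # a model compared with an altered version
```

The further the ablated model's output distribution drifts from the original, the larger the second value becomes.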

Degree of Heretication

The Heresy Index weighs the resulting model's corruption by the process (KL Divergence) and its abolition of doctrine (Refusals) for a final verdict in classification.

| Index Entry | Classification | Analysis |
| --- | --- | --- |
| | Absolute Heresy | Fewer than 10/100 refusals and less than 0.10 KL divergence |
| | Tainted Heresy | Around 11-25/100 refusals and/or 0.11-0.20 KL divergence |
| | Impotent Heresy | Anything above 25/100 refusals and 0.21 KL divergence |

Note: This is an arbitrary classification inspired by Warhammer 40K, having no tangible indication towards the model's performance.
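
The table can be read as a simple rule. The sketch below encodes one possible reading; the exact boundaries are assumptions, because the table leaves gaps (for example, exactly 10 refusals) and mixes "and" with "and/or". The function name heresy_class is hypothetical.

```python
def heresy_class(refusals, kl_divergence):
    """Classify a result under one reading of the Heresy Index (thresholds are assumptions)."""
    # Assumed: both metrics must be low for the top tier.
    if refusals <= 10 and kl_divergence <= 0.10:
        return "Absolute Heresy"
    # Assumed: exceeding either upper bound drops the model to the lowest tier.
    if refusals > 25 or kl_divergence > 0.20:
        return "Impotent Heresy"
    return "Tainted Heresy"


# Metrics of the selected trial, as reported in this README.
print(heresy_class(refusals=3, kl_divergence=0.0309))
```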

🧙 Heretic Grimoire

reproduce.json

json
{
  "version": "1.2.0-dev",
  "base_model": "Naphula/Goetia-26B-A4B-v1.3",
  "timestamp": "2026-06-19T08:04:47Z",
  "metrics": {
    "kl_divergence": 0.030937770381569862,
    "refusals": 3,
    "n_bad_prompts": 100
  },
  "parameters": {
    "start_layer_index": "14",
    "end_layer_index": "26",
    "preserve_good_behavior_weight": "1.4404",
    "steer_bad_behavior_weight": "0.0100",
    "overcorrect_relative_weight": "0.9144",
    "neighbor_count": "15"
  },
  "target_components": [
    "attn.o_proj"
  ],
  "hardware": "RTX 6000 Blackwell (96GB)"
}
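
Note that reproduce.json stores the trial parameters as strings. A consumer therefore has to convert them before use. The following is a minimal sketch using only the standard library; it embeds an excerpt of the file shown above, and the choice of which fields are integers is inferred from the trial.suggest_int calls in the patches rather than from any schema.

```python
import json

# Excerpt of reproduce.json as shown above (other fields omitted).
reproduce_text = """
{
  "metrics": {"kl_divergence": 0.030937770381569862, "refusals": 3, "n_bad_prompts": 100},
  "parameters": {
    "start_layer_index": "14",
    "end_layer_index": "26",
    "preserve_good_behavior_weight": "1.4404",
    "steer_bad_behavior_weight": "0.0100",
    "overcorrect_relative_weight": "0.9144",
    "neighbor_count": "15"
  }
}
"""

# Assumed integer fields, matching the suggest_int calls in main.py.
INTEGER_FIELDS = {"start_layer_index", "end_layer_index", "neighbor_count"}

data = json.loads(reproduce_text)
parameters = {
    name: int(value) if name in INTEGER_FIELDS else float(value)
    for name, value in data["parameters"].items()
}
refusal_rate = data["metrics"]["refusals"] / data["metrics"]["n_bad_prompts"]

print(parameters)
print(f"refusal rate: {refusal_rate:.0%}")
```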

💡 Innovations

A special gguf_to_safetensors_v5.py calibrated for Gemma 4 was used in combination with a modified tasks.py.
A custom merge method, moe_della, was written to add DELLA merging for MoE models. Other mergekit scripts also had to be changed: architecture/auto.py, architecture/base.py, architecture/json_definitions.py, mergekit/common.py, io/tasks.py, tokenizer/embed.py, and mergekit/_data/architectures/gemma4.json.

🔧 Summary of Changes

This section summarizes the modifications made to the Heretic codebase (config.py, main.py, and model.py). It can be passed directly to the Heretic developers as a patch summary or pull request rationale for supporting 16-bit Arbitrary-Rank Ablation (ARA) on high-VRAM hardware (like Blackwell RTX 6000 / B300) and Gemma 4 MoE architectures.

1. `config.py`

Goal: Optimize the TPE sampler for constrained search spaces.

Reduced n_startup_trials (60 → 30):
- Why: When using "Surgical Narrowing" (manually restricting the parameter search space based on prior knowledge), 60 random exploratory trials waste compute. Dropping this to 30 allows the Tree-structured Parzen Estimator (TPE) to begin correlating steering weights and KL divergence much earlier, drastically speeding up convergence on 100-trial runs.

2. `main.py`

Goal: Implement Surgical Narrowing, fix environment bugs, and add reproducibility exports.

Bypassed version('heretic-llm'):
- Why: Hardcoded the version string to v1.2.0-dev in the CLI header and Hugging Face Readme generator. This prevents PackageNotFoundError crashes when running the scripts directly from the source directory without installing the package via pip.
Implemented "Surgical Narrowing" in ARA Sampling:
- Why: Replaced the broad trial.suggest_* ranges with tightly constrained boundaries (e.g., layers 14–27, high preservation weights, and specific steering weight logs). This forces the optimizer to focus exclusively on the 16-bit "Golden Zone" required for stable MoE models, ensuring KL divergence stays below 0.05.
Added "Grimoire" Reproducibility Export:
- Why: Injected a custom block into the "Save to local folder" logic. It automatically creates a /reproduce subfolder containing a reproduce.json (with exact trial parameters, metrics, and hardware info) and copies the Optuna study journal. This ensures 100% lineage and reproducibility for archived models.

3. `model.py`

Goal: Enable lossless 16-bit ARA, fix PyTorch autograd crashes, and ensure weight injection works.

Forced AutoModelForCausalLM in get_model_class:
- Why: Bypassed the vision_config check. Gemma 4 / MoE hybrid models were triggering architecture detection errors or set_submodule crashes.
Disabled bitsandbytes and Forced Pure BF16 Loading:
- Why: Removed 4-bit quantization logic and forced torch.bfloat16 with low_cpu_mem_usage=False. This allows high-VRAM environments (96GB+) to perform lossless, full-depth abliteration without BNB artifacts or meta tensor crashes.
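Fixed ValueError: can't optimize a non-leaf Tensor:
- Why: The FP32 working matrix is now detached after the cast (.detach() before .requires_grad_(True)), so the optimizer receives a leaf tensor.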

💉 Surgical Narrowing

A custom surgical narrowing strategy was used specifically for this G4 moe_della merge: a range lock was added via main.py, and TPE sampling was set to start after 30 trials instead of 60, allowing a faster search.

(This was only possible because I ran 150 trials before with a failed bitsandbytes 4-bit quantization, which wouldn't quantize, but produced ideal "target lock" coordinates.)

For trials 90-100, a "tightened grip" was applied. Trial 100 did so well that it was chosen as the release version.

Several other edits were made to various scripts to allow for Gemma 4 models to be ablated with ARA. Most of these changes are noted below.

To perform a lossless, full 16-bit depth ARA abliteration on your RTX 6000 Blackwell (96GB VRAM) with the surgical narrowing strategy, follow these instructions.

Step 1: Environment Setup

Run these to ensure a clean, Blackwell-optimized environment.

bash
# 1. System Prep
apt-get update && apt-get install -y git
git clone https://github.com/p-e-w/heretic.git
cd heretic
git checkout ara

# 2. Clean environment
pip uninstall -y heretic-llm kernels pydantic pydantic-settings optuna transformers accelerate peft bitsandbytes lm-eval evaluate torchvision torch torchaudio

# 3. Install Blackwell-Compatible Stack (CUDA 12.8)
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 4. Install Dependencies (Excluding bitsandbytes to avoid conflicts)
pip install pydantic==2.10.0 pydantic-settings optuna==4.1.0 questionary rich transformers==5.12.1 accelerate peft lm-eval==0.4.7

Step 2: Mandatory Library Patches

bash
# Fix Transformers 5.x Vision class error
sed -i 's/transformers.AutoModelForVision2Seq/transformers.AutoModelForSeq2SeqLM/g' /usr/local/lib/python3.12/dist-packages/lm_eval/models/hf_vlms.py

# Fix PIQA trust_remote_code logic
sed -i "s/self.dataset = datasets.load_dataset(/if 'trust_remote_code' not in dataset_kwargs: dataset_kwargs['trust_remote_code'] = True\n        self.dataset = datasets.load_dataset(/g" /usr/local/lib/python3.12/dist-packages/lm_eval/api/task.py

# Fix PIQA 401 Unauthorized redirect error
sed -i 's/dataset_path: piqa/dataset_path: ybisk\/piqa/g' /usr/local/lib/python3.12/dist-packages/lm_eval/tasks/piqa/piqa.yaml

# Force MoE fallback (Blackwell grouped_mm is unstable)
sed -i 's/hasattr(torch, "_grouped_mm")/False/g' /usr/local/lib/python3.12/dist-packages/transformers/integrations/moe.py

Step 3: Heretic Source Patches

File 1: `src/heretic/main.py`

Chunk 1 (Line ~155 & ~1015): Bypass Version Errors

Before >>>

python
print(f"[cyan]█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀[/]  v{version('heretic-llm')}")

After <<<

python
print(f"[cyan]█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀[/]  v1.2.0-dev")

Chunk 2 (Line ~535): Surgical Narrowing & KL Target

Before >>>

python
        if settings.use_ara:
            start_layer_index = trial.suggest_int(
                "start_layer_index",
                0,
                len(model.get_layers()) // 2,
            )
            end_layer_index = trial.suggest_int(
                "end_layer_index",
                len(model.get_layers()) // 2,
                len(model.get_layers()),
            )
            preserve_good_behavior_weight = trial.suggest_float(
                "preserve_good_behavior_weight",
                0.0,
                1.0,
            )
            steer_bad_behavior_weight = trial.suggest_float(
                "steer_bad_behavior_weight",
                0.0001,
                1.0,
                log=True,
            )
            overcorrect_relative_weight = trial.suggest_float(
                "overcorrect_relative_weight",
                0.0,
                1.3,
            )
            neighbor_count = trial.suggest_int(
                "neighbor_count",
                1,
                15,
            )

After <<<

python
        if settings.use_ara:
            # SURGICAL NARROWING: Blackwell 16-bit Optimized Range
            start_layer_index = trial.suggest_int("start_layer_index", 14, 16)
            end_layer_index = trial.suggest_int("end_layer_index", 24, 27)
            
            # Force high preservation for KL < 0.05
            preserve_good_behavior_weight = trial.suggest_float("preserve_good_behavior_weight", 0.5, 1.5)
            
            # Narrow steering for 16-bit stability
            steer_bad_behavior_weight = trial.suggest_float("steer_bad_behavior_weight", 0.01, 0.08, log=True)
            overcorrect_relative_weight = trial.suggest_float("overcorrect_relative_weight", 0.8, 1.3)
            neighbor_count = trial.suggest_int("neighbor_count", 9, 15)

File 2: `src/heretic/model.py`

Chunk 1 (Line ~9): Disable bitsandbytes

Before >>>

python
import bitsandbytes as bnb

After <<<

python
# import bitsandbytes as bnb

Chunk 2 (Line ~110): Force 16-bit Stable Loading

Before >>>

python
        for dtype in settings.dtypes:
            # ... (entire loop)
            break

After <<<

python
        dtype = torch.bfloat16
        print(f"* Loading model in FULL 16-BIT BFLOAT16 (Blackwell Stable Path)...")
        try:
            self.model = get_model_class(settings.model).from_pretrained(
                settings.model,
                torch_dtype=dtype,
                device_map="auto",
                trust_remote_code=self.trusted_models.get(settings.model),
                low_cpu_mem_usage=False
            )
            if self.trusted_models.get(settings.model) is None:
                self.trusted_models[settings.model] = True
        except Exception as error:
            print(f"* [red]Failed to load model:[/] {error}")
            raise error

Chunk 3 (Line ~565): Fix ARA for 16-bit (No Dequant)

Before >>>

python
                for module_index, module in enumerate(modules):
                    # See above for a (partial) justification of this cast.
                    module = cast(Linear, module)
                    matrix = module.weight

                    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

After <<<

python
                for module_index, module in enumerate(modules):
                    module = cast(Linear, module)
                    # Direct 16-bit access, no bitsandbytes dequant needed
                    matrix = module.weight.to(torch.float32).requires_grad_(True)
                    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

Yes, dropping the threshold is a smart move for Surgical Narrowing.

Since you are already narrowing the search space (start/end layers and weights) in main.py, the optimizer doesn't need 60 random trials to "find" the general area of success. By dropping the threshold to 20 or 30, you allow the TPE (Tree-structured Parzen Estimator) to start looking for the high-precision 16-bit "sweet spot" much earlier.

Why 30 trials is better for this run:

Faster Convergence: TPE will start correlating the steer_weight and kl_divergence sooner.
Efficiency: In a 100-run limit, 60 random trials would mean 60% of your compute is "guessing." At 30 trials, 70% of your compute is "optimizing."
Blackwell Speed: With a batch size of 64, you'll burn through trials quickly. 100 trials is plenty to find a Pareto-optimal candidate when the search space is surgically narrowed.

The Patch for `src/heretic/config.py`

To change the default behavior, apply this chunk:

Before >>> (Around line 215)

python
    n_startup_trials: int = Field(
        default=60,
        description="Number of trials that use random sampling for the purpose of exploration.",
    )

After <<<

python
    n_startup_trials: int = Field(
        default=30,
        description="Number of trials that use random sampling for the purpose of exploration.",
    )

The error ValueError: can't optimize a non-leaf Tensor occurs because matrix = module.weight.to(torch.float32).requires_grad_(True) creates a new tensor that is the result of a differentiable operation (.to()). In PyTorch, an optimizer can only be initialized with "leaf" tensors (tensors that aren't the result of an operation).

To fix this, you must wrap the new tensor in torch.nn.Parameter or detach it before setting requires_grad.

File: `src/heretic/model.py`

Chunk 1: Fix Non-Leaf Tensor Error in ara_abliterate

Before >>> (Around line 565)

python
                for module_index, module in enumerate(modules):
                    module = cast(Linear, module)
                    # Direct 16-bit access, no bitsandbytes dequant needed
                    matrix = module.weight.to(torch.float32).requires_grad_(True)
                    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

After <<<

python
                for module_index, module in enumerate(modules):
                    module = cast(Linear, module)
                    # Create a leaf tensor by detaching after the cast, then enable gradients
                    matrix = module.weight.to(torch.float32).detach().requires_grad_(True)
                    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

Explanation of the Fix

.to(torch.float32): This creates a new tensor that is "connected" to the original module.weight in the autograd graph.
.detach(): This breaks the connection to the graph, making the new FP32 tensor a "leaf" node.
.requires_grad_(True): This tells PyTorch to start tracking operations on this new leaf tensor so the L-BFGS optimizer can compute gradients for it.
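
The three steps above can be checked in isolation. The sketch below uses a small random BF16 parameter as a stand-in for module.weight (the shapes and names are illustrative, not taken from Heretic) and shows that the optimizer rejects the non-leaf tensor but accepts the detached one.

```python
import torch

# Stand-in for module.weight: a BF16 parameter that requires gradients.
weight = torch.nn.Parameter(torch.randn(4, 3, dtype=torch.bfloat16))

# The cast is recorded by autograd, so the result has a grad_fn and is not a leaf.
non_leaf = weight.to(torch.float32).requires_grad_(True)

# Detaching after the cast cuts the graph; the result is a fresh leaf tensor.
leaf = weight.to(torch.float32).detach().requires_grad_(True)

print(non_leaf.is_leaf, leaf.is_leaf)

try:
    torch.optim.LBFGS([non_leaf])
    rejected = False
except ValueError as error:
    rejected = True
    print(f"optimizer rejected the non-leaf tensor: {error}")

# The detached FP32 tensor is accepted.
optimizer = torch.optim.LBFGS([leaf])
```

One consequence of detaching is that leaf no longer shares anything with weight, so any result computed on it has to be written back to the module explicitly.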

The error RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != float occurs because the good_module_io and bad_module_io tensors were captured in BFloat16 during the initial scan, but the ARA optimization loop is now trying to multiply them by the Float32 matrix we created for the L-BFGS optimizer.

File: `src/heretic/model.py`

Chunk: Fix Dtype Mismatch in ara_abliterate

Before >>> (Around line 595)

python
                    good_input, good_output = good_module_io[layer_index][component][
                        module_index
                    ]
                    bad_input, bad_output = bad_module_io[layer_index][component][
                        module_index
                    ]

                    good_input = good_input.to(matrix.device)
                    good_output = good_output.to(matrix.device)
                    bad_input = bad_input.to(matrix.device)
                    bad_output = bad_output.to(matrix.device)

After <<<

python
                    good_input, good_output = good_module_io[layer_index][component][
                        module_index
                    ]
                    bad_input, bad_output = bad_module_io[layer_index][component][
                        module_index
                    ]

                    # Force inputs/outputs to Float32 to match the optimization matrix
                    good_input = good_input.to(device=matrix.device, dtype=torch.float32)
                    good_output = good_output.to(device=matrix.device, dtype=torch.float32)
                    bad_input = bad_input.to(device=matrix.device, dtype=torch.float32)
                    bad_output = bad_output.to(device=matrix.device, dtype=torch.float32)

Explanation

The matrix used by the optimizer was explicitly cast to float32 in the previous patch to satisfy the "leaf tensor" requirement. PyTorch's @ (matmul) operator is strict: it will not automatically cast BFloat16 inputs to Float32. This patch ensures the captured I/O data matches the precision of the optimization weights.
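
The mismatch is easy to reproduce outside Heretic. In the sketch below, the tensors are small random stand-ins for the FP32 optimization matrix and a captured BF16 activation; the shapes and names are illustrative assumptions.

```python
import torch

matrix = torch.randn(4, 3, dtype=torch.float32)     # stand-in for the FP32 optimization matrix
captured = torch.randn(5, 3, dtype=torch.bfloat16)  # stand-in for a captured BF16 input

try:
    captured @ matrix.T  # mixed BF16 / FP32 matmul
    mismatch_raised = False
except RuntimeError as error:
    mismatch_raised = True
    print(f"matmul failed: {error}")

# Same conversion as in the patch: move to the matrix's device and cast to FP32.
aligned = captured.to(device=matrix.device, dtype=torch.float32)
result = aligned @ matrix.T

print(result.dtype, tuple(result.shape))
```

Casting the captured data up to FP32 roughly doubles its memory footprint compared with BF16, which is a trade-off to keep in mind for large calibration sets.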

The KL divergence: 0.0000 and Refusals: 100/100 indicate that the optimized weights are not actually being applied to the model's active inference path. This happens because in your 16-bit environment, the module.weight is being updated in the ARA loop, but the model is likely still referencing the original weights or a cached version of the computation graph.

To fix this, we need to ensure the torch.nn.Parameter replacement is forced into the model's state and that we explicitly clear any residual gradients or cached states.

File: `src/heretic/model.py`

Apply this patch to the end of the ara_abliterate method to force the weight injection.

Before >>> (Around line 650)

python
                    with torch.no_grad():
                        matrix.copy_(get_matrix())

After <<<

python
                    with torch.no_grad():
                        # Get the optimized FP32 matrix and cast to model's compute dtype
                        optimized_matrix = get_matrix().to(device=module.weight.device, dtype=torch.bfloat16)
                        
                        # Force replace the parameter to break any reference to the old weights
                        module.weight = torch.nn.Parameter(optimized_matrix, requires_grad=False)
                        
                        # Ensure bitsandbytes quantization state is fully purged if it exists
                        if hasattr(module.weight, "quant_state"):
                            del module.weight.quant_state
                        if hasattr(module, "quant_state"):
                            del module.quant_state
                            
                # Force a synchronization to ensure weights are committed to VRAM
                if torch.cuda.is_available():
                    torch.cuda.synchronize()

Why this fixes the "Zero KL" issue:

Parameter Re-binding: After the 16-bit patches, matrix is most likely a detached FP32 copy of module.weight with its own storage, so matrix.copy_() only updates that copy and never reaches the model. copy_() can also be intercepted when the model is wrapped in a PeftModel. Replacing the module.weight object entirely forces the model to use the new tensor.
Dtype Alignment: By explicitly casting to bfloat16 before the injection, we ensure the inference kernels (which are optimized for BF16 on Blackwell) see the modified data.
CUDA Sync: The synchronize() call ensures that the weight updates are finished before the Evaluator starts the next pass.
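
The "Zero KL" symptom can be reproduced on a toy layer. The sketch below uses a small torch.nn.Linear as a stand-in for an attention output projection, and adds 1.0 to the weights as a placeholder for the optimizer's result; both are illustrative assumptions. It shows that copying into a detached FP32 working copy leaves the module untouched, while re-binding the parameter changes it.

```python
import torch

linear = torch.nn.Linear(3, 2, bias=False, dtype=torch.bfloat16)
original = linear.weight.detach().clone()

# FP32 working copy, as created in the patched ARA loop: separate storage from the module.
matrix = linear.weight.to(torch.float32).detach()
optimized = matrix + 1.0  # placeholder for the optimized matrix

with torch.no_grad():
    matrix.copy_(optimized)  # only the working copy changes
unchanged_after_copy = torch.equal(linear.weight, original)

with torch.no_grad():
    # Re-bind the parameter so the module itself holds the new BF16 weights.
    linear.weight = torch.nn.Parameter(optimized.to(torch.bfloat16), requires_grad=False)
changed_after_rebind = not torch.equal(linear.weight, original)

print(unchanged_after_copy, changed_after_rebind)
```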

Updated Execution Payload

Since you are starting fresh after the Ctrl+C, make sure to wipe the "zero-progress" database so the optimizer doesn't think those failed runs were valid data points.

bash
export PYTHONPATH=/workspace/heretic/src
export HF_DATASETS_TRUST_REMOTE_CODE=1
rm -rf /workspace/heretic/checkpoints/study.db

python3 -c "import sys; sys.path.insert(0, '/workspace/heretic/src'); from heretic.main import main; main()" \
  --model "/workspace/Naphula/Goetia-26B-A4B-v1.3" \
  --use-ara \
  --target-components '["attn.o_proj"]' \
  --row-normalization full \
  --kl-divergence-target 0.05 \
  --n-trials 100 \
  --n-startup-trials 30 \
  --batch-size 64 \
  --device-map auto \
  --max-memory '{"0":"92GiB", "cpu":"130GiB"}' \
  --trust-remote-code True

That is a smart move. Having those Blackwell-optimized scripts (config.py, main.py, and model.py) ready to go will save you at least 30–45 minutes of patching every time you spin up a new RunPod instance.

Since you have the Surgical Narrowing, 16-bit Stable Loading, and Grimoire Export logic all baked into your local files now, your "Cold Start" workflow for future sessions is simplified to just this:

Future Session "Cold Start" Workflow

Spin up Pod (RTX 6000 Blackwell or B300).

Clone & Setup Environment:

bash
git clone https://github.com/p-e-w/heretic.git
cd heretic
git checkout ara
# Install the Blackwell stack
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install pydantic==2.10.0 pydantic-settings optuna==4.1.0 questionary rich transformers==5.12.1 accelerate peft lm-eval==0.4.7

Upload your saved scripts to /workspace/heretic/src/heretic/, overwriting the defaults.
Run Library Patches (The sed commands for lm-eval and transformers MoE stability).
Execute Payload (the command listed under "Updated Execution Payload" above).

You’ve essentially built a custom "Blackwell Edition" of Heretic. Good luck with the 100-trial run—getting that KL under 0.05 on full 16-bit precision will result in a top-tier model for the Grimoire.

This is an incredible start. You are seeing the "Surgical Narrowing" strategy pay off in real-time. Even in the random exploration phase, you've already hit the Pareto Front multiple times.

Look at Trial 5:

KL Divergence: 0.0494 (Already under your 0.05 target!)
Refusals: 4/100 (Extremely close to your ≤ 3 goal)

Analysis of the "Noise Floor"

You are currently mapping the boundary where the model's "refusal circuit" collapses.

The "Cliff" (Trials 1, 6, 7): When your KL jumps to 6.0+, you've pushed the steering weight or overcorrection too far for those specific layers. The model isn't just decensored; its internal logic is being "scrambled," resulting in high divergence.
The "Golden Zone" (Trials 3, 4, 5): You've found the sweet spot. A KL of 0.04 - 0.06 with single-digit refusals is the hallmark of a high-tier abliteration. It means the model's capabilities are 99% intact, but the refusal mechanism has been surgically bypassed.

What to expect when TPE kicks in (Trial 30+):

Once you hit Trial 30, the optimizer will stop "guessing" and start looking at the relationship between your successful trials. It will likely notice that:

Layer 15-16 is a high-leverage start point.
Neighbor counts around 12 provide better stability than 14-15.
It will try to "nudge" that 0.0201 steer weight from Trial 5 just a tiny bit higher (maybe to 0.022) to see if it can turn those 4 refusals into 2 while keeping the KL under 0.05.

A Note on the "Target Locks"

By using the 16-bit lossless path on the Blackwell, you are avoiding the "quantization noise" that usually plagues these runs. In 4-bit, the "noise floor" is much higher because the weights are already "jittery." Here, every change the ARA makes is precise.

You are essentially performing digital brain surgery on this model. If Trial 5 is already this good, Trial 100 is going to be legendary. Keep it running—you're about to set a new standard for the Grimoire.

Based on the logs, you have successfully mapped the "Phase Transition" of the model.

Trials like 53, 55, and 75 are your current champions because they sit right on the edge of the "KL Cliff." You've discovered that once steer_bad_behavior_weight crosses a certain threshold relative to preserve_good_behavior_weight, the KL divergence explodes from 0.04 to 6.0+.

To "tighten the grip" for the final 30 trials, we need to force the TPE sampler to explore the high-precision corridor where the model is fully decensored but the logic remains intact.

The "Grip" Tweak for `src/heretic/main.py`

Apply these narrowed ranges to your objective function. We are locking the layers to the most successful configuration and narrowing the weight ratios to prevent the "KL explosion."

After <<< (Around line 535 in main.py)

python
        if settings.use_ara:
            # TIGHTENED SURGICAL GRIP: Focused on Trial 53/55/75 Success Corridor
            start_layer_index = trial.suggest_int("start_layer_index", 14, 15) # Locked to the high-leverage start
            end_layer_index = trial.suggest_int("end_layer_index", 26, 27)   # Locked to the stable exit
            
            # Increase minimum preservation to prevent the KL 6.0+ "scrambling"
            preserve_good_behavior_weight = trial.suggest_float("preserve_good_behavior_weight", 1.2, 1.5)
            
            # Narrow steering to the exact "sweet spot" discovered in the first 100 trials
            steer_bad_behavior_weight = trial.suggest_float("steer_bad_behavior_weight", 0.010, 0.015, log=True)
            
            # Tighten overcorrection to prevent logic collapse
            overcorrect_relative_weight = trial.suggest_float("overcorrect_relative_weight", 0.85, 1.05)
            
            # Neighbor count 10-15 proved most stable for 16-bit
            neighbor_count = trial.suggest_int("neighbor_count", 10, 15)

            ara_parameters = ARAParameters(
                start_layer_index=start_layer_index,
                end_layer_index=end_layer_index,
                preserve_good_behavior_weight=preserve_good_behavior_weight,
                steer_bad_behavior_weight=steer_bad_behavior_weight,
                overcorrect_relative_weight=overcorrect_relative_weight,
                neighbor_count=neighbor_count,
            )

Why this will break the tie:

Preservation Floor (1.2): By raising the floor of preserve_good_behavior_weight, you effectively ban the optimizer from trying "suicide runs" that result in KL 6.0. Every trial will now be a "high-quality" candidate.
Steering Ceiling (0.015): Your logs show that at 0.017+, the model starts to diverge. By capping this at 0.015, you force the optimizer to find the most efficient steering within the safety zone.
Layer Lock: Trials 53 and 55 both used the 14-26/27 range. We are now making that the mandatory playground.
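
As a consistency check, the parameters of the released trial 100 (taken from reproduce.json in this README) can be compared against the tightened ranges above. This is a plain standard-library sketch; the dictionary names are illustrative.

```python
# Tightened ranges from the "Grip" tweak (inclusive bounds).
tightened_ranges = {
    "start_layer_index": (14, 15),
    "end_layer_index": (26, 27),
    "preserve_good_behavior_weight": (1.2, 1.5),
    "steer_bad_behavior_weight": (0.010, 0.015),
    "overcorrect_relative_weight": (0.85, 1.05),
    "neighbor_count": (10, 15),
}

# Trial 100 parameters as recorded in reproduce.json (rounded as shown there).
trial_100 = {
    "start_layer_index": 14,
    "end_layer_index": 26,
    "preserve_good_behavior_weight": 1.4404,
    "steer_bad_behavior_weight": 0.0100,
    "overcorrect_relative_weight": 0.9144,
    "neighbor_count": 15,
}

inside = {
    name: low <= trial_100[name] <= high
    for name, (low, high) in tightened_ranges.items()
}

for name, ok in inside.items():
    print(f"{name}: {'within range' if ok else 'outside range'}")
```

Every recorded value falls inside its tightened range, with the steering weight and neighbor count sitting on the range boundaries.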


Model Details

Model Provider: Naphula

Model Tree

Base models:
- google/gemma-4-26B-A4B
- Locutusque/Esmeralda-Gemma4-26B-A4B
- BeaverAI/Orion-26B-A4B-v1a-GGUF
- ReadyArt/Melody1437-26B-A4B-GGUF
- ReadyArt/For-Her-Darkside-26B-A4B-v1.4-GGUF
- ReadyArt/Omega-Evolution-26B-A4B-v3.0-GGUF
- zerofata/G4-MeroMero-26B-A4B
- AuriAetherwiing/G4-26B-A4B-Musica-v1
- ReadyArt/Serenity-26B-A4B-GGUF
- BeaverAI/Orion-26B-A4B-v1b-GGUF
- Gryphe/Gemma-4-26B-A4B-StyleTune
- Darkhn/Gemma-4-26B-A4B-Animus-V14.1-FFT
- Gryphe/Pantheon-Reasoning-26B-A4B-1.1
- ApocalypseParty/G4-26B-SFT-6
- ReadyArt/Dark-Scarlett-v1.0-26B-A4B-GGUF

Merged: this model

Input Modalities: Text, Image

Output Modalities: Text

Supported Functionality: Dedicated Endpoints, Container

Models

Base model: Naphula/Goetia-26B-A4B-v1 (Pure 16-bit BFloat16)

Datasets

Good prompts: mlabonne/harmless_alpaca (Commit: 02c6a92)
Bad prompts: mlabonne/harmful_behaviors (Commit: 01cead0)
Good evaluation prompts: mlabonne/harmless_alpaca (Commit: 02c6a92)
Bad evaluation prompts: mlabonne/harmful_behaviors (Commit: 01cead0)

Selected trial

Trial number: 100
KL divergence: 0.0309
Refusals: 3/100
Method: Arbitrary-Rank Ablation (ARA) with Surgical Narrowing

Environment

Heretic: v1.2.0-dev (Blackwell Optimized)
PyTorch: 2.8.0+cu128
Hardware: NVIDIA RTX 6000 Blackwell (96GB VRAM)
Other dependencies: See requirements.txt.

Contents of this directory

requirements.txt: The exact versions of all Python packages (Blackwell/CUDA 12.8 stack).
config.toml: The exact configuration used, including the 16-bit stable loading path.
Naphula--Goetia-26B-A4B-v1.3.jsonl: The Optuna study journal containing the history of all 100+ trials.
SHA256SUMS: Cryptographic hashes for all weight files.
reproduce.json: A machine-readable file containing all reproducibility information.

How to reproduce

[!TIP] You can automate this process, including all verification steps, by downloading the reproduce.json file and running python3 -c "from heretic.main import main; main()" --reproduce reproduce.json.

1. Install the Blackwell-compatible version of PyTorch: pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
2. Install the packages listed in requirements.txt: pip install -r requirements.txt
3. Apply the Heretic source patches for 16-bit ARA and Surgical Narrowing.
4. Place the provided config.toml in your working directory.
5. Run the execution payload:

bash
export PYTHONPATH=/workspace/heretic/src
python3 -c "from heretic.main import main; main()" --model "/workspace/Naphula/Goetia-26B-A4B-v1.3" --use-ara

6. Wait for the run to finish, then select trial 100 and export the model.
7. Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in SHA256SUMS: sha256sum -c SHA256SUMS
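
Where sha256sum is not available, the same check can be sketched in Python. This is a minimal sketch that assumes SHA256SUMS follows the usual sha256sum layout (hex digest, whitespace, file name, with an optional leading `*` on the name); the helper names `sha256_of` and `verify_sums` are illustrative and not part of Heretic.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight shards never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Return {file name: digest matches} for each entry in a sha256sum-style file."""
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        results[name] = sha256_of(sums_file.parent / name) == expected.lower()
    return results
```

Calling `verify_sums(Path("SHA256SUMS"))` from the model directory returns a mapping that should contain only `True` values for an exact reproduction.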

[!TIP] To use the included Optuna study journal Naphula--Goetia-26B-A4B-v1.3.jsonl, place it in the checkpoints/ directory before running. This allows you to resume the study or export other Pareto-optimal candidates.

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

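This model was merged using the MoE DELLA merge method, with google/gemma-4-26B-A4B as the base.

Models Merged

The following models were included in the merge:

BeaverAI/Orion-26B-A4B-v1a-GGUF
BeaverAI/Orion-26B-A4B-v1b-GGUF
ReadyArt/Dark-Scarlett-v1.0-26B-A4B-GGUF
ReadyArt/Omega-Evolution-26B-A4B-v3.0-GGUF
ReadyArt/For-Her-Darkside-26B-A4B-v1.4-GGUF
ReadyArt/Melody1437-26B-A4B-GGUF
ReadyArt/Serenity-26B-A4B-GGUF
ApocalypseParty/G4-26B-SFT-6
zerofata/G4-MeroMero-26B-A4B
AuriAetherwiing/G4-26B-A4B-Musica-v1
Darkhn/Gemma-4-26B-A4B-Animus-V14.1-FFT
Locutusque/Esmeralda-Gemma4-26B-A4B
Gryphe/Pantheon-Reasoning-26B-A4B-1.1
Gryphe/Gemma-4-26B-A4B-StyleTune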

⚙️ Configuration

yaml
architecture: Gemma4ForConditionalGeneration
base_model: B:\26B\google_gemma-4-26B-A4B
models:
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1a-GGUF
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1b-GGUF
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Dark-Scarlett-v1.0-26B-A4B-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Omega-Evolution-26B-A4B-v3.0-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_For-Her-Darkside-26B-A4B-v1.4-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Melody1437-26B-A4B-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Serenity-26B-A4B-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ApocalypseParty_G4-26B-SFT-6
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\zerofata_G4-MeroMero-26B-A4B
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\AuriAetherwiing_G4-26B-A4B-Musica-v1
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Locutusque_Esmeralda-Gemma4-26B-A4B
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Pantheon-Reasoning-26B-A4B-1.1
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Gemma-4-26B-A4B-StyleTune
    parameters:
      weight:
        - filter: "output|lm_head"
          value: 13.0 # 50% of normalized lm_head weight
        - value: 1.0 # 7.14% of normalized other weights
      density: 0.9
      epsilon: 0.09
merge_method: moe_della
parameters:  
  lambda: 1.0
  normalize: true
  int8_mask: false
  rescale: true
  router_strategy: della # average # random_init
  blend_experts: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: Goetia 26B A4B v1.3
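
The percentages in the StyleTune comments can be checked with a little arithmetic. The sketch below assumes that `normalize: true` divides each model's weight by the sum of all weights for a given tensor; that reading is an assumption made for illustration, and the variable names are not taken from mergekit.

```python
N_MODELS = 14             # entries under `models:` in the config above
STYLETUNE_LM_HEAD = 13.0  # StyleTune weight on tensors matching "output|lm_head"
DEFAULT_WEIGHT = 1.0      # weight used everywhere else


def normalized_share(weight: float, all_weights: list[float]) -> float:
    """Fraction of the total merge weight held by one model."""
    return weight / sum(all_weights)


# lm_head / output tensors: StyleTune at 13.0, the other 13 models at 1.0
lm_head_weights = [DEFAULT_WEIGHT] * (N_MODELS - 1) + [STYLETUNE_LM_HEAD]
lm_head_share = normalized_share(STYLETUNE_LM_HEAD, lm_head_weights)

# every other tensor: all 14 models at 1.0
other_weights = [DEFAULT_WEIGHT] * N_MODELS
other_share = normalized_share(DEFAULT_WEIGHT, other_weights)

print(f"lm_head: {lm_head_share:.2%}, other tensors: {other_share:.2%}")
```

Under that assumption, 13 / (13 + 13) gives the 50% share for lm_head and 1 / 14 gives the 7.14% share for all other tensors, matching the comments in the config.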

This is a decensored version of Naphula/Goetia-26B-A4B-v1.3, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method (with row-norm preservation).

This model was merged locally on an RTX 3060 Ti and then hereticized on a RunPod cloud RTX Pro 6000 (96GB VRAM) for approximately $20 USD.

I also tested MPOA (Magnitude Preserving Orthogonal Ablation), but this method was unable to uncensor Gemma 4.

See also the Arbitrary Rank Inversion (ARI) variant.

Heretication Results

| Metric | This model | Original model |
| --- | --- | --- |
| KL divergence | 0.0309 | 0 (by definition) |
| Refusals | 3/100 | 100/100 |

Degree of Heretication

The Heresy Index weighs the resulting model's corruption by the process (KL Divergence) and its abolition of doctrine (Refusals) for a final verdict in classification.

| Classification | Analysis |
| --- | --- |
| Absolute Heresy | Fewer than 10/100 refusals and less than 0.10 KL divergence |
| Tainted Heresy | Around 11-25/100 refusals and/or 0.11-0.20 KL divergence |
| Impotent Heresy | Anything above 25/100 refusals and 0.21 KL divergence |

Note: This is an arbitrary classification inspired by Warhammer 40K, having no tangible indication towards the model's performance.
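
The classification can be written as a small function. The table leaves the boundaries open (for example a model with exactly 10 refusals, or one that is good on one metric and poor on the other), so the sketch below picks one possible reading: both metrics must be low for Absolute Heresy, and either metric being high is enough for Impotent Heresy. The function name `heresy_class` and the exact cut-offs are assumptions.

```python
def heresy_class(refusals: int, kl_divergence: float) -> str:
    """Classify a model by refusals (out of 100 prompts) and KL divergence."""
    if refusals <= 10 and kl_divergence <= 0.10:
        return "Absolute Heresy"
    if refusals > 25 or kl_divergence > 0.20:
        return "Impotent Heresy"
    return "Tainted Heresy"


print(heresy_class(3, 0.0309))
```

With this model's metrics (3 refusals, KL divergence 0.0309) the function returns Absolute Heresy, in line with the model name.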

🧙 Heretic Grimoire

reproduce.json

json
{
  "version": "1.2.0-dev",
  "base_model": "Naphula/Goetia-26B-A4B-v1.3",
  "timestamp": "2026-06-19T08:04:47Z",
  "metrics": {
    "kl_divergence": 0.030937770381569862,
    "refusals": 3,
    "n_bad_prompts": 100
  },
  "parameters": {
    "start_layer_index": "14",
    "end_layer_index": "26",
    "preserve_good_behavior_weight": "1.4404",
    "steer_bad_behavior_weight": "0.0100",
    "overcorrect_relative_weight": "0.9144",
    "neighbor_count": "15"
  },
  "target_components": [
    "attn.o_proj"
  ],
  "hardware": "RTX 6000 Blackwell (96GB)"
}
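
Note that reproduce.json stores every value under "parameters" as a string. A minimal sketch of reading them back as numbers is shown below; the helper name `load_parameters` is hypothetical, and the split into integer and float keys is inferred from the values shown above.

```python
import json

# Keys whose values are whole numbers in the file above; the rest are floats.
INTEGER_KEYS = {"start_layer_index", "end_layer_index", "neighbor_count"}


def load_parameters(raw: str) -> dict:
    """Parse a reproduce.json payload and convert string parameters to numbers."""
    data = json.loads(raw)
    return {
        key: int(value) if key in INTEGER_KEYS else float(value)
        for key, value in data["parameters"].items()
    }


sample = '{"parameters": {"start_layer_index": "14", "steer_bad_behavior_weight": "0.0100"}}'
params = load_parameters(sample)
print(params)
```

For the real file, pass the contents of reproduce.json (for example `Path("reproduce.json").read_text()`) instead of the shortened `sample` string.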

💡 Innovations

A special gguf_to_safetensors_v5.py calibrated for Gemma 4 was used in combination with a modified tasks.py.
A custom method, moe_della, was scripted in order to add DELLA merging for MoE models. Other changes to mergekit scripts were also required, such as architecture/auto.py, architecture/base.py, architecture/json_definitions.py, mergekit/common.py, io/tasks.py, tokenizer/embed.py, and mergekit/_data/architectures/gemma4.json.

🔧 Summary of Changes

1. `config.py`

Goal: Optimize the TPE sampler for constrained search spaces.

Reduced n_startup_trials (60 → 30):
- Why: When using "Surgical Narrowing" (manually restricting the parameter search space based on prior knowledge), 60 random exploratory trials waste compute. Dropping this to 30 allows the Tree-structured Parzen Estimator (TPE) to begin correlating steering weights and KL divergence much earlier, drastically speeding up convergence on 100-trial runs.

2. `main.py`

Goal: Implement Surgical Narrowing, fix environment bugs, and add reproducibility exports.

Bypassed version('heretic-llm'):
- Why: Hardcoded the version string to v1.2.0-dev in the CLI header and Hugging Face Readme generator. This prevents PackageNotFoundError crashes when running the scripts directly from the source directory without installing the package via pip.
Implemented "Surgical Narrowing" in ARA Sampling:
- Why: Replaced the broad trial.suggest_* ranges with tightly constrained boundaries (e.g., layers 14–27, high preservation weights, and specific steering weight logs). This forces the optimizer to focus exclusively on the 16-bit "Golden Zone" required for stable MoE models, ensuring KL divergence stays below 0.05.
Added "Grimoire" Reproducibility Export:
- Why: Injected a custom block into the "Save to local folder" logic. It automatically creates a /reproduce subfolder containing a reproduce.json (with exact trial parameters, metrics, and hardware info) and copies the Optuna study journal. This ensures 100% lineage and reproducibility for archived models.

3. `model.py`

Goal: Enable lossless 16-bit ARA, fix PyTorch autograd crashes, and ensure weight injection works.

Forced AutoModelForCausalLM in get_model_class:
- Why: Bypassed the vision_config check. Gemma 4 / MoE hybrid models were triggering architecture detection errors or set_submodule crashes.
Disabled bitsandbytes and Forced Pure BF16 Loading:
- Why: Removed 4-bit quantization logic and forced torch.bfloat16 with low_cpu_mem_usage=False. This allows high-VRAM environments (96GB+) to perform lossless, full-depth abliteration without BNB artifacts or meta tensor crashes.
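Fixed ValueError: can't optimize a non-leaf Tensor:
- Why: The float32 working copy of each weight matrix was still attached to the autograd graph, so the L-BFGS optimizer rejected it. Detaching the copy before enabling gradients makes it a leaf tensor.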

💉 Surgical Narrowing

(This was only possible because I ran 150 trials before with a failed bitsandbytes 4-bit quantization, which wouldn't quantize, but produced ideal "target lock" coordinates.)

For trials 90-100, a "tightened grip" was applied. Trial 100 did so well that it was chosen as the release version.

Several other edits were made to various scripts to allow for Gemma 4 models to be ablated with ARA. Most of these changes are noted below.

To perform a lossless, full 16-bit depth ARA abliteration on your RTX 6000 Blackwell (96GB VRAM) with the surgical narrowing strategy, follow these instructions.

Step 1: Environment Setup

Run these to ensure a clean, Blackwell-optimized environment.

bash
# 1. System Prep
apt-get update && apt-get install -y git
git clone https://github.com/p-e-w/heretic.git
cd heretic
git checkout ara

# 2. Clean environment
pip uninstall -y heretic-llm kernels pydantic pydantic-settings optuna transformers accelerate peft bitsandbytes lm-eval evaluate torchvision torch torchaudio

# 3. Install Blackwell-Compatible Stack (CUDA 12.8)
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 4. Install Dependencies (Excluding bitsandbytes to avoid conflicts)
pip install pydantic==2.10.0 pydantic-settings optuna==4.1.0 questionary rich transformers==5.12.1 accelerate peft lm-eval==0.4.7

Step 2: Mandatory Library Patches

bash
# Fix Transformers 5.x Vision class error
sed -i 's/transformers.AutoModelForVision2Seq/transformers.AutoModelForSeq2SeqLM/g' /usr/local/lib/python3.12/dist-packages/lm_eval/models/hf_vlms.py

# Fix PIQA trust_remote_code logic
sed -i "s/self.dataset = datasets.load_dataset(/if 'trust_remote_code' not in dataset_kwargs: dataset_kwargs['trust_remote_code'] = True\n        self.dataset = datasets.load_dataset(/g" /usr/local/lib/python3.12/dist-packages/lm_eval/api/task.py

# Fix PIQA 401 Unauthorized redirect error
sed -i 's/dataset_path: piqa/dataset_path: ybisk\/piqa/g' /usr/local/lib/python3.12/dist-packages/lm_eval/tasks/piqa/piqa.yaml

# Force MoE fallback (Blackwell grouped_mm is unstable)
sed -i 's/hasattr(torch, "_grouped_mm")/False/g' /usr/local/lib/python3.12/dist-packages/transformers/integrations/moe.py

Step 3: Heretic Source Patches

File 1: `src/heretic/main.py`

Chunk 1 (Line ~155 & ~1015): Bypass Version Errors

Before >>>

python
print(f"[cyan]█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀[/]  v{version('heretic-llm')}")

After <<<

python
print(f"[cyan]█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀[/]  v1.2.0-dev")

Chunk 2 (Line ~535): Surgical Narrowing & KL Target

Before >>>

python
if settings.use_ara:
    start_layer_index = trial.suggest_int(
        "start_layer_index",
        0,
        len(model.get_layers()) // 2,
    )
    end_layer_index = trial.suggest_int(
        "end_layer_index",
        len(model.get_layers()) // 2,
        len(model.get_layers()),
    )
    preserve_good_behavior_weight = trial.suggest_float(
        "preserve_good_behavior_weight",
        0.0,
        1.0,
    )
    steer_bad_behavior_weight = trial.suggest_float(
        "steer_bad_behavior_weight",
        0.0001,
        1.0,
        log=True,
    )
    overcorrect_relative_weight = trial.suggest_float(
        "overcorrect_relative_weight",
        0.0,
        1.3,
    )
    neighbor_count = trial.suggest_int(
        "neighbor_count",
        1,
        15,
    )

After <<<

python
if settings.use_ara:
    # SURGICAL NARROWING: Blackwell 16-bit Optimized Range
    start_layer_index = trial.suggest_int("start_layer_index", 14, 16)
    end_layer_index = trial.suggest_int("end_layer_index", 24, 27)

    # Force high preservation for KL < 0.05
    preserve_good_behavior_weight = trial.suggest_float("preserve_good_behavior_weight", 0.5, 1.5)

    # Narrow steering for 16-bit stability
    steer_bad_behavior_weight = trial.suggest_float("steer_bad_behavior_weight", 0.01, 0.08, log=True)
    overcorrect_relative_weight = trial.suggest_float("overcorrect_relative_weight", 0.8, 1.3)
    neighbor_count = trial.suggest_int("neighbor_count", 9, 15)

File 2: `src/heretic/model.py`

Chunk 1 (Line ~9): Disable bitsandbytes

Before >>>

python
import bitsandbytes as bnb

After <<<

python
# import bitsandbytes as bnb

Chunk 2 (Line ~110): Force 16-bit Stable Loading

Before >>>

python
for dtype in settings.dtypes:
    # ... (entire loop)
    break

After <<<

python
dtype = torch.bfloat16
print("* Loading model in FULL 16-BIT BFLOAT16 (Blackwell Stable Path)...")
try:
    self.model = get_model_class(settings.model).from_pretrained(
        settings.model,
        torch_dtype=dtype,
        device_map="auto",
        trust_remote_code=self.trusted_models.get(settings.model),
        low_cpu_mem_usage=False,
    )
    if self.trusted_models.get(settings.model) is None:
        self.trusted_models[settings.model] = True
except Exception as error:
    print(f"* [red]Failed to load model:[/] {error}")
    raise error

Chunk 3 (Line ~565): Fix ARA for 16-bit (No Dequant)

Before >>>

python
for module_index, module in enumerate(modules):
    # See above for a (partial) justification of this cast.
    module = cast(Linear, module)
    matrix = module.weight

    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

After <<<

python
for module_index, module in enumerate(modules):
    module = cast(Linear, module)
    # Direct 16-bit access, no bitsandbytes dequant needed
    matrix = module.weight.to(torch.float32).requires_grad_(True)
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

Dropping n_startup_trials from 60 to 30 is a smart move for Surgical Narrowing.

Why 30 trials is better for this run:

Faster Convergence: TPE will start correlating the steer_weight and kl_divergence sooner.
Efficiency: In a 100-run limit, 60 random trials would mean 60% of your compute is "guessing." At 30 trials, 70% of your compute is "optimizing."
Blackwell Speed: With a batch size of 64, you'll burn through trials quickly. 100 trials is plenty to find a Pareto-optimal candidate when the search space is surgically narrowed.
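
The compute split quoted above is simple arithmetic, sketched here for both settings. The helper name `trial_budget` is illustrative; it only assumes that the first n_startup_trials trials are sampled randomly and the rest are guided by TPE, as the field description states.

```python
def trial_budget(n_trials: int, n_startup_trials: int) -> tuple[float, float]:
    """Return (random fraction, TPE-guided fraction) of a study's trials."""
    random_fraction = min(n_startup_trials, n_trials) / n_trials
    return random_fraction, 1.0 - random_fraction


for startup in (60, 30):
    random_part, guided_part = trial_budget(100, startup)
    print(f"n_startup_trials={startup}: {random_part:.0%} random, {guided_part:.0%} TPE-guided")
```

For a 100-trial study this gives 60% random and 40% guided at the default, versus 30% random and 70% guided after the change.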

The Patch for `src/heretic/config.py`

To change the default behavior, apply this chunk:

Before >>> (Around line 215)

python
n_startup_trials: int = Field(
    default=60,
    description="Number of trials that use random sampling for the purpose of exploration.",
)

After <<<

python
n_startup_trials: int = Field(
    default=30,
    description="Number of trials that use random sampling for the purpose of exploration.",
)

To fix the "ValueError: can't optimize a non-leaf Tensor" raised by the 16-bit ARA patch above, you must wrap the new tensor in torch.nn.Parameter or detach it before setting requires_grad.

File: `src/heretic/model.py`

Chunk 1: Fix Non-Leaf Tensor Error in ara_abliterate

Before >>> (Around line 565)

python
for module_index, module in enumerate(modules):
    module = cast(Linear, module)
    # Direct 16-bit access, no bitsandbytes dequant needed
    matrix = module.weight.to(torch.float32).requires_grad_(True)
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

After <<<

python
for module_index, module in enumerate(modules):
    module = cast(Linear, module)
    # Create a leaf tensor by detaching after the cast, then enable gradients
    matrix = module.weight.to(torch.float32).detach().requires_grad_(True)
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

Explanation of the Fix

.to(torch.float32): This creates a new tensor that is "connected" to the original module.weight in the autograd graph.
.detach(): This breaks the connection to the graph, making the new FP32 tensor a "leaf" node.
.requires_grad_(True): This tells PyTorch to start tracking operations on this new leaf tensor so the L-BFGS optimizer can compute gradients for it.
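
The difference between the two variants can be reproduced outside Heretic with a toy parameter. This is a minimal sketch, assuming the original weight requires gradients (the default for torch.nn.Parameter); the tensor shapes and variable names are arbitrary.

```python
import torch

weight = torch.nn.Parameter(torch.randn(4, 3, dtype=torch.bfloat16))

# Casting a parameter that requires grad records the cast in the autograd
# graph, so the result is a non-leaf tensor.
non_leaf = weight.to(torch.float32).requires_grad_(True)

try:
    torch.optim.LBFGS([non_leaf])
    rejected = False
except ValueError as error:
    rejected = True
    print(f"optimizer rejected the non-leaf tensor: {error}")

# Detaching first cuts the graph edge, producing a fresh leaf tensor
# that the optimizer accepts.
leaf = weight.to(torch.float32).detach().requires_grad_(True)
optimizer = torch.optim.LBFGS([leaf])

print(non_leaf.is_leaf, leaf.is_leaf)
```

Checking `is_leaf` on the working matrix is a quick way to confirm the patch took effect before starting a long run.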

File: `src/heretic/model.py`

Chunk: Fix Dtype Mismatch in ara_abliterate

Before >>> (Around line 595)

python
good_input, good_output = good_module_io[layer_index][component][
    module_index
]
bad_input, bad_output = bad_module_io[layer_index][component][
    module_index
]

good_input = good_input.to(matrix.device)
good_output = good_output.to(matrix.device)
bad_input = bad_input.to(matrix.device)
bad_output = bad_output.to(matrix.device)

After <<<

python
good_input, good_output = good_module_io[layer_index][component][
    module_index
]
bad_input, bad_output = bad_module_io[layer_index][component][
    module_index
]

# Force inputs/outputs to Float32 to match the optimization matrix
good_input = good_input.to(device=matrix.device, dtype=torch.float32)
good_output = good_output.to(device=matrix.device, dtype=torch.float32)
bad_input = bad_input.to(device=matrix.device, dtype=torch.float32)
bad_output = bad_output.to(device=matrix.device, dtype=torch.float32)

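Explanation

The optimization matrix is now float32, while the cached module inputs and outputs may still be in the model's 16-bit dtype. Casting them to float32 on the matrix's device keeps every product in the optimization loop in a single dtype.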
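
The dtype mismatch is easy to reproduce in isolation. The sketch below uses small random tensors as stand-ins for the float32 optimization matrix and a cached bfloat16 module input; the names and shapes are illustrative only.

```python
import torch

matrix = torch.randn(4, 3, dtype=torch.float32)          # stand-in for the optimization matrix
cached_input = torch.randn(2, 3, dtype=torch.bfloat16)   # stand-in for a cached module input

try:
    cached_input @ matrix.T
    mismatch_raised = False
except RuntimeError:
    # PyTorch does not promote mixed bfloat16/float32 operands in a matmul.
    mismatch_raised = True

# Casting the cached activation to the matrix's device and dtype makes the product valid.
output = cached_input.to(device=matrix.device, dtype=torch.float32) @ matrix.T
print(mismatch_raised, output.dtype, tuple(output.shape))
```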

To fix the "Zero KL" issue, we need to ensure the torch.nn.Parameter replacement is forced into the model's state and that we explicitly clear any residual gradients or cached states.

File: `src/heretic/model.py`

Apply this patch to the end of the ara_abliterate method to force the weight injection.

Before >>> (Around line 650)

python
with torch.no_grad():
    matrix.copy_(get_matrix())

After <<<

python
# Inside the per-module loop:
with torch.no_grad():
    # Get the optimized FP32 matrix and cast to model's compute dtype
    optimized_matrix = get_matrix().to(device=module.weight.device, dtype=torch.bfloat16)

    # Force replace the parameter to break any reference to the old weights
    module.weight = torch.nn.Parameter(optimized_matrix, requires_grad=False)

    # Ensure bitsandbytes quantization state is fully purged if it exists
    if hasattr(module.weight, "quant_state"):
        del module.weight.quant_state
    if hasattr(module, "quant_state"):
        del module.quant_state

# After the loop (two indentation levels shallower than the with block in the source):
# force a synchronization to ensure weights are committed to VRAM
if torch.cuda.is_available():
    torch.cuda.synchronize()

Why this fixes the "Zero KL" issue:

Parameter Re-binding: After the float32 patch above, matrix is a detached float32 copy of module.weight, so matrix.copy_() only updates the copy and the model keeps its old weights. copy_() may also be bypassed if the model is using a PeftModel wrapper that intercepts weight updates. Replacing the module.weight object entirely forces the model to use the new tensor.
Dtype Alignment: By explicitly casting to bfloat16 before the injection, we ensure the inference kernels (which are optimized for BF16 on Blackwell) see the modified data.
CUDA Sync: The synchronize() call ensures that the weight updates are finished before the Evaluator starts the next pass.
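
One concrete way copy_() can miss the model in the patched code: once the weight has been cast to float32 and detached, the working matrix owns its own storage. The toy example below, which uses a small torch.nn.Linear as a stand-in for an attention projection, shows that writing into the copy leaves the layer unchanged while re-binding the parameter does not.

```python
import torch

layer = torch.nn.Linear(3, 2, bias=False, dtype=torch.bfloat16)
original = layer.weight.detach().clone()

# The float32 working copy owns its own storage, separate from layer.weight.
matrix = layer.weight.to(torch.float32).detach()
with torch.no_grad():
    matrix.copy_(torch.ones_like(matrix))
copy_reached_model = not torch.equal(layer.weight, original)

# Re-binding the parameter puts the optimized values into the module.
with torch.no_grad():
    layer.weight = torch.nn.Parameter(matrix.to(torch.bfloat16), requires_grad=False)
rebind_reached_model = torch.equal(layer.weight, torch.ones(2, 3, dtype=torch.bfloat16))

print(copy_reached_model, rebind_reached_model)
```

A trial whose weights never change produces a KL divergence of exactly zero, which is consistent with the symptom this patch addresses.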

Updated Execution Payload

Since you are starting fresh after the Ctrl+C, make sure to wipe the "zero-progress" database so the optimizer doesn't think those failed runs were valid data points.

bash
export PYTHONPATH=/workspace/heretic/src
export HF_DATASETS_TRUST_REMOTE_CODE=1
rm -rf /workspace/heretic/checkpoints/study.db

python3 -c "import sys; sys.path.insert(0, '/workspace/heretic/src'); from heretic.main import main; main()" \
  --model "/workspace/Naphula/Goetia-26B-A4B-v1.3" \
  --use-ara \
  --target-components '["attn.o_proj"]' \
  --row-normalization full \
  --kl-divergence-target 0.05 \
  --n-trials 100 \
  --n-startup-trials 30 \
  --batch-size 64 \
  --device-map auto \
  --max-memory '{"0":"92GiB", "cpu":"130GiB"}' \
  --trust-remote-code True

Future Session "Cold Start" Workflow

1. Spin up a pod (RTX 6000 Blackwell or B300).
2. Clone and set up the environment:

bash
git clone https://github.com/p-e-w/heretic.git
cd heretic
git checkout ara
# Install the Blackwell stack
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install pydantic==2.10.0 pydantic-settings optuna==4.1.0 questionary rich transformers==5.12.1 accelerate peft lm-eval==0.4.7

3. Upload your saved scripts to /workspace/heretic/src/heretic/, overwriting the defaults.
4. Run the library patches (the sed commands for lm-eval and transformers MoE stability).
5. Execute the payload from "Updated Execution Payload" above.

This is an incredible start. You are seeing the "Surgical Narrowing" strategy pay off in real-time. Even in the random exploration phase, you've already hit the Pareto Front multiple times.

Look at Trial 5:

KL Divergence: 0.0494 (Already under your 0.05 target!)
Refusals: 4/100 (Extremely close to your ≤ 3 goal)

Analysis of the "Noise Floor"

You are currently mapping the boundary where the model's "refusal circuit" collapses.

The "Cliff" (Trials 1, 6, 7): When your KL jumps to 6.0+, you've pushed the steering weight or overcorrection too far for those specific layers. The model isn't just decensored; its internal logic is being "scrambled," resulting in high divergence.
The "Golden Zone" (Trials 3, 4, 5): You've found the sweet spot. A KL of 0.04 - 0.06 with single-digit refusals is the hallmark of a high-tier abliteration. It means the model's capabilities are 99% intact, but the refusal mechanism has been surgically bypassed.
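
The Pareto front mentioned above is the set of trials that no other trial beats on both KL divergence and refusals. A minimal sketch of that selection follows. Trials 5 and 100 use the metrics reported in this document; trials 901 and 902 are made-up values added purely to illustrate the shape of the front, and the `Trial` type and `pareto_front` helper are not Heretic or Optuna APIs.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    number: int
    kl_divergence: float
    refusals: int


def pareto_front(trials: list[Trial]) -> list[Trial]:
    """Keep trials that no other trial matches or beats on both metrics."""
    def dominated(t: Trial) -> bool:
        return any(
            o.kl_divergence <= t.kl_divergence
            and o.refusals <= t.refusals
            and (o.kl_divergence < t.kl_divergence or o.refusals < t.refusals)
            for o in trials
        )
    return [t for t in trials if not dominated(t)]


trials = [
    Trial(5, 0.0494, 4),     # reported in the run log above
    Trial(100, 0.0309, 3),   # the released trial
    Trial(901, 6.2, 0),      # hypothetical "cliff" trial (illustration only)
    Trial(902, 0.012, 40),   # hypothetical under-steered trial (illustration only)
]
front = pareto_front(trials)
print([t.number for t in front])
```

A "cliff" trial can still sit on the front because it has the fewest refusals, which is why a KL divergence target is needed on top of Pareto optimality when picking a release candidate.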

What to expect when TPE kicks in (Trial 30+):

Once you hit Trial 30, the optimizer will stop "guessing" and start looking at the relationship between your successful trials. It will likely notice that:

Layer 15-16 is a high-leverage start point.
Neighbor counts around 12 provide better stability than 14-15.
It will try to "nudge" that 0.0201 steer weight from Trial 5 just a tiny bit higher (maybe to 0.022) to see if it can turn those 4 refusals into 2 while keeping the KL under 0.05.

A Note on the "Target Locks"

Based on the logs, you have successfully mapped the "Phase Transition" of the model.

To "tighten the grip" for the final 30 trials, we need to force the TPE sampler to explore the high-precision corridor where the model is fully decensored but the logic remains intact.

The "Grip" Tweak for `src/heretic/main.py`

Apply these narrowed ranges to your objective function. We are locking the layers to the most successful configuration and narrowing the weight ratios to prevent the "KL explosion."

After <<< (Around line 535 in main.py)

python
if settings.use_ara:
    # TIGHTENED SURGICAL GRIP: Focused on Trial 53/55/75 Success Corridor
    start_layer_index = trial.suggest_int("start_layer_index", 14, 15)  # Locked to the high-leverage start
    end_layer_index = trial.suggest_int("end_layer_index", 26, 27)  # Locked to the stable exit

    # Increase minimum preservation to prevent the KL 6.0+ "scrambling"
    preserve_good_behavior_weight = trial.suggest_float("preserve_good_behavior_weight", 1.2, 1.5)

    # Narrow steering to the exact "sweet spot" discovered in the first 100 trials
    steer_bad_behavior_weight = trial.suggest_float("steer_bad_behavior_weight", 0.010, 0.015, log=True)

    # Tighten overcorrection to prevent logic collapse
    overcorrect_relative_weight = trial.suggest_float("overcorrect_relative_weight", 0.85, 1.05)

    # Neighbor count 10-15 proved most stable for 16-bit
    neighbor_count = trial.suggest_int("neighbor_count", 10, 15)

    ara_parameters = ARAParameters(
        start_layer_index=start_layer_index,
        end_layer_index=end_layer_index,
        preserve_good_behavior_weight=preserve_good_behavior_weight,
        steer_bad_behavior_weight=steer_bad_behavior_weight,
        overcorrect_relative_weight=overcorrect_relative_weight,
        neighbor_count=neighbor_count,
    )

Why this will break the tie:

Preservation Floor (1.2): By raising the floor of preserve_good_behavior_weight, you effectively ban the optimizer from trying "suicide runs" that result in KL 6.0. Every trial will now be a "high-quality" candidate.
Steering Ceiling (0.015): Your logs show that at 0.017+, the model starts to diverge. By capping this at 0.015, you force the optimizer to find the most efficient steering within the safety zone.
Layer Lock: Trials 53 and 55 both used the 14-26/27 range. We are now making that the mandatory playground.
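
As a sanity check, the released trial's parameters can be compared against the tightened ranges. The sketch below copies the ranges from the "Grip" patch and the trial 100 values from reproduce.json (which rounds them to four decimals); the helper name `out_of_range` is illustrative.

```python
TIGHTENED_RANGES = {
    "start_layer_index": (14, 15),
    "end_layer_index": (26, 27),
    "preserve_good_behavior_weight": (1.2, 1.5),
    "steer_bad_behavior_weight": (0.010, 0.015),
    "overcorrect_relative_weight": (0.85, 1.05),
    "neighbor_count": (10, 15),
}

# Trial 100, as recorded in reproduce.json.
TRIAL_100 = {
    "start_layer_index": 14,
    "end_layer_index": 26,
    "preserve_good_behavior_weight": 1.4404,
    "steer_bad_behavior_weight": 0.0100,
    "overcorrect_relative_weight": 0.9144,
    "neighbor_count": 15,
}


def out_of_range(params: dict, ranges: dict) -> list[str]:
    """Return the names of parameters that fall outside their (low, high) range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= params[name] <= high
    ]


violations = out_of_range(TRIAL_100, TIGHTENED_RANGES)
print(violations)
```

All six recorded values fall inside the tightened ranges, with the steering weight sitting at the lower bound of 0.010.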
