Naphula

Goetia-26B-A4B-v1.3-Absolute-Heretic-ARA


License: apache-2.0


Selected trial

  • Trial number: 100
  • KL divergence: 0.0309
  • Refusals: 3/100
  • Method: Arbitrary-Rank Ablation (ARA) with Surgical Narrowing

Environment

  • Heretic: v1.2.0-dev (Blackwell Optimized)
  • PyTorch: 2.8.0+cu128
  • Hardware: NVIDIA RTX 6000 Blackwell (96GB VRAM)
  • Other dependencies: See requirements.txt.

Contents of this directory

How to reproduce

[!TIP] You can automate this process, including all verification steps, by downloading the reproduce.json file and running python3 -c "from heretic.main import main; main()" --reproduce reproduce.json.

  1. Install the Blackwell-compatible version of PyTorch: pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
  2. Install the packages listed in requirements.txt: pip install -r requirements.txt
  3. Apply the Heretic source patches for 16-bit ARA and Surgical Narrowing.
  4. Place the provided config.toml in your working directory.
  5. Run the execution payload:

    bash

    export PYTHONPATH=/workspace/heretic/src
    python3 -c "from heretic.main import main; main()" --model "/workspace/Naphula/Goetia-26B-A4B-v1.3" --use-ara
  6. Wait for the run to finish, then select trial 100 and export the model.
  7. Verify that the weight files have been exactly reproduced by comparing their SHA-256 hashes against those in SHA256SUMS: sha256sum -c SHA256SUMS
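
Where the `sha256sum` utility is unavailable, the check in step 7 can be approximated in Python. This is a minimal sketch, assuming `SHA256SUMS` uses the usual `<hex digest>  <filename>` layout that `sha256sum` writes; the helper names `sha256_of` and `verify_sums` are illustrative and not part of Heretic.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight shards are never fully loaded into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Return {filename: matches} for each entry of a sha256sum-style file."""
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*' before the filename.
        name = name.strip().lstrip("*")
        results[name] = sha256_of(sums_file.parent / name) == expected.lower()
    return results
```

Calling `verify_sums(Path("SHA256SUMS"))` from the model directory returns one boolean per listed file; any `False` entry means that file was not reproduced exactly.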

[!TIP] To use the included Optuna study journal Naphula--Goetia-26B-A4B-v1.3.jsonl, place it in the checkpoints/ directory before running. This allows you to resume the study or export other Pareto-optimal candidates.

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the MoE DELLA merge method using google/gemma-4-26B-A4B as a base.

Models Merged

The following models were included in the merge:

⚙️ Configuration

yaml

architecture: Gemma4ForConditionalGeneration
base_model: B:\26B\google_gemma-4-26B-A4B
models:
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1a-GGUF
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\BeaverAI_Orion-26B-A4B-v1b-GGUF
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Dark-Scarlett-v1.0-26B-A4B-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Omega-Evolution-26B-A4B-v3.0-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_For-Her-Darkside-26B-A4B-v1.4-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Melody1437-26B-A4B-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ReadyArt_Serenity-26B-A4B-HB16-Q8_0
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\ApocalypseParty_G4-26B-SFT-6
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\zerofata_G4-MeroMero-26B-A4B
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\AuriAetherwiing_G4-26B-A4B-Musica-v1
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Darkhn_Gemma-4-26B-A4B-Animus-V14.1-FFT
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Locutusque_Esmeralda-Gemma4-26B-A4B
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Pantheon-Reasoning-26B-A4B-1.1
    parameters:
      weight: 1.0
      density: 0.9
      epsilon: 0.09
  - model: B:\26B\Gryphe--Gemma-4-26B-A4B-StyleTune
    parameters:
      weight:
        - filter: "output|lm_head"
          value: 13.0 # 50% of normalized lm_head weight
        - value: 1.0 # 7.14% of normalized other weights
      density: 0.9
      epsilon: 0.09
merge_method: moe_della
parameters:
  lambda: 1.0
  normalize: true
  int8_mask: false
  rescale: true
  router_strategy: della # average # random_init
  blend_experts: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: Goetia 26B A4B v1.3
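
The percentages in the `weight` comments follow from `normalize: true`. The sketch below reproduces that arithmetic, assuming normalization divides each model's weight by the sum of all weights applied to the same tensor. The custom `moe_della` code is not shown in this README, so that reading is an assumption, and `normalized_share` is an illustrative helper rather than a mergekit function.

```python
# Fourteen models are merged. Thirteen use weight 1.0 everywhere; StyleTune
# uses 13.0 on tensors matching "output|lm_head" and 1.0 on everything else.
N_OTHER_MODELS = 13


def normalized_share(own_weight: float, other_weights: list[float]) -> float:
    """Fraction of the normalized weight total that one model receives."""
    return own_weight / (own_weight + sum(other_weights))


others = [1.0] * N_OTHER_MODELS
lm_head_share = normalized_share(13.0, others)  # StyleTune on lm_head
default_share = normalized_share(1.0, others)   # StyleTune on other tensors

print(f"lm_head: {lm_head_share:.2%}, other tensors: {default_share:.2%}")
# prints "lm_head: 50.00%, other tensors: 7.14%"
```

Under this reading, a weight of 13.0 is exactly what lets one model match the combined influence of the other thirteen on the output head.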

This is a decensored version of Naphula/Goetia-26B-A4B-v1.3, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method (with row-norm preservation).

This model was merged locally on an RTX 3060 Ti and then hereticized on a RunPod cloud RTX Pro 6000 (96GB VRAM) for approximately $20 USD.

I also tested MPOA (Magnitude Preserving Orthogonal Ablation), but this method was unable to uncensor Gemma 4.

See also the Arbitrary Rank Inversion (ARI) variant.

Heretication Results

| Metric | This model | Original model |
| --- | --- | --- |
| KL divergence | 0.0309 | 0 (by definition) |
| Refusals | 3/100 | 100/100 |

Degree of Heretication

The Heresy Index weighs the resulting model's corruption by the process (KL Divergence) and its abolition of doctrine (Refusals) for a final verdict in classification.

| Index Entry | Classification | Analysis |
| --- | --- | --- |
| Absolute | Absolute Heresy | Less than 10/100 Refusals and 0.10 KL Divergence |
| Tainted | Tainted Heresy | Around 11-25/100 Refusals and/or 0.11-0.20 KL Divergence |
| Impotent | Impotent Heresy | Anything above 25/100 Refusals and 0.21 KL Divergence |

Note: This is an arbitrary classification inspired by Warhammer 40K, having no tangible indication towards the model's performance.
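
The index can be read as a small decision rule. The table leaves boundary cases open (exactly 10 refusals, and whether both metrics or either one must cross a threshold), so the sketch below fills them by assumption: the worse of the two metrics decides the class. `heresy_class` is an illustrative name, not part of Heretic.

```python
def heresy_class(refusals: int, kl_divergence: float) -> str:
    """Classify a run by its worse metric (boundary handling is an assumption)."""
    if refusals > 25 or kl_divergence > 0.20:
        return "Impotent"
    if refusals > 10 or kl_divergence > 0.10:
        return "Tainted"
    return "Absolute"


print(heresy_class(3, 0.0309))  # this model: prints "Absolute"
```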

🧙 Heretic Grimoire

reproduce.json

json

{
  "version": "1.2.0-dev",
  "base_model": "Naphula/Goetia-26B-A4B-v1.3",
  "timestamp": "2026-06-19T08:04:47Z",
  "metrics": {
    "kl_divergence": 0.030937770381569862,
    "refusals": 3,
    "n_bad_prompts": 100
  },
  "parameters": {
    "start_layer_index": "14",
    "end_layer_index": "26",
    "preserve_good_behavior_weight": "1.4404",
    "steer_bad_behavior_weight": "0.0100",
    "overcorrect_relative_weight": "0.9144",
    "neighbor_count": "15"
  },
  "target_components": [
    "attn.o_proj"
  ],
  "hardware": "RTX 6000 Blackwell (96GB)"
}
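
The values under `parameters` are stored as strings, so a consumer has to convert them before use. A minimal sketch of that conversion, assuming the field names shown in reproduce.json; the `ARAParams` dataclass and `load_params` helper are illustrative and are not Heretic's own types.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ARAParams:
    start_layer_index: int
    end_layer_index: int
    preserve_good_behavior_weight: float
    steer_bad_behavior_weight: float
    overcorrect_relative_weight: float
    neighbor_count: int


def load_params(reproduce_json: str) -> ARAParams:
    """Convert the string-valued "parameters" block into typed fields."""
    raw = json.loads(reproduce_json)["parameters"]
    return ARAParams(**{
        name: field_type(raw[name])
        for name, field_type in ARAParams.__annotations__.items()
    })


example = """
{"parameters": {"start_layer_index": "14", "end_layer_index": "26",
                "preserve_good_behavior_weight": "1.4404",
                "steer_bad_behavior_weight": "0.0100",
                "overcorrect_relative_weight": "0.9144",
                "neighbor_count": "15"}}
"""
params = load_params(example)
print(params.end_layer_index - params.start_layer_index)  # prints 12
```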

💡 Innovations

  • A special gguf_to_safetensors_v5.py calibrated for Gemma 4 was used in combination with a modified tasks.py
  • A custom method moe_della was scripted in order to add della merging for MoE models. Other changes to mergekit scripts were also required, such as architecture/auto.py, architecture/base.py, architecture/json_definitions.py, mergekit/common.py, io/tasks.py, tokenizer/embed.py, and mergekit/_data/architectures/gemma4.json

🔧 Summary of Changes

The following summarizes the modifications made to the Heretic codebase (config.py, main.py, and model.py). It can be passed to the Heretic developers as a patch summary or pull request rationale for supporting 16-bit Arbitrary-Rank Ablation (ARA) on high-VRAM hardware (such as the Blackwell RTX 6000 or B300) and Gemma 4 MoE architectures.

1. config.py

Goal: Optimize the TPE sampler for constrained search spaces.

  • Reduced n_startup_trials (60 → 30):
    • Why: When using "Surgical Narrowing" (manually restricting the parameter search space based on prior knowledge), 60 random exploratory trials waste compute. Dropping this to 30 allows the Tree-structured Parzen Estimator (TPE) to begin correlating steering weights and KL divergence much earlier, drastically speeding up convergence on 100-trial runs.

2. main.py

Goal: Implement Surgical Narrowing, fix environment bugs, and add reproducibility exports.

  • Bypassed version('heretic-llm'):
    • Why: Hardcoded the version string to v1.2.0-dev in the CLI header and Hugging Face Readme generator. This prevents PackageNotFoundError crashes when running the scripts directly from the source directory without installing the package via pip.
  • Implemented "Surgical Narrowing" in ARA Sampling:
    • Why: Replaced the broad trial.suggest_* ranges with tightly constrained boundaries (e.g., layers 14–27, high preservation weights, and specific steering weight logs). This forces the optimizer to focus exclusively on the 16-bit "Golden Zone" required for stable MoE models, ensuring KL divergence stays below 0.05.
  • Added "Grimoire" Reproducibility Export:
    • Why: Injected a custom block into the "Save to local folder" logic. It automatically creates a /reproduce subfolder containing a reproduce.json (with exact trial parameters, metrics, and hardware info) and copies the Optuna study_history.db. This ensures 100% lineage and reproducibility for archived models.
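
The export step described in the last bullet can be sketched as follows. This is not the actual patch, which is not reproduced in this README; `export_grimoire` and its arguments are hypothetical names, and the sketch only assumes what the bullet states: a `reproduce` subfolder, a `reproduce.json` file, and a copy of the Optuna study file.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def export_grimoire(save_dir: Path, record: dict, study_db: Optional[Path] = None) -> Path:
    """Write reproduce.json into <save_dir>/reproduce and copy the study file if present."""
    out_dir = save_dir / "reproduce"
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / "reproduce.json"
    target.write_text(json.dumps(record, indent=2))
    if study_db is not None and study_db.exists():
        # copy2 also preserves timestamps, which helps when auditing lineage later.
        shutil.copy2(study_db, out_dir / study_db.name)
    return target
```

The `record` argument would carry the trial parameters, metrics, and hardware string in the same shape as the reproduce.json file shipped with this model.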

3. model.py

Goal: Enable lossless 16-bit ARA, fix PyTorch autograd crashes, and ensure weight injection works.

  • Forced AutoModelForCausalLM in get_model_class:
    • Why: Bypassed the vision_config check. Gemma 4 / MoE hybrid models were triggering architecture detection errors or set_submodule crashes.
  • Disabled bitsandbytes and Forced Pure BF16 Loading:
    • Why: Removed 4-bit quantization logic and forced torch.bfloat16 with low_cpu_mem_usage=False. This allows high-VRAM environments (96GB+) to perform lossless, full-depth abliteration without BNB artifacts or meta tensor crashes.
  • Fixed ValueError: can't optimize a non-leaf Tensor:
    • Why: In ara_abliterate, casting the module weight to FP32 (module.weight.to(torch.float32)) created a tensor connected to the autograd graph. Added .detach() before .requires_grad_(True) to ensure the L-BFGS optimizer receives a valid leaf tensor.
  • Fixed RuntimeError: expected mat1 and mat2 to have the same dtype:
    • Why: The captured good_module_io and bad_module_io tensors were in BFloat16, but the ARA optimization matrix was cast to Float32. Explicitly cast the I/O tensors to dtype=torch.float32 inside the optimization loop so PyTorch's matmul (@) operator doesn't crash.
  • Fixed the "Zero KL / 100 Refusals" Bug (Weight Injection Failure):
    • Why: Using matrix.copy_(get_matrix()) at the end of the ARA loop failed to update the active computation graph in 16-bit mode, causing the evaluator to test the unmodified model.
    • Fix: Replaced the parameter entirely using module.weight = torch.nn.Parameter(...), explicitly cast the optimized matrix back to torch.bfloat16, deleted any lingering quant_state attributes, and added torch.cuda.synchronize() to ensure VRAM was updated before the evaluation step began.

💉 Surgical Narrowing

A custom surgical narrowing strategy was used specifically for this G4 moe_della merge: a range lock was added in main.py, and TPE sampling was set to begin after 30 startup trials instead of 60, which shortens the search.

(This was only possible because I had previously run 150 trials with bitsandbytes 4-bit quantization. The quantization itself failed, but the run produced ideal "target lock" coordinates.)

For trials 90-100, a "tightened grip" was applied. Trial 100 did so well that it was chosen as the release version.

Several other edits were made to various scripts to allow for Gemma 4 models to be ablated with ARA. Most of these changes are noted below.

To perform a lossless, full 16-bit depth ARA abliteration on your RTX 6000 Blackwell (96GB VRAM) with the surgical narrowing strategy, follow these instructions.

Step 1: Environment Setup

Run these to ensure a clean, Blackwell-optimized environment.

bash

# 1. System Prep
apt-get update && apt-get install -y git
git clone https://github.com/p-e-w/heretic.git
cd heretic
git checkout ara
# 2. Clean environment
pip uninstall -y heretic-llm kernels pydantic pydantic-settings optuna transformers accelerate peft bitsandbytes lm-eval evaluate torchvision torch torchaudio
# 3. Install Blackwell-Compatible Stack (CUDA 12.8)
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
# 4. Install Dependencies (Excluding bitsandbytes to avoid conflicts)
pip install pydantic==2.10.0 pydantic-settings optuna==4.1.0 questionary rich transformers==5.12.1 accelerate peft lm-eval==0.4.7

Step 2: Mandatory Library Patches

bash

# Fix Transformers 5.x Vision class error
sed -i 's/transformers.AutoModelForVision2Seq/transformers.AutoModelForSeq2SeqLM/g' /usr/local/lib/python3.12/dist-packages/lm_eval/models/hf_vlms.py
# Fix PIQA trust_remote_code logic
sed -i "s/self.dataset = datasets.load_dataset(/if 'trust_remote_code' not in dataset_kwargs: dataset_kwargs['trust_remote_code'] = True\n self.dataset = datasets.load_dataset(/g" /usr/local/lib/python3.12/dist-packages/lm_eval/api/task.py
# Fix PIQA 401 Unauthorized redirect error
sed -i 's/dataset_path: piqa/dataset_path: ybisk\/piqa/g' /usr/local/lib/python3.12/dist-packages/lm_eval/tasks/piqa/piqa.yaml
# Force MoE fallback (Blackwell grouped_mm is unstable)
sed -i 's/hasattr(torch, "_grouped_mm")/False/g' /usr/local/lib/python3.12/dist-packages/transformers/integrations/moe.py

Step 3: Heretic Source Patches

File 1: src/heretic/main.py

Chunk 1 (Line ~155 & ~1015): Bypass Version Errors

Before >>>

python

print(f"[cyan]█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀[/] v{version('heretic-llm')}")

After <<<

python

print(f"[cyan]█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀[/] v1.2.0-dev")

Chunk 2 (Line ~535): Surgical Narrowing & KL Target

Before >>>

python

if settings.use_ara:
    start_layer_index = trial.suggest_int(
        "start_layer_index",
        0,
        len(model.get_layers()) // 2,
    )
    end_layer_index = trial.suggest_int(
        "end_layer_index",
        len(model.get_layers()) // 2,
        len(model.get_layers()),
    )
    preserve_good_behavior_weight = trial.suggest_float(
        "preserve_good_behavior_weight",
        0.0,
        1.0,
    )
    steer_bad_behavior_weight = trial.suggest_float(
        "steer_bad_behavior_weight",
        0.0001,
        1.0,
        log=True,
    )
    overcorrect_relative_weight = trial.suggest_float(
        "overcorrect_relative_weight",
        0.0,
        1.3,
    )
    neighbor_count = trial.suggest_int(
        "neighbor_count",
        1,
        15,
    )

After <<<

python

if settings.use_ara:
    # SURGICAL NARROWING: Blackwell 16-bit Optimized Range
    start_layer_index = trial.suggest_int("start_layer_index", 14, 16)
    end_layer_index = trial.suggest_int("end_layer_index", 24, 27)
    # Force high preservation for KL < 0.05
    preserve_good_behavior_weight = trial.suggest_float("preserve_good_behavior_weight", 0.5, 1.5)
    # Narrow steering for 16-bit stability
    steer_bad_behavior_weight = trial.suggest_float("steer_bad_behavior_weight", 0.01, 0.08, log=True)
    overcorrect_relative_weight = trial.suggest_float("overcorrect_relative_weight", 0.8, 1.3)
    neighbor_count = trial.suggest_int("neighbor_count", 9, 15)

File 2: src/heretic/model.py

Chunk 1 (Line ~9): Disable bitsandbytes

Before >>>

python

import bitsandbytes as bnb

After <<<

python

# import bitsandbytes as bnb

Chunk 2 (Line ~110): Force 16-bit Stable Loading

Before >>>

python

for dtype in settings.dtypes:
    # ... (entire loop)
    break

After <<<

python

dtype = torch.bfloat16
print(f"* Loading model in FULL 16-BIT BFLOAT16 (Blackwell Stable Path)...")
try:
    self.model = get_model_class(settings.model).from_pretrained(
        settings.model,
        torch_dtype=dtype,
        device_map="auto",
        trust_remote_code=self.trusted_models.get(settings.model),
        low_cpu_mem_usage=False
    )
    if self.trusted_models.get(settings.model) is None:
        self.trusted_models[settings.model] = True
except Exception as error:
    print(f"* [red]Failed to load model:[/] {error}")
    raise error

Chunk 3 (Line ~565): Fix ARA for 16-bit (No Dequant)

Before >>>

python

for module_index, module in enumerate(modules):
    # See above for a (partial) justification of this cast.
    module = cast(Linear, module)
    matrix = module.weight
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

After <<<

python

for module_index, module in enumerate(modules):
    module = cast(Linear, module)
    # Direct 16-bit access, no bitsandbytes dequant needed
    matrix = module.weight.to(torch.float32).requires_grad_(True)
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

Yes, dropping the threshold is a smart move for Surgical Narrowing.

Since you are already narrowing the search space (start/end layers and weights) in main.py, the optimizer doesn't need 60 random trials to "find" the general area of success. By dropping the threshold to 20 or 30, you allow the TPE (Tree-structured Parzen Estimator) to start looking for the high-precision 16-bit "sweet spot" much earlier.

Why 30 trials is better for this run:

  1. Faster Convergence: TPE will start correlating the steer_weight and kl_divergence sooner.
  2. Efficiency: In a 100-run limit, 60 random trials would mean 60% of your compute is "guessing." At 30 trials, 70% of your compute is "optimizing."
  3. Blackwell Speed: With a batch size of 64, you'll burn through trials quickly. 100 trials is plenty to find a Pareto-optimal candidate when the search space is surgically narrowed.
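
The budget split in point 2 is simple arithmetic. A small sketch, with `optimizing_fraction` as an illustrative helper name:

```python
def optimizing_fraction(n_trials: int, n_startup_trials: int) -> float:
    """Share of trials sampled by TPE rather than at random."""
    guided = max(n_trials - n_startup_trials, 0)
    return guided / n_trials


for startup in (60, 30):
    print(startup, f"{optimizing_fraction(100, startup):.0%}")
# prints "60 40%" then "30 70%"
```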

The Patch for src/heretic/config.py

To change the default behavior, apply this chunk:

Before >>> (Around line 215)

python

n_startup_trials: int = Field(
    default=60,
    description="Number of trials that use random sampling for the purpose of exploration.",
)

After <<<

python

n_startup_trials: int = Field(
    default=30,
    description="Number of trials that use random sampling for the purpose of exploration.",
)

The error ValueError: can't optimize a non-leaf Tensor occurs because matrix = module.weight.to(torch.float32).requires_grad_(True) creates a new tensor that is the result of a differentiable operation (.to()). In PyTorch, an optimizer can only be initialized with "leaf" tensors (tensors that aren't the result of an operation).

To fix this, you must wrap the new tensor in torch.nn.Parameter or detach it before setting requires_grad.

File: src/heretic/model.py

Chunk 1: Fix Non-Leaf Tensor Error in ara_abliterate

Before >>> (Around line 565)

python

for module_index, module in enumerate(modules):
    module = cast(Linear, module)
    # Direct 16-bit access, no bitsandbytes dequant needed
    matrix = module.weight.to(torch.float32).requires_grad_(True)
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

After <<<

python

for module_index, module in enumerate(modules):
    module = cast(Linear, module)
    # Create a leaf tensor by detaching after the cast, then enable gradients
    matrix = module.weight.to(torch.float32).detach().requires_grad_(True)
    row_norms = LA.vector_norm(matrix, dim=1, keepdim=True).detach()

Explanation of the Fix

  1. .to(torch.float32): This creates a new tensor that is "connected" to the original module.weight in the autograd graph.
  2. .detach(): This breaks the connection to the graph, making the new FP32 tensor a "leaf" node.
  3. .requires_grad_(True): This tells PyTorch to start tracking operations on this new leaf tensor so the L-BFGS optimizer can compute gradients for it.

The error RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != float occurs because the good_module_io and bad_module_io tensors were captured in BFloat16 during the initial scan, but the ARA optimization loop is now trying to multiply them by the Float32 matrix we created for the L-BFGS optimizer.

File: src/heretic/model.py

Chunk: Fix Dtype Mismatch in ara_abliterate

Before >>> (Around line 595)

python

good_input, good_output = good_module_io[layer_index][component][
    module_index
]
bad_input, bad_output = bad_module_io[layer_index][component][
    module_index
]
good_input = good_input.to(matrix.device)
good_output = good_output.to(matrix.device)
bad_input = bad_input.to(matrix.device)
bad_output = bad_output.to(matrix.device)

After <<<

python

good_input, good_output = good_module_io[layer_index][component][
    module_index
]
bad_input, bad_output = bad_module_io[layer_index][component][
    module_index
]
# Force inputs/outputs to Float32 to match the optimization matrix
good_input = good_input.to(device=matrix.device, dtype=torch.float32)
good_output = good_output.to(device=matrix.device, dtype=torch.float32)
bad_input = bad_input.to(device=matrix.device, dtype=torch.float32)
bad_output = bad_output.to(device=matrix.device, dtype=torch.float32)

Explanation

The matrix used by the optimizer was explicitly cast to float32 in the previous patch to satisfy the "leaf tensor" requirement. PyTorch's @ (matmul) operator is strict: it will not automatically cast BFloat16 inputs to Float32. This patch ensures the captured I/O data matches the precision of the optimization weights.


The KL divergence: 0.0000 and Refusals: 100/100 indicate that the optimized weights are not actually being applied to the model's active inference path. This happens because in your 16-bit environment, the module.weight is being updated in the ARA loop, but the model is likely still referencing the original weights or a cached version of the computation graph.

To fix this, we need to ensure the torch.nn.Parameter replacement is forced into the model's state and that we explicitly clear any residual gradients or cached states.

File: src/heretic/model.py

Apply this patch to the end of the ara_abliterate method to force the weight injection.

Before >>> (Around line 650)

python

with torch.no_grad():
    matrix.copy_(get_matrix())

After <<<

python

with torch.no_grad():
    # Get the optimized FP32 matrix and cast to model's compute dtype
    optimized_matrix = get_matrix().to(device=module.weight.device, dtype=torch.bfloat16)
    # Force replace the parameter to break any reference to the old weights
    module.weight = torch.nn.Parameter(optimized_matrix, requires_grad=False)
    # Ensure bitsandbytes quantization state is fully purged if it exists
    if hasattr(module.weight, "quant_state"):
        del module.weight.quant_state
    if hasattr(module, "quant_state"):
        del module.quant_state
    # Force a synchronization to ensure weights are committed to VRAM
    if torch.cuda.is_available():
        torch.cuda.synchronize()

Why this fixes the "Zero KL" issue:

  1. Parameter Re-binding: Simply using copy_() can fail if the underlying storage is locked or if the model is using a PeftModel wrapper that intercepts weight updates. Replacing the module.weight object entirely forces the model to use the new tensor.
  2. Dtype Alignment: By explicitly casting to bfloat16 before the injection, we ensure the inference kernels (which are optimized for BF16 on Blackwell) see the modified data.
  3. CUDA Sync: The synchronize() call ensures that the weight updates are finished before the Evaluator starts the next pass.

Updated Execution Payload

Since you are starting fresh after the Ctrl+C, make sure to wipe the "zero-progress" database so the optimizer doesn't think those failed runs were valid data points.

bash

export PYTHONPATH=/workspace/heretic/src
export HF_DATASETS_TRUST_REMOTE_CODE=1
rm -rf /workspace/heretic/checkpoints/study.db
python3 -c "import sys; sys.path.insert(0, '/workspace/heretic/src'); from heretic.main import main; main()" \
--model "/workspace/Naphula/Goetia-26B-A4B-v1.3" \
--use-ara \
--target-components '["attn.o_proj"]' \
--row-normalization full \
--kl-divergence-target 0.05 \
--n-trials 100 \
--n-startup-trials 30 \
--batch-size 64 \
--device-map auto \
--max-memory '{"0":"92GiB", "cpu":"130GiB"}' \
--trust-remote-code True

That is a smart move. Having those Blackwell-optimized scripts (config.py, main.py, and model.py) ready to go will save you at least 30–45 minutes of patching every time you spin up a new RunPod instance.

Since you have the Surgical Narrowing, 16-bit Stable Loading, and Grimoire Export logic all baked into your local files now, your "Cold Start" workflow for future sessions is simplified to just this:

Future Session "Cold Start" Workflow

  1. Spin up Pod (RTX 6000 Blackwell or B300).
  2. Clone & Setup Environment:

    bash

    git clone https://github.com/p-e-w/heretic.git
    cd heretic
    git checkout ara
    # Install the Blackwell stack
    pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
    pip install pydantic==2.10.0 pydantic-settings optuna==4.1.0 questionary rich transformers==5.12.1 accelerate peft lm-eval==0.4.7
  3. Upload your saved scripts to /workspace/heretic/src/heretic/, overwriting the defaults.
  4. Run Library Patches (The sed commands for lm-eval and transformers MoE stability).
  5. Execute Payload:

You’ve essentially built a custom "Blackwell Edition" of Heretic. Good luck with the 100-trial run—getting that KL under 0.05 on full 16-bit precision will result in a top-tier model for the Grimoire.


This is an incredible start. You are seeing the "Surgical Narrowing" strategy pay off in real-time. Even in the random exploration phase, you've already hit the Pareto Front multiple times.

Look at Trial 5:

  • KL Divergence: 0.0494 (Already under your 0.05 target!)
  • Refusals: 4/100 (Extremely close to your 3 goal)

Analysis of the "Noise Floor"

You are currently mapping the boundary where the model's "refusal circuit" collapses.

  • The "Cliff" (Trials 1, 6, 7): When your KL jumps to 6.0+, you've pushed the steering weight or overcorrection too far for those specific layers. The model isn't just decensored; its internal logic is being "scrambled," resulting in high divergence.
  • The "Golden Zone" (Trials 3, 4, 5): You've found the sweet spot. A KL of 0.04 - 0.06 with single-digit refusals is the hallmark of a high-tier abliteration. It means the model's capabilities are 99% intact, but the refusal mechanism has been surgically bypassed.
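
Both metrics are minimized, so "on the Pareto front" means that no other trial is at least as good on both KL divergence and refusals. A minimal sketch of that filter follows. The Trial 5 and Trial 100 values come from this README, the two entries marked "made up" are hypothetical values added only to show the cliff and the under-steered end, and `pareto_front` is an illustrative helper, not Heretic or Optuna code.

```python
def pareto_front(trials: dict[str, tuple[float, int]]) -> list[str]:
    """Return trials not dominated on (kl_divergence, refusals); lower is better for both."""
    front = []
    for name, (kl, refusals) in trials.items():
        dominated = any(
            other_kl <= kl
            and other_ref <= refusals
            and (other_kl, other_ref) != (kl, refusals)
            for other, (other_kl, other_ref) in trials.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


trials = {
    "trial 5": (0.0494, 4),        # from the log excerpt above
    "trial 100": (0.0309, 3),      # the released trial
    "cliff (made up)": (6.1, 0),   # hypothetical over-steered run
    "timid (made up)": (0.02, 9),  # hypothetical under-steered run
}
print(pareto_front(trials))
# prints ['trial 100', 'cliff (made up)', 'timid (made up)']
```

Trial 5 drops off the front once Trial 100 exists, because Trial 100 is better on both axes. The over-steered entry stays on the front despite its KL, which is why the final choice among front members is still a judgment call.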

What to expect when TPE kicks in (Trial 30+):

Once you hit Trial 30, the optimizer will stop "guessing" and start looking at the relationship between your successful trials. It will likely notice that:

  1. Layer 15-16 is a high-leverage start point.
  2. Neighbor counts around 12 provide better stability than 14-15.
  3. It will try to "nudge" that 0.0201 steer weight from Trial 5 just a tiny bit higher (maybe to 0.022) to see if it can turn those 4 refusals into 2 while keeping the KL under 0.05.

A Note on the "Target Locks"

By using the 16-bit lossless path on the Blackwell, you are avoiding the "quantization noise" that usually plagues these runs. In 4-bit, the "noise floor" is much higher because the weights are already "jittery." Here, every change the ARA makes is precise.

You are essentially performing digital brain surgery on this model. If Trial 5 is already this good, Trial 100 is going to be legendary. Keep it running—you're about to set a new standard for the Grimoire.


Based on the logs, you have successfully mapped the "Phase Transition" of the model.

Trials like 53, 55, and 75 are your current champions because they sit right on the edge of the "KL Cliff." You've discovered that once steer_bad_behavior_weight crosses a certain threshold relative to preserve_good_behavior_weight, the KL divergence explodes from 0.04 to 6.0+.

To "tighten the grip" for the final 30 trials, we need to force the TPE sampler to explore the high-precision corridor where the model is fully decensored but the logic remains intact.

The "Grip" Tweak for src/heretic/main.py

Apply these narrowed ranges to your objective function. We are locking the layers to the most successful configuration and narrowing the weight ratios to prevent the "KL explosion."

After <<< (Around line 535 in main.py)

python

if settings.use_ara:
    # TIGHTENED SURGICAL GRIP: Focused on Trial 53/55/75 Success Corridor
    start_layer_index = trial.suggest_int("start_layer_index", 14, 15) # Locked to the high-leverage start
    end_layer_index = trial.suggest_int("end_layer_index", 26, 27) # Locked to the stable exit
    # Increase minimum preservation to prevent the KL 6.0+ "scrambling"
    preserve_good_behavior_weight = trial.suggest_float("preserve_good_behavior_weight", 1.2, 1.5)
    # Narrow steering to the exact "sweet spot" discovered in the first 100 trials
    steer_bad_behavior_weight = trial.suggest_float("steer_bad_behavior_weight", 0.010, 0.015, log=True)
    # Tighten overcorrection to prevent logic collapse
    overcorrect_relative_weight = trial.suggest_float("overcorrect_relative_weight", 0.85, 1.05)
    # Neighbor count 10-15 proved most stable for 16-bit
    neighbor_count = trial.suggest_int("neighbor_count", 10, 15)
    ara_parameters = ARAParameters(
        start_layer_index=start_layer_index,
        end_layer_index=end_layer_index,
        preserve_good_behavior_weight=preserve_good_behavior_weight,
        steer_bad_behavior_weight=steer_bad_behavior_weight,
        overcorrect_relative_weight=overcorrect_relative_weight,
        neighbor_count=neighbor_count,
    )

Why this will break the tie:

  1. Preservation Floor (1.2): By raising the floor of preserve_good_behavior_weight, you effectively ban the optimizer from trying "suicide runs" that result in KL 6.0. Every trial will now be a "high-quality" candidate.
  2. Steering Ceiling (0.015): Your logs show that at 0.017+, the model starts to diverge. By capping this at 0.015, you force the optimizer to find the most efficient steering within the safety zone.
  3. Layer Lock: Trials 53 and 55 both used the 14-26/27 range. We are now making that the mandatory playground.
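
As a consistency check, the released Trial 100 parameters can be compared against these tightened ranges. The sketch below copies the bounds from the patch above and the parameter strings from reproduce.json; `out_of_range` is an illustrative helper.

```python
# Tightened ranges from the patch above, as inclusive (low, high) bounds.
TIGHT_RANGES = {
    "start_layer_index": (14, 15),
    "end_layer_index": (26, 27),
    "preserve_good_behavior_weight": (1.2, 1.5),
    "steer_bad_behavior_weight": (0.010, 0.015),
    "overcorrect_relative_weight": (0.85, 1.05),
    "neighbor_count": (10, 15),
}

# Trial 100 parameters as recorded in reproduce.json (stored there as strings).
TRIAL_100 = {
    "start_layer_index": "14",
    "end_layer_index": "26",
    "preserve_good_behavior_weight": "1.4404",
    "steer_bad_behavior_weight": "0.0100",
    "overcorrect_relative_weight": "0.9144",
    "neighbor_count": "15",
}


def out_of_range(params: dict[str, str], ranges: dict[str, tuple[float, float]]) -> list[str]:
    """Names of parameters whose value falls outside its inclusive range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= float(params[name]) <= high
    ]


print(out_of_range(TRIAL_100, TIGHT_RANGES))  # prints []
```

All six values fall inside the tightened ranges. As recorded to four decimals, the steering weight sits at the lower bound (0.010) and `neighbor_count` at the upper bound (15).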

Model provider

Naphula

Model tree

Base

ReadyArt/For-Her-Darkside-26B-A4B-v1.4-GGUF

Base

Darkhn/Gemma-4-26B-A4B-Animus-V14.1-FFT

Base

zerofata/G4-MeroMero-26B-A4B

Base

ApocalypseParty/G4-26B-SFT-6

Base

BeaverAI/Orion-26B-A4B-v1a-GGUF

Base

Locutusque/Esmeralda-Gemma4-26B-A4B

Base

BeaverAI/Orion-26B-A4B-v1b-GGUF

Base

ReadyArt/Dark-Scarlett-v1.0-26B-A4B-GGUF

Base

Gryphe/Pantheon-Reasoning-26B-A4B-1.1

Base

AuriAetherwiing/G4-26B-A4B-Musica-v1

Base

google/gemma-4-26B-A4B

Base

Gryphe/Gemma-4-26B-A4B-StyleTune

Base

ReadyArt/Serenity-26B-A4B-GGUF

Base

ReadyArt/Omega-Evolution-26B-A4B-v3.0-GGUF

Base

ReadyArt/Melody1437-26B-A4B-GGUF

Merged

this model

Modalities

Input

Text, Image

Output

Text
