Yingyaeliae

Ministral-3-14B-Nymphaea-RP-heretic

License: apache-2.0

Abliteration parameters

| Parameter | Value |
| --- | --- |
| direction_index | 17.23 |
| attn.o_proj.max_weight | 1.38 |
| attn.o_proj.max_weight_position | 24.62 |
| attn.o_proj.min_weight | 0.05 |
| attn.o_proj.min_weight_distance | 11.04 |
| mlp.down_proj.max_weight | 1.49 |
| mlp.down_proj.max_weight_position | 27.32 |
| mlp.down_proj.min_weight | 0.92 |
| mlp.down_proj.min_weight_distance | 18.76 |
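
The parameter names suggest a per-layer ablation weight that peaks at `max_weight_position` and falls off towards `min_weight` at `min_weight_distance` layers away. Below is a minimal sketch of such a kernel. The linear falloff, the zero weight outside the window, and the reading of `min_weight` as an absolute value are all assumptions; this card does not describe the kernel, and the function name `ablation_weight` is hypothetical. Treat it as an illustration of the parameters, not as Heretic's implementation.

```python
def ablation_weight(layer, max_weight, max_weight_position,
                    min_weight, min_weight_distance):
    """Assumed piecewise-linear kernel: peak at max_weight_position,
    linear falloff to min_weight at min_weight_distance, zero beyond."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0  # assumption: layers outside the window are left untouched
    t = distance / min_weight_distance
    return max_weight + t * (min_weight - max_weight)

# Values from the attn.o_proj rows of the parameter table.
o_proj = dict(max_weight=1.38, max_weight_position=24.62,
              min_weight=0.05, min_weight_distance=11.04)

for layer in (10, 14, 20, 25, 30):
    print(layer, round(ablation_weight(layer, **o_proj), 3))
```

Under these assumptions the strongest intervention lands on the layers nearest position 24.62, and layers more than about 11 positions away are not modified at all.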

Performance

| Metric | This model | Original model (0xA50C1A1/Ministral-3-14B-Nymphaea-RP) |
| --- | --- | --- |
| KL divergence | 0.0158 | 0 (by definition) |
| Refusals | 3/100 | 13/100 |
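
KL divergence measures how far the modified model's next-token distribution drifts from the original's, which is why the original scores 0 against itself. Here is a minimal sketch of the quantity for two discrete distributions. The toy probabilities are hypothetical, and using the original model as the reference distribution is an assumption; the card does not state which prompts were used or how the per-prompt values were averaged.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

original = [0.70, 0.20, 0.10]   # toy next-token probabilities (hypothetical)
modified = [0.65, 0.24, 0.11]   # slightly shifted distribution (hypothetical)

print(kl_divergence(original, original))  # 0.0: a model compared with itself
print(kl_divergence(original, modified))  # small positive value
```

A low value alongside a lower refusal count (3/100 versus 13/100) indicates that refusals dropped while the output distribution stayed close to the original.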

Ministral-3-14B-Nymphaea-RP

A fine-tune of Ministral 3 14B Instruct 2512 for roleplay and creative writing.

[!TIP] The SillyTavern preset is available here. For custom presets, please use the Mistral V7-Tekken instruct template.

Tested at Q6_K quantization with the Web Search extension (via SearXNG) in SillyTavern.

SillyTavern Screenshot

GGUF

Here is my custom mixed-quant GGUF, which I use regularly. It fits into 16 GB of VRAM with a 16K context window (using a Q8 KV cache). If you need the mmproj file, it's available here.

```bash
llama-quantize \
  --imatrix imatrix.gguf \
  --token-embedding-type q8_0 \
  --output-tensor-type q8_0 \
  --tensor-type ".*attn_q.weight=q8_0" \
  --tensor-type ".*attn_k.weight=q8_0" \
  --tensor-type ".*attn_output.weight=q5_k" \
  --tensor-type ".*attn_v.weight=iq4_nl" \
  --tensor-type ".*ffn_up.weight=iq4_nl" \
  --tensor-type ".*ffn_gate.weight=iq4_nl" \
  Ministral-3-14B-Nymphaea-RP.F16.gguf \
  Ministral-3-14B-Nymphaea-RP.Q5_Mix.gguf \
  q5_k
```
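
Each `--tensor-type` option pairs a name pattern with a quantization type. The sketch below shows which override would apply to a few tensors. It assumes GGUF tensor names of the form `blk.N.attn_q.weight`, and that patterns are tried as regular-expression searches in the order given with the first match winning. The patterns in this command do not overlap, so the order does not matter here. The helper name `override_for` and the sample tensor names are hypothetical.

```python
import re

# Overrides copied from the llama-quantize command, in the same order.
OVERRIDES = [
    (r".*attn_q.weight", "q8_0"),
    (r".*attn_k.weight", "q8_0"),
    (r".*attn_output.weight", "q5_k"),
    (r".*attn_v.weight", "iq4_nl"),
    (r".*ffn_up.weight", "iq4_nl"),
    (r".*ffn_gate.weight", "iq4_nl"),
]

def override_for(tensor_name):
    """Return the overriding type, or None if the base preset decides."""
    for pattern, qtype in OVERRIDES:
        if re.search(pattern, tensor_name):
            return qtype
    return None

# Hypothetical tensor names following the usual GGUF `blk.N.*` layout.
for name in ("blk.0.attn_q.weight", "blk.0.attn_v.weight",
             "blk.0.ffn_gate.weight", "blk.0.ffn_down.weight"):
    print(name, "->", override_for(name) or "left to the base q5_k preset")
```

Note that `ffn_down` has no override, so its type is whatever the base `q5_k` preset selects for it.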

The imatrix file for making your own quants is available here. I created it from this calibration dataset, expanded with RP and creative writing data (about 400k tokens).

Training Notes

Trained on the latest iteration of my Darkmere dataset. This version features expanded genre variety, built upon a mix of manually curated synthetics and human-written stories.

[!IMPORTANT] The base weights are abliterated via Heretic prior to fine-tuning, so this fine-tune is quite uncensored.

Method:

Training Method: DoRA (Weight-Decomposed LoRA)
Target Modules: all-linear
LoRA Rank: 64
LoRA Alpha: 64
LoRA Dropout: 0.05
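
As a rough illustration, the settings above can be written as adapter arguments together with the scaling they imply. The dictionary keys mirror the argument names of `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`, `use_dora`), but the card does not name the training framework, so that mapping is an assumption. The layer dimensions and the helper `adapter_params` are hypothetical, and the one-magnitude-per-output-feature count follows my understanding of PEFT's DoRA layers.

```python
# Settings from the list above, keyed like peft.LoraConfig arguments
# (assumption: the card does not name the training framework).
adapter_config = {
    "r": 64,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
    "use_dora": True,
}

# Standard LoRA scales the low-rank update by alpha / r.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]

def adapter_params(d_in, d_out, r, dora=True):
    """Trainable parameters added to one linear layer:
    A (r x d_in) + B (d_out x r), plus one DoRA magnitude per output feature."""
    count = r * (d_in + d_out)
    if dora:
        count += d_out
    return count

# Hypothetical 4096 x 4096 projection, only to show the order of magnitude.
print(scaling)                                          # 1.0
print(adapter_params(4096, 4096, adapter_config["r"]))  # 528384
```

With rank and alpha both at 64, the low-rank update is applied at a scale of 1.0.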

Hyperparameters:

Batch Size: 2 (Per-device)
Gradient Accumulation: 2
Epochs: 2
Learning Rate: 1e-4
Optimizer: adamw_torch_fused
LR Scheduler: cosine
Noise Level: neftune_noise_alpha=5
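
To make these numbers concrete, here is a small sketch of the effective batch size, the cosine learning-rate curve, and the NEFTune noise bound they imply. The number of devices, the absence of a warmup phase, the total step count, and the sequence length and hidden size in the NEFTune example are all assumptions, since the card does not state them.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM = 2
PEAK_LR = 1e-4

def effective_batch(num_devices):
    """Sequences contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_devices

def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 (assumption: no warmup phase)."""
    progress = step / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def neftune_noise_bound(alpha, seq_len, hidden_dim):
    """NEFTune adds uniform noise in [-b, b] to the input embeddings,
    with b = alpha / sqrt(seq_len * hidden_dim)."""
    return alpha / math.sqrt(seq_len * hidden_dim)

print(effective_batch(num_devices=1))       # 4 sequences per step on one GPU
print(cosine_lr(0, 1000))                   # starts at the peak rate
print(cosine_lr(1000, 1000))                # decays to 0.0 at the end
print(neftune_noise_bound(5, 1024, 4096))   # hypothetical seq_len and hidden_dim
```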

[!NOTE] The vision encoder was frozen during training, so the model retains its native vision capabilities.

Special Thanks

This fine-tune wouldn't be possible without the incredible work of the community:

p-e-w for developing Heretic - an essential tool for censorship removal.
SicariusSicariiStuff for developing the SLOP_Detector script.
Mistral AI for their Ministral 3 weights.
AMD for their Instinct™ MI300X GPU.