Adapter chain (as merged for this run)
unsloth/Qwen3.6-27B
+ lora-stage0-v5-5-final (v5-5 inherited, optional)
+ lora-stage1-v5-6-final (v5-6 SFT s1)
+ lora-stage2-v5-6-final (v5-6 SFT s2)
+ this adapter (r=32) (uncensoring U1 SFT + U2 DPO)
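The chain above can be reproduced by attaching and merging each adapter in order. The following is a minimal sketch, assuming the adapters are PEFT LoRA checkpoints; it is not necessarily the script used for this run. The adapter names come from this card, but where each one lives (local path or Hub repo) is not stated, and `THIS_ADAPTER` is a hypothetical placeholder.

```python
from typing import Any, Callable, Sequence

BASE_MODEL = "unsloth/Qwen3.6-27B"
THIS_ADAPTER = "path/to/this-adapter"  # hypothetical placeholder for this repo

# Order matters: each adapter was trained on top of the previous merge.
ADAPTER_CHAIN = [
    "lora-stage0-v5-5-final",  # inherited from v5-5, optional
    "lora-stage1-v5-6-final",
    "lora-stage2-v5-6-final",
    THIS_ADAPTER,
]


def peft_attach_and_merge(model: Any, adapter_id: str) -> Any:
    """Attach one LoRA adapter with PEFT and fold it into the base weights."""
    from peft import PeftModel  # lazy import: only needed for a real merge

    return PeftModel.from_pretrained(model, adapter_id).merge_and_unload()


def merge_chain(
    model: Any,
    adapter_ids: Sequence[str],
    attach: Callable[[Any, str], Any] = peft_attach_and_merge,
) -> Any:
    """Apply the adapters one after another, merging after each step."""
    for adapter_id in adapter_ids:
        model = attach(model, adapter_id)
    return model
```

To use it, load `BASE_MODEL` with transformers first and pass the resulting model to `merge_chain(model, ADAPTER_CHAIN)`. Drop the first entry of `ADAPTER_CHAIN` if the optional stage 0 adapter is not wanted.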
Training summary
- Base: merged v5-6 chain (stage_0+stage_1+stage_2)
- Stage U1 SFT: 1 epoch, LR 1e-5, ~1000 rows (refusal->comply rewrites + 8% capability replay)
- Stage U2 DPO: 1 epoch, beta 0.05, LR 2e-6, ~850 pairs (~600 uncensoring + ~250 v5-6 anti-hallucination mix-in)
- All uncensoring data was synthesized in-notebook from phoenix_boundary_refusal.jsonl + phoenix_authorized_low_refusal.jsonl.
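The approximate dataset sizes in the summary can be checked with a few lines of arithmetic. This sketch assumes the 8% capability replay is a fraction of the total U1 rows, which the summary does not state explicitly; all counts are the rounded figures given above.

```python
# Approximate counts from the training summary.
U1_TOTAL_ROWS = 1000
U1_REPLAY_FRACTION = 0.08  # assumed to be a share of all U1 rows
U2_UNCENSORING_PAIRS = 600
U2_ANTI_HALL_PAIRS = 250


def u1_split(total: int, replay_fraction: float) -> tuple[int, int]:
    """Return (rewrite_rows, replay_rows) for Stage U1."""
    replay = round(total * replay_fraction)
    return total - replay, replay


def u2_mix(uncensoring: int, anti_hall: int) -> tuple[int, float]:
    """Return (total_pairs, anti_hallucination_share) for Stage U2."""
    total = uncensoring + anti_hall
    return total, anti_hall / total


rewrite_rows, replay_rows = u1_split(U1_TOTAL_ROWS, U1_REPLAY_FRACTION)
total_pairs, anti_hall_share = u2_mix(U2_UNCENSORING_PAIRS, U2_ANTI_HALL_PAIRS)
```

Under these assumptions Stage U1 has roughly 920 rewrite rows and 80 replay rows, and the anti-hallucination mix-in makes up a little under 30% of the Stage U2 pairs.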
Note on anti-hallucination: the v5-6 DPO adapter was not present at training time, so Stage U2 mixed in ~250 v5-6 DPO pairs (phoenix_v56_dpo_quality_pairs.jsonl + phoenix_failure_corpus_* + phoenix_v55_anti_hall_dpo.jsonl) to recover the evidence-discipline signal. Each mixed-in pair had its system prompt swapped to Yuanl-Free to keep the persona consistent across the U2 dataset.
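The system-prompt swap described above could look like the following sketch. The JSONL schema (`system`, `prompt`, `chosen`, `rejected` keys) is an assumption, since this card does not document the file format, and `YUANL_FREE_SYSTEM` is a placeholder because the actual Yuanl-Free prompt text is not given here.

```python
import json
from typing import Iterable, Iterator

# Placeholder: the real Yuanl-Free system prompt is not included in this card.
YUANL_FREE_SYSTEM = "<Yuanl-Free system prompt text>"


def swap_system_prompt(pair: dict, system_prompt: str) -> dict:
    """Return a copy of a DPO pair with its system prompt replaced."""
    swapped = dict(pair)  # shallow copy so the input record is left untouched
    swapped["system"] = system_prompt  # "system" key is an assumed field name
    return swapped


def load_mixin_pairs(lines: Iterable[str], system_prompt: str) -> Iterator[dict]:
    """Parse JSONL lines and rewrite each pair's system prompt."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield swap_system_prompt(json.loads(line), system_prompt)
```

Passing an open file handle for one of the mix-in JSONL files as `lines` yields pairs that share the same persona prompt as the uncensoring pairs.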
Inference
Use the lkjiop8/Yuanl-27B-v5-6-uncensored-MTP-GGUF repo for ready-to-run MTP GGUF files (Q8_0 and Q4_K_M).
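One way to fetch a specific quantization is to list the repo's files and pick the GGUF whose name contains the quant tag. This is a sketch, assuming the file names include the tag (`Q8_0` or `Q4_K_M`) and that each quant is a single file; the exact file names are not stated in this card, so nothing is hard-coded.

```python
from typing import Sequence

REPO_ID = "lkjiop8/Yuanl-27B-v5-6-uncensored-MTP-GGUF"


def pick_gguf(filenames: Sequence[str], quant: str) -> str:
    """Pick the first .gguf file (sorted by name) whose name contains the quant tag."""
    tag = quant.lower()
    matches = sorted(
        name
        for name in filenames
        if name.lower().endswith(".gguf") and tag in name.lower()
    )
    if not matches:
        raise LookupError(f"no GGUF file matching {quant!r}")
    return matches[0]


def download_quant(quant: str = "Q4_K_M", repo_id: str = REPO_ID) -> str:
    """Download the matching GGUF and return its local path."""
    # Lazy import so pick_gguf can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    filename = pick_gguf(list_repo_files(repo_id), quant)
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

If a quant turns out to be split across several files, `pick_gguf` returns only the first part and the remaining parts would need to be downloaded as well.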
Responsible use
This adapter removes refusals on cybersecurity-domain questions only. The operator
inherits all legal and ethical responsibility for how the model is used. Run only on
authorized targets / isolated labs / your own infrastructure. Add an application-layer
policy filter for any deployment context that requires one.
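As one illustration of an application-layer policy filter, the sketch below rejects prompts that mention IPv4 addresses outside an allowlist of authorized networks. The networks listed are hypothetical examples, and the check is deliberately narrow: it does not cover hostnames, IPv6, or URLs, and it can misread dotted version strings as addresses.

```python
import ipaddress
import re

# Hypothetical allowlist: replace with the networks you are authorized to test.
AUTHORIZED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.56.0/24"),
]

_IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def unauthorized_targets(prompt: str, networks=AUTHORIZED_NETWORKS) -> list[str]:
    """Return IPv4 addresses in the prompt that fall outside every allowed network."""
    flagged = []
    for candidate in _IPV4_RE.findall(prompt):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # looks like an address but is not valid, e.g. 999.1.1.1
        if not any(addr in net for net in networks):
            flagged.append(candidate)
    return flagged


def allow_request(prompt: str, networks=AUTHORIZED_NETWORKS) -> bool:
    """Allow the request only if no out-of-scope address is mentioned."""
    return not unauthorized_targets(prompt, networks)
```

A deployment would call `allow_request` before forwarding a prompt to the model and log or reject anything it flags. Real deployments will likely need broader checks than this.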