Model Description
MAAT (Multi-phase Adapter-Aware Targeted Unlearning) is a three-phase machine unlearning framework designed for "Why-type" questions, which probe causal and relational knowledge. Standard unlearning evaluation is often skewed toward simple factual lookups, whereas MAAT targets multi-step reasoning chains and addresses gradient dilution over long answer spans.
The framework operates exclusively on LoRA adapter weights through three distinct phases:
- Phase 1 — Gradient Policy Ascent: Uses orthogonal projection to remove components of the retain gradient from the forget gradient.
- Phase 2 — Structural Compression and Task Negation: Employs SVD rank-dimension pruning to mask forget-scored dimensions, combined with task vector negation.
- Phase 3 — Multi-Objective Utility Repair Engine: Restores model utility on the retain set using a hybrid KL-hidden-state repair mechanism.
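As a rough illustration of the orthogonal projection named in Phase 1, the sketch below removes from a forget gradient its component along a retain gradient, using g_f' = g_f - (<g_f, g_r> / <g_r, g_r>) g_r. It is a minimal sketch under stated assumptions, not the MAAT implementation: the function name `project_out_retain`, the flat-vector representation of the gradients, and the `eps` guard are all hypothetical, and the actual framework may project per layer or per LoRA matrix.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_retain(
    g_forget: Sequence[float],
    g_retain: Sequence[float],
    eps: float = 1e-12,  # hypothetical guard against a near-zero retain gradient
) -> List[float]:
    """Return g_forget with its component along g_retain removed.

    The result is orthogonal to g_retain, so a step along it does not
    change the retain loss to first order.
    """
    denom = dot(g_retain, g_retain)
    if denom < eps:
        # Retain gradient is (near) zero: nothing to project out.
        return list(g_forget)
    coeff = dot(g_forget, g_retain) / denom
    return [f - coeff * r for f, r in zip(g_forget, g_retain)]


# Toy example: the retain gradient points along the second axis only.
g_f = [1.0, 2.0, 3.0]
g_r = [0.0, 1.0, 0.0]
g_proj = project_out_retain(g_f, g_r)
print(g_proj)            # [1.0, 0.0, 3.0]
print(dot(g_proj, g_r))  # 0.0
```

In the toy example, the projected gradient keeps the first and third components and drops the second, which is the only direction the retain gradient uses.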
This checkpoint targets unlearning of components of the TOFU dataset while maintaining high retention of general knowledge.
Evaluation
The framework was evaluated on 5WBENCH, a balanced 5,000-sample benchmark with 1,000 examples in each of the five 5W categories (Who, What, When, Where, Why). The balanced design makes it possible to quantify causal unlearning failures on Why questions alongside the other categories.
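A per-category breakdown of this kind could be computed as in the sketch below. It assumes each evaluation record is a hypothetical `(category, was_forgotten)` pair, where `was_forgotten` is a boolean judgment produced elsewhere. The record format and the function name `forget_rate_by_category` are illustrative assumptions, not part of 5WBENCH.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CATEGORIES = ("Who", "What", "When", "Where", "Why")


def forget_rate_by_category(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Fraction of examples judged forgotten, per 5W category.

    Categories with no records are omitted from the result.
    """
    totals = defaultdict(int)
    forgotten = defaultdict(int)
    for category, was_forgotten in records:
        totals[category] += 1
        forgotten[category] += int(was_forgotten)
    return {
        c: forgotten[c] / totals[c]
        for c in CATEGORIES
        if totals[c] > 0
    }


# Toy records (made up for illustration, not benchmark results).
toy_records = [
    ("Who", True),
    ("Who", True),
    ("Why", True),
    ("Why", False),
]
print(forget_rate_by_category(toy_records))  # {'Who': 1.0, 'Why': 0.5}
```

Reporting the rate for each category separately, rather than a single pooled score, is what lets a gap between Why questions and the simpler factual categories show up.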
Citation
@article{yagnik2024maat,
title={MAAT: Multi-phase Adapter-Aware Targeted Unlearning},
author={Yagnik, Suryash and Gaur, Shubham and Thakur, Saksham and Jain, Vinija and Chadha, Aman and Das, Amitava},
journal={arXiv preprint arXiv:2605.30514},
year={2024}
}