Model Description
MAAT addresses causal unlearning, the removal of knowledge behind "Why"-type questions, while preserving the rest of the model's knowledge. The framework operates exclusively on LoRA adapter weights and reaches a new operating point on the forget-retain Pareto frontier by combining three components:
- Gradient-Projected Ascent: Orthogonally projects forget gradients to remove components that conflict with the retain set.
- Structural Compression and Task Negation: Uses SVD rank-dimension pruning and task vector negation to selectively erase knowledge.
- Multi-Objective Utility Repair Engine: Runs a joint parameter-alignment loop over the retain set, recovering utility with KL-divergence and hidden-state alignment objectives.
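The three components above can be illustrated with a minimal NumPy sketch. It is not the MAAT implementation: the function names, the conditional projection rule, the choice to keep the largest singular directions, and the weights `alpha` and `beta` are all assumptions made for illustration, since this card does not specify them.

```python
import numpy as np


def project_forget_update(g_forget, g_retain, eps=1e-12):
    """Gradient-ascent direction on the forget loss with the retain conflict removed.

    Assumption: a step along g_forget "conflicts" when it would raise the retain
    loss to first order (positive dot product with g_retain). In that case the
    component along g_retain is projected out.
    """
    dot = float(g_forget @ g_retain)
    if dot <= 0.0:
        return g_forget.copy()
    return g_forget - dot / (float(g_retain @ g_retain) + eps) * g_retain


def truncate_rank(delta_w, keep):
    """SVD rank pruning of an adapter update delta_w = B @ A.

    Assumption: the `keep` largest singular directions are retained; the
    selection criterion used by MAAT may differ.
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    return (u[:, :keep] * s[:keep]) @ vt[:keep, :]


def negate_task_vector(w_base, w_forget_tuned, alpha=1.0):
    """Task negation: subtract the scaled task vector (w_forget_tuned - w_base)."""
    return w_base - alpha * (w_forget_tuned - w_base)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def repair_loss(ref_logits, cur_logits, ref_hidden, cur_hidden, beta=1.0):
    """KL(reference || current) on outputs plus MSE hidden-state alignment."""
    ref_logp = log_softmax(ref_logits)
    cur_logp = log_softmax(cur_logits)
    kl = (np.exp(ref_logp) * (ref_logp - cur_logp)).sum(axis=-1).mean()
    align = ((ref_hidden - cur_hidden) ** 2).mean()
    return float(kl + beta * align)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical LoRA factors: B is (d_out x r), A is (r x d_in).
    lora_b = rng.normal(size=(6, 4))
    lora_a = rng.normal(size=(4, 5))
    pruned = truncate_rank(lora_b @ lora_a, keep=2)
    print("rank after pruning:", np.linalg.matrix_rank(pruned))
```

In an actual pipeline the gradients would be flattened over the trainable LoRA parameters and the repair loss would be minimized on retain-set batches against a frozen reference model; both details are assumed here rather than taken from the source.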
Evaluation
The model was evaluated using 5WBENCH, a balanced 5,000-sample benchmark covering Who, What, When, Where, and Why categories, specifically designed to quantify causal unlearning failures.
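A balanced split of 5,000 samples over five categories suggests roughly 1,000 samples per category, so results are naturally reported per category. The sketch below shows one way to compute such a breakdown; the `(category, is_correct)` record layout and the function name are hypothetical and are not taken from 5WBENCH itself.

```python
from collections import defaultdict

CATEGORIES = ("Who", "What", "When", "Where", "Why")


def per_category_rate(records):
    """Return {category: fraction of records marked correct}.

    `records` is an iterable of (category, is_correct) pairs; this layout is an
    assumption made for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        hits[category] += int(is_correct)
    # Categories with no records are omitted rather than reported as zero.
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c]}
```

Running this once on the forget set and once on the retain set would expose whether "Why" questions behave differently from the other four categories, which is the kind of failure the benchmark is described as measuring.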
Citation
@article{yagnik2025maat,
  title={MAAT: Multi-phase Adapter-Aware Targeted Unlearning},
  author={Yagnik, Suryash and Gaur, Shubham and Thakur, Saksham and Jain, Vinija and Chadha, Aman and Das, Amitava},
  journal={arXiv preprint arXiv:2605.30514},
  year={2025}
}