Model Description
HiVis (History-aware Visually grounded) is a test-time intervention framework that equips computer use agents (CUAs) with history state tracking and visually grounded error analysis. Within this framework, we propose HiVis-critic, a multimodal model that serves as the intervention engine and generates both kinds of critique.
Key Highlights:
- 📝 HiVis-critic for history state tracking: maintains a macro-action history, a compact record of the interactions so far. It recursively compresses past interactions into multi-step achieved goals, which lets the policy plan with awareness of its history over long horizons.
- 🎯 HiVis-critic for visually grounded error analysis: verifies raw execution coordinates against actual visual states. If a proposed action is flawed, the model identifies the error dimension to provide the policy with corrective guidance before execution.
- Developed by: Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Supriyo Chakraborty, Kartik Balasubramaniam, Sambit Sahu, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal
- Model type: Qwen3ForCausalLM, fine-tuned Large Language Model
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: Qwen3-VL-8B-Thinking
Model Sources
Overview of HiVis

Uses
Test-time intervention
The HiVis-critic model serves as an intervention engine that assists policies in long-horizon GUI tasks. It generates two kinds of critique: a visually grounded error analysis of each proposed action before it is executed, and a tracked history state that gives the policy better context for its next decision.
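The intervention loop described above might be wired up roughly as follows. This is a minimal sketch, not the released implementation: `Critique`, `MacroActionHistory`, `intervene`, the `propose` and `critique` callables, the fixed compression `window`, and the retry limit are all hypothetical names and choices introduced here for illustration. In practice `critique` would wrap a call to HiVis-critic and `propose` a call to the policy model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Critique:
    """Hypothetical critic output; field names are illustrative only."""
    is_flawed: bool
    error_dimension: Optional[str] = None  # e.g. "wrong target"
    guidance: str = ""


@dataclass
class MacroActionHistory:
    """Compact record of achieved goals rather than every raw action."""
    achieved_goals: List[str] = field(default_factory=list)
    pending_steps: List[str] = field(default_factory=list)

    def add_step(self, action: str,
                 summarize: Callable[[List[str]], str],
                 window: int = 3) -> None:
        # Buffer raw actions; once `window` of them accumulate, compress
        # them into one achieved goal. The fixed window is an assumption.
        self.pending_steps.append(action)
        if len(self.pending_steps) >= window:
            self.achieved_goals.append(summarize(self.pending_steps))
            self.pending_steps = []

    def render(self) -> str:
        # Text form of the history that is passed to policy and critic.
        return "; ".join(self.achieved_goals + self.pending_steps)


def intervene(propose: Callable[[str, Any, str, str], str],
              critique: Callable[[str, Any, str, str], Critique],
              history: MacroActionHistory,
              screenshot: Any,
              task: str,
              max_retries: int = 2) -> str:
    """Vet each proposed action with the critic before it is executed."""
    feedback = ""
    action = propose(task, screenshot, history.render(), feedback)
    for _ in range(max_retries):
        verdict = critique(task, screenshot, history.render(), action)
        if not verdict.is_flawed:
            break
        # Return the error dimension and guidance to the policy, then retry.
        feedback = f"{verdict.error_dimension}: {verdict.guidance}"
        action = propose(task, screenshot, history.render(), feedback)
    return action
```

The action returned by `intervene` is the one that would actually be executed. After execution, the caller would pass it to `history.add_step` so that later steps see the compressed history.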
Citation
If you find this work useful, please consider citing us:
@article{lee2026hisvis,
title={A History-Aware Visually Grounded Critic for Computer Use Agents},
author={Jaewoo Lee and Zaid Khan and Archiki Prasad and Justin Chih-Yao Chen and Supriyo Chakraborty and Kartik Balasubramaniam and Sambit Sahu and Elias Stengel-Eskin and Hyunji Lee and Mohit Bansal},
year={2026},
journal={arXiv preprint arXiv:tbd},
url={https://arxiv.org/abs/tbd},
}