License: other
Model Description
HiVis (History-aware Visually grounded) is a test-time intervention framework designed to equip computer use agents (CUAs) with history state tracking and visually grounded error analysis. Within this framework, we propose HiVis-critic, a multimodal model that serves as an intervention engine with both of these critique-generation capabilities.
Key Highlights:
- 📝 HiVis-critic for history state tracking: maintains a macro-action history, a compact record of the interaction so far, by recursively compressing past interactions into multi-step achieved goals. This lets the policy plan with awareness of its history over long horizons.
- 🎯 HiVis-critic for visually grounded error analysis: verifies a proposed action's raw execution coordinates against the actual visual state. If the action is flawed, the model identifies the error dimension and gives the policy corrective guidance before the action is executed.
- Developed by: Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Supriyo Chakraborty, Kartik Balasubramaniam, Sambit Sahu, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal
- Model type: Qwen3ForCausalLM, fine-tuned large language model
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: Qwen3-VL-8B-Thinking
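The history-tracking highlight under Key Highlights can be pictured with a toy data structure. This is not HiVis-critic's implementation: the class `MacroActionHistory`, its methods, and the `merge_last` parameter are hypothetical, and in HiVis the goal summaries are generated by the critic model rather than passed in by hand. The sketch only illustrates how folding raw actions into achieved goals, and goals into higher-level goals, keeps the history short as an episode grows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MacroActionHistory:
    """Toy macro-action history (hypothetical; not the HiVis-critic code).

    `achieved_goals` holds compressed multi-step goals; `pending_actions`
    holds raw actions not yet folded into a goal. In HiVis the critic model
    writes the goal summaries; here the caller supplies them.
    """
    achieved_goals: List[str] = field(default_factory=list)
    pending_actions: List[str] = field(default_factory=list)

    def record(self, action: str) -> None:
        # Log one raw, low-level action such as a click or keystroke.
        self.pending_actions.append(action)

    def compress(self, goal_summary: str, merge_last: int = 0) -> None:
        # Fold all pending actions, plus optionally the last `merge_last`
        # achieved goals, into one higher-level goal. Applying this to goals
        # that were themselves compressed is what makes it recursive.
        if merge_last > 0:
            del self.achieved_goals[-merge_last:]
        self.pending_actions.clear()
        self.achieved_goals.append(goal_summary)

    def as_prompt(self) -> str:
        # Render the compact history that would be shown to the policy.
        lines = [f"[done] {goal}" for goal in self.achieved_goals]
        lines += [f"[recent] {action}" for action in self.pending_actions]
        return "\n".join(lines)


history = MacroActionHistory()
for raw in ["click(120, 45)", "type('report.xlsx')", "press('Enter')"]:
    history.record(raw)
history.compress("Opened report.xlsx")
history.record("click(300, 210)")
print(history.as_prompt())
# [done] Opened report.xlsx
# [recent] click(300, 210)
```

The history grows with the number of achieved goals rather than the number of low-level steps, which is the property that matters for long-horizon prompts.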
Model Sources
- Repository: https://github.com/G-JWLee/HiVis
- Paper: A History-Aware Visually Grounded Critic for Computer Use Agents
[Figure: Overview of HiVis]

Uses
Test-time intervention
HiVis-critic serves as an intervention engine that aids policies on long-horizon GUI tasks with two kinds of critique: visually grounded error analysis, which catches a flawed action before it is executed, and history state tracking, which gives the policy a compact view of its progress so it can make better decisions.
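The test-time intervention loop described above can be sketched as follows, assuming a policy that accepts the previous critique and returns a new proposal. Everything in this sketch is hypothetical: `toy_critic` replaces the multimodal critic with a bounding-box lookup over a dictionary of UI elements, and the `grounding`/`visibility` labels are invented stand-ins for the model's error dimensions. Only the control flow is meant to carry over: each proposed action is checked before execution, and the critique is fed back to the policy until the action passes or a retry budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a UI element, in screen pixels."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def center(self) -> Tuple[int, int]:
        return (self.left + self.right) // 2, (self.top + self.bottom) // 2


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click"
    x: int       # raw execution coordinates proposed by the policy
    y: int
    target: str  # UI element the policy intends to act on


def toy_critic(action: Action, screen: Dict[str, Box]) -> Optional[str]:
    """Hypothetical stand-in for HiVis-critic's grounded error analysis.

    Returns None when the coordinates land on the intended element,
    otherwise a critique tagged with an invented error label.
    """
    box = screen.get(action.target)
    if box is None:
        return f"visibility: '{action.target}' is not on the current screen"
    if not box.contains(action.x, action.y):
        cx, cy = box.center()
        return (f"grounding: ({action.x}, {action.y}) misses '{action.target}'; "
                f"its center is ({cx}, {cy})")
    return None


def intervene(
    propose: Callable[[Optional[str]], Action],
    screen: Dict[str, Box],
    max_retries: int = 2,
) -> Tuple[Action, Optional[str]]:
    """Critique each proposal before execution; feed critiques back to the policy.

    Returns the last proposed action and its critique (None means approved).
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    critique: Optional[str] = None
    for _ in range(max_retries + 1):
        action = propose(critique)
        critique = toy_critic(action, screen)
        if critique is None:
            break
    return action, critique


# Usage: a toy policy that misses once, then corrects itself.
screen = {"Save": Box(50, 20, 90, 40)}


def policy(critique: Optional[str]) -> Action:
    if critique is None:
        return Action("click", 10, 10, "Save")  # first guess misses the button
    return Action("click", 70, 30, "Save")      # revised after the critique


action, critique = intervene(policy, screen)
print(action, critique)  # Action(kind='click', x=70, y=30, target='Save') None
```

Capping retries with `max_retries` keeps a policy that never converges from looping forever, and returning the final critique alongside the action leaves the caller free to execute anyway or abort.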
Citation
If you find this work useful, please consider citing us:
```bibtex
@article{lee2026hisvis,
  title={A History-Aware Visually Grounded Critic for Computer Use Agents},
  author={Jaewoo Lee and Zaid Khan and Archiki Prasad and Justin Chih-Yao Chen and Supriyo Chakraborty and Kartik Balasubramaniam and Sambit Sahu and Elias Stengel-Eskin and Hyunji Lee and Mohit Bansal},
  year={2026},
  journal={arXiv preprint arXiv:tbd},
  url={https://arxiv.org/abs/tbd},
}
```
- Model provider: Jaew00Lee
- Model tree: Qwen/Qwen3-VL-8B-Thinking (base) → this model (fine-tuned)
- Modalities: input text and images; output text