Jaew00Lee

HiVis-critic

License: other

Model Description

HiVis (History-aware Visually grounded) is a test-time intervention framework that equips computer use agents (CUAs) with history state tracking and visually grounded error analysis. Within this framework, we propose HiVis-critic, a multimodal model that serves as the intervention engine and generates both kinds of critique.

Key Highlights:

📝 HiVis-critic for history state tracking: maintains a macro-action history, a compact record of the interaction so far. It recursively compresses past interactions into multi-step achieved goals, which supports history-aware planning by the policy over long horizons.
🎯 HiVis-critic for visually grounded error analysis: checks the raw execution coordinates of a proposed action against the actual visual state. If the action is flawed, the model identifies the error dimension and gives the policy corrective guidance before execution.
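
The recursive history update described above can be illustrated with a small sketch. This is not the released implementation: `MacroAction`, `toy_compress`, and `render` are hypothetical names, and the merge rule (fold a step into the previous entry when it serves the same goal) is a deliberately simple stand-in for the compression that HiVis-critic performs with a learned model. The point it shows is that each update consumes only the previous compressed history plus the latest step, never the full raw trace.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MacroAction:
    """Hypothetical history entry: one achieved goal spanning one or more steps."""
    goal: str
    num_steps: int


History = Tuple[MacroAction, ...]


def toy_compress(history: History, goal_of_step: str) -> History:
    """Fold the latest step into the compressed history.

    Assumption: each step arrives already labeled with the goal it serves.
    In HiVis this judgment is made by the critic model, not by string equality.
    """
    if history and history[-1].goal == goal_of_step:
        last = history[-1]
        # Same goal as the previous entry: extend it instead of adding a new one.
        return history[:-1] + (MacroAction(last.goal, last.num_steps + 1),)
    return history + (MacroAction(goal_of_step, 1),)


def render(history: History) -> str:
    """Format the history as a numbered list that could be placed in a prompt."""
    lines = []
    for i, entry in enumerate(history, start=1):
        unit = "step" if entry.num_steps == 1 else "steps"
        lines.append(f"{i}. {entry.goal} ({entry.num_steps} {unit})")
    return "\n".join(lines)


history: History = ()
for goal in ["open settings", "open settings", "enable dark mode"]:
    history = toy_compress(history, goal)

print(render(history))
```

Three raw steps collapse into two entries here, so the text handed to the policy grows with the number of achieved goals rather than with the number of low-level actions.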

Developed by: Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Supriyo Chakraborty, Kartik Balasubramaniam, Sambit Sahu, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal
Model type: Qwen3ForCausalLM, fine-tuned Large Language Model
Language(s) (NLP): English
License: MIT
Finetuned from model: Qwen3-VL-8B-Thinking

Model Sources

Repository: https://github.com/G-JWLee/HiVis
Paper: A History-Aware Visually Grounded Critic for Computer Use Agents

Figure: Overview of HiVis

Uses

Test-time intervention

HiVis-critic serves as the intervention engine for policies on long-horizon GUI tasks. It analyzes each proposed action for errors before execution and tracks the history state so that the policy can make better-informed decisions.
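
As a rough illustration of where such a critic sits at test time, the sketch below wires a policy and a critic into a propose, critique, revise loop. Everything in it is an assumption made for illustration: the `Action` and `Critique` types, the `intervene` function, the `max_revisions` budget, and the toy policy and critic are hypothetical and do not reflect the actual HiVis interfaces or prompt format. In particular, the toy critic checks a known bounding box, whereas HiVis-critic judges the action from the screenshot itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Action:
    kind: str            # e.g. "click"
    xy: Tuple[int, int]  # raw execution coordinates, in screenshot pixels


@dataclass
class Critique:
    ok: bool
    error_dimension: Optional[str] = None  # hypothetical label, e.g. "grounding"
    guidance: str = ""


def intervene(policy, critic, screenshot, history, max_revisions=2) -> Action:
    """Return an action the critic accepted, or the latest proposal once the
    revision budget is spent. Nothing is executed inside this function."""
    action = policy(screenshot, history, "")
    for _ in range(max_revisions):
        verdict = critic(screenshot, history, action)
        if verdict.ok:
            break
        # Feed the corrective guidance back to the policy before execution.
        action = policy(screenshot, history, verdict.guidance)
    return action


def toy_policy(screenshot, history, guidance):
    # Toy behavior: misses on the first try, then follows the guidance.
    return Action("click", (60, 30)) if guidance else Action("click", (5, 5))


def toy_critic(screenshot, history, action):
    # Toy check: is the click inside the target's bounding box?
    left, top, right, bottom = screenshot["save_button"]
    x, y = action.xy
    if left <= x <= right and top <= y <= bottom:
        return Critique(ok=True)
    return Critique(False, "grounding", "Click inside the Save button.")


chosen = intervene(
    toy_policy,
    toy_critic,
    screenshot={"save_button": (40, 20, 80, 40)},
    history="1. opened the editor (2 steps)",
)
print(chosen)
```

Capping the number of revisions keeps the loop bounded; a real deployment would also need to decide what to do when the budget runs out without an accepted action.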

Citation

If you find this work useful, please consider citing us:

```bibtex
@article{lee2026hisvis,
      title={A History-Aware Visually Grounded Critic for Computer Use Agents},
      author={Jaewoo Lee and Zaid Khan and Archiki Prasad and Justin Chih-Yao Chen and Supriyo Chakraborty and Kartik Balasubramaniam and Sambit Sahu and Elias Stengel-Eskin and Hyunji Lee and Mohit Bansal},
      year={2026},
      journal={arXiv preprint arXiv:tbd},
      url={https://arxiv.org/abs/tbd},
}
```

Model Details

Model provider: Jaew00Lee
Base model: Qwen/Qwen3-VL-8B-Thinking (this model is a fine-tune of it)
Input modalities: Text, Image
Output modalities: Text
Supported functionality (FriendliAI): Dedicated Endpoints, Container
