License: other

Model Description

HiVis (History-aware Visually grounded) is a test-time intervention framework that equips computer use agents (CUAs) with history state tracking and visually grounded error analysis. Within this framework, we propose HiVis-critic, a multimodal model that serves as the intervention engine and generates both kinds of critique.

Key Highlights:

  • 📝 HiVis-critic for history state tracking: maintains a macro-action history, a compact record of the interactions so far. It recursively compresses past interactions into multi-step achieved goals, so the policy can plan with awareness of its history over long horizons.
  • 🎯 HiVis-critic for visually grounded error analysis: checks the raw coordinates of a proposed action against the actual visual state. If the proposed action is flawed, the model identifies the error dimension and gives the policy corrective guidance before execution.
  • Developed by: Jaewoo Lee, Zaid Khan, Archiki Prasad, Justin Chih-Yao Chen, Supriyo Chakraborty, Kartik Balasubramaniam, Sambit Sahu, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal
  • Model type: Qwen3ForCausalLM, fine-tuned Large Language Model
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: Qwen3-VL-8B-Thinking
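The history state tracking highlighted above can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the class `MacroActionHistory`, the `window` parameter, the `summarize` callable, and the prompt format are all hypothetical names introduced here. In the real system the compression would presumably be done by HiVis-critic itself; a trivial string join stands in for it.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signature: (previous macro-actions, raw steps) -> achieved goal.
Summarizer = Callable[[List[str], List[str]], str]


@dataclass
class MacroActionHistory:
    """Compact record of past interactions (illustrative only)."""

    summarize: Summarizer
    window: int = 3  # assumed number of raw steps folded into one macro-action
    macro_actions: List[str] = field(default_factory=list)
    pending_steps: List[str] = field(default_factory=list)

    def record(self, step: str) -> None:
        """Append a raw step; compress once `window` steps have accumulated."""
        self.pending_steps.append(step)
        if len(self.pending_steps) >= self.window:
            # The summarizer sees earlier macro-actions too, so each new
            # achieved goal is written in the context of the compressed past.
            goal = self.summarize(self.macro_actions, self.pending_steps)
            self.macro_actions.append(goal)
            self.pending_steps.clear()

    def as_prompt(self) -> str:
        """Render the history as text that a policy could condition on."""
        lines = [f"[done] {goal}" for goal in self.macro_actions]
        lines += [f"[step] {step}" for step in self.pending_steps]
        return "\n".join(lines)


def join_summarizer(macro_actions: List[str], steps: List[str]) -> str:
    """Stand-in for a model call: joins the raw steps into one string."""
    return " + ".join(steps)


history = MacroActionHistory(summarize=join_summarizer, window=2)
for step in ["click File", "click Save", "type report.txt"]:
    history.record(step)
print(history.as_prompt())
```

The design point is that the policy conditions on a short list of achieved goals plus a few recent raw steps, rather than on the full interaction trace.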

Model Sources

Overview of HiVis


Uses

Test-time intervention

HiVis-critic serves as an intervention engine for policies on long-horizon GUI tasks. It generates two kinds of critique: a visually grounded error analysis of each proposed action before execution, and a tracked history state that helps the policy make better-informed decisions.
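A minimal sketch of such an intervention loop follows. Everything in it is an assumption made for illustration: the `Action` and `Critique` types, the `intervene` function, the `max_retries` parameter, and the toy policy and critic are hypothetical and do not reflect the model's actual input or output format.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Action:
    kind: str            # e.g. "click"
    xy: Tuple[int, int]  # raw execution coordinates


@dataclass
class Critique:
    ok: bool
    error_dimension: Optional[str] = None  # hypothetical label, e.g. "coordinate"
    guidance: str = ""


# Hypothetical interfaces for the policy and the critic.
Policy = Callable[[str, str], Action]              # (history, guidance) -> action
Critic = Callable[[bytes, str, Action], Critique]  # (screenshot, history, action)


def intervene(policy: Policy, critic: Critic, screenshot: bytes,
              history: str, max_retries: int = 2) -> Action:
    """Critique a proposed action before execution; re-query the policy if flawed."""
    guidance = ""
    action = policy(history, guidance)
    for _ in range(max_retries):
        critique = critic(screenshot, history, action)
        if critique.ok:
            break
        # Feed the error dimension and corrective guidance back to the policy.
        guidance = f"{critique.error_dimension}: {critique.guidance}"
        action = policy(history, guidance)
    return action


# Toy stand-ins so the loop can be run end to end.
def toy_policy(history: str, guidance: str) -> Action:
    return Action("click", (50, 50)) if guidance else Action("click", (0, 0))


def toy_critic(screenshot: bytes, history: str, action: Action) -> Critique:
    if action.xy == (50, 50):
        return Critique(ok=True)
    return Critique(ok=False, error_dimension="coordinate",
                    guidance="the target is at (50, 50)")


final_action = intervene(toy_policy, toy_critic, screenshot=b"", history="")
print(final_action)
```

Only the action that survives the critique (or the last proposal once retries are exhausted) would be executed in the environment.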

Citation

If you find this work useful, please consider citing us:

@article{lee2026hisvis,
  title={A History-Aware Visually Grounded Critic for Computer Use Agents},
  author={Jaewoo Lee and Zaid Khan and Archiki Prasad and Justin Chih-Yao Chen and Supriyo Chakraborty and Kartik Balasubramaniam and Sambit Sahu and Elias Stengel-Eskin and Hyunji Lee and Mohit Bansal},
  year={2026},
  journal={arXiv preprint arXiv:tbd},
  url={https://arxiv.org/abs/tbd},
}

Model provider

Jaew00Lee


Model tree

  • Base: Qwen/Qwen3-VL-8B-Thinking
  • Fine-tuned: this model

Modalities

  • Input: Text, Image
  • Output: Text
