JingyuanHuang
GUI-RD-9B
License: apache-2.0
Intended Use
This model is intended for GUI grounding research and evaluation. It takes a GUI screenshot and a natural-language instruction, then predicts the target screen coordinate.
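For evaluation, a predicted coordinate is commonly scored by checking whether it falls inside the target element's bounding box. The sketch below illustrates that check. It assumes the decoded model output contains the point as two comma-separated numbers (for example `(512, 384)`) and that the point and the box share the same units; neither is confirmed by this model card, so check them against the model's actual output. `parse_point` and `point_in_box` are hypothetical helper names.

```python
import re
from typing import Optional, Sequence, Tuple

# Assumption: the decoded output contains the point as "x, y", e.g. "(512, 384)".
# Adjust the pattern to whatever format the model actually emits.
_NUMBER = r"-?\d+(?:\.\d+)?"
_POINT = re.compile(rf"({_NUMBER})\s*,\s*({_NUMBER})")


def parse_point(text: str) -> Optional[Tuple[float, float]]:
    """Return the first "x, y" pair found in text, or None if there is none."""
    match = _POINT.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def point_in_box(point: Tuple[float, float], box: Sequence[float]) -> bool:
    """Check whether point lies inside box = (left, top, right, bottom).

    The point and the box must use the same units (both pixels, or both
    normalized); which one the model emits is an assumption to verify.
    """
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


prediction = parse_point("click(512, 384)")
hit = prediction is not None and point_in_box(prediction, (500, 370, 540, 400))
```

Returning `None` for unparseable output lets an evaluation loop count such cases as misses instead of raising.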
Loading
```python
from transformers import AutoModelForMultimodalLM, AutoProcessor
import torch

model_id = "JingyuanHuang/GUI-RD-9B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMultimodalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
```
Depending on your installed Transformers version, the concrete auto-model class for Qwen3.5 may differ. For older Transformers releases, use torch_dtype=torch.bfloat16 instead of dtype=torch.bfloat16. The repository provides standard Transformers config, tokenizer, processor, and safetensors weights.
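One way to handle both version differences in a single script is sketched below. `resolve_auto_class` and `dtype_kwargs` are hypothetical helpers, not part of Transformers. The fallback class name `AutoModelForImageTextToText` is an assumption about which class an older release might need; check it against your installed version.

```python
from types import SimpleNamespace
from typing import Any, Dict, Sequence


def resolve_auto_class(module: Any, candidates: Sequence[str]) -> Any:
    """Return the first attribute of module whose name appears in candidates."""
    for name in candidates:
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise ImportError(f"none of {list(candidates)} is available")


def dtype_kwargs(dtype: Any, use_legacy_name: bool) -> Dict[str, Any]:
    """Build the dtype keyword argument for from_pretrained.

    Older Transformers releases expect torch_dtype=..., newer ones dtype=...
    """
    key = "torch_dtype" if use_legacy_name else "dtype"
    return {key: dtype}


# Stand-in for the transformers module so the sketch runs without it installed.
fake_module = SimpleNamespace(AutoModelForImageTextToText="fallback-class")
chosen = resolve_auto_class(
    fake_module, ["AutoModelForMultimodalLM", "AutoModelForImageTextToText"]
)

# With the real library (not executed here):
#   import torch, transformers
#   auto_cls = resolve_auto_class(
#       transformers, ["AutoModelForMultimodalLM", "AutoModelForImageTextToText"]
#   )
#   model = auto_cls.from_pretrained(
#       "JingyuanHuang/GUI-RD-9B",
#       device_map="auto",
#       trust_remote_code=True,
#       **dtype_kwargs(torch.bfloat16, use_legacy_name=False),
#   )
```

Set `use_legacy_name=True` when your installed release does not accept the `dtype` argument.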
Citation
```bibtex
@misc{huang2026trustrightteacherqualityaware,
  title={Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding},
  author={Jingyuan Huang and Zuming Huang and Yucheng Shi and Tianze Yang and Xiaoming Zhai and Wei Chu and Ninghao Liu},
  year={2026},
  eprint={2606.18101},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2606.18101},
}
```
Model provider: JingyuanHuang
Base model: Qwen/Qwen3.5-9B (this model is a fine-tune)
Input modalities: Video, Text, Image
Output modalities: Text