Run this model inference on single tenant GPU with unmatched speed and reliability at scale.
Run this model inference with full control and performance in your environment.
Get help setting up a custom Dedicated Endpoints.
Talk with our engineer to get a quote for reserved GPU instances with discounts.
README
License: apache-2.0Introduction
UI-TARS-1.5, an open-source multimodal agent built upon a powerful vision-language model. It is capable of effectively performing diverse tasks within virtual worlds.
Leveraging the foundational architecture introduced in our recent paper, UI-TARS-1.5 integrates advanced reasoning enabled by reinforcement learning. This allows the model to reason through its thoughts before taking action, significantly enhancing its performance and adaptability, particularly in inference-time scaling. Our new 1.5 version achieves state-of-the-art results across a variety of standard benchmarks, demonstrating strong reasoning capabilities and notable improvements over prior models.
Code: https://github.com/bytedance/UI-TARS
Application: https://github.com/bytedance/UI-TARS-desktop
Performance
Online Benchmark Evaluation
| Benchmark type | Benchmark | UI-TARS-1.5 | OpenAI CUA | Claude 3.7 | Previous SOTA |
|---|---|---|---|---|---|
| Computer Use | OSworld (100 steps) | 42.5 | 36.4 | 28 | 38.1 (200 step) |
| Windows Agent Arena (50 steps) | 42.1 | - | - | 29.8 | |
| Browser Use | WebVoyager | 84.8 | 87 | 84.1 | 87 |
| Online-Mind2web | 75.8 | 71 | 62.9 | 71 | |
| Phone Use | Android World | 64.2 | - | - | 59.5 |
Grounding Capability Evaluation
| Benchmark | UI-TARS-1.5 | OpenAI CUA | Claude 3.7 | Previous SOTA |
|---|---|---|---|---|
| ScreensSpot-V2 | 94.2 | 87.9 | 87.6 | 91.6 |
| ScreenSpotPro | 61.6 | 23.4 | 27.7 | 43.6 |
Poki Game
| Model | 2048 | cubinko | energy | free-the-key | Gem-11 | hex-frvr | Infinity-Loop | Maze:Path-of-Light | shapes | snake-solver | wood-blocks-3d | yarn-untangle | laser-maze-puzzle | tiles-master |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI CUA | 31.04 | 0.00 | 32.80 | 0.00 | 46.27 | 92.25 | 23.08 | 35.00 | 52.18 | 42.86 | 2.02 | 44.56 | 80.00 | 78.27 |
| Claude 3.7 | 43.05 | 0.00 | 41.60 | 0.00 | 0.00 | 30.76 | 2.31 | 82.00 | 6.26 | 42.86 | 0.00 | 13.77 | 28.00 | 52.18 |
| UI-TARS-1.5 | 100.00 | 0.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Minecraft
| Task Type | Task Name | VPT | DreamerV3 | Previous SOTA | UI-TARS-1.5 w/o Thought | UI-TARS-1.5 w/ Thought |
|---|---|---|---|---|---|---|
| Mine Blocks | (oak_log) | 0.8 | 1.0 | 1.0 | 1.0 | 1.0 |
| (obsidian) | 0.0 | 0.0 | 0.0 | 0.2 | 0.3 | |
| (white_bed) | 0.0 | 0.0 | 0.1 | 0.4 | 0.6 | |
| 200 Tasks Avg. | 0.06 | 0.03 | 0.32 | 0.35 | 0.42 | |
| Kill Mobs | (mooshroom) | 0.0 | 0.0 | 0.1 | 0.3 | 0.4 |
| (zombie) | 0.4 | 0.1 | 0.6 | 0.7 | 0.9 | |
| (chicken) | 0.1 | 0.0 | 0.4 | 0.5 | 0.6 | |
| 100 Tasks Avg. | 0.04 | 0.03 | 0.18 | 0.25 | 0.31 |
Model Scale Comparison
This table compares performance across different model scales of UI-TARS on the OSworld benchmark.
| Benchmark Type | Benchmark | UI-TARS-72B-DPO | UI-TARS-1.5-7B | UI-TARS-1.5 |
|---|---|---|---|---|
| Computer Use | OSWorld | 24.6 | 27.5 | 42.5 |
| GUI Grounding | ScreenSpotPro | 38.1 | 49.6 | 61.6 |
The released UI-TARS-1.5-7B focuses primarily on enhancing general computer use capabilities and is not specifically optimized for game-based scenarios, where the UI-TARS-1.5 still holds a significant advantage.
What's next
We are providing early research access to our top-performing UI-TARS-1.5 model to facilitate collaborative research. Interested researchers can contact us at TARS@bytedance.com.
Citation
If you find our paper and model useful in your research, feel free to give us a cite.
BibTeX
@article{qin2025ui,title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},journal={arXiv preprint arXiv:2501.12326},year={2025}}
Model provider
yugen0520
Model tree
Base
this model
Modalities
Input
Text, Image
Output
Text
Pricing
Dedicated Endpoints
View detailsSupported Functionality
Model APIs
Dedicated Endpoints
Container
More information