UI-TARS Desktop
by bytedance
UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
<p align="center"> <img alt="UI-TARS" width="260" src="./apps/ui-tars/resources/icon.png"> </p>

> [!IMPORTANT]
> <a href="./apps/agent-tars/README.md"> <img src="./apps/agent-tars/static/hero.png"> </a>
>
> [2025-03-18] We released a technical preview of a new desktop app, Agent TARS, a multimodal AI agent that operates the browser by visually interpreting web pages and integrates with the command line and file system.
<p align="center"> <a href="https://arxiv.org/abs/2501.12326">Paper</a> | <a href="https://huggingface.co/bytedance-research/UI-TARS-7B-DPO">Hugging Face Models</a> | <a href="https://discord.gg/pTXwYVjfcs">Discord</a> | <a href="https://www.modelscope.cn/models/bytedance-research/UI-TARS-7B-DPO">ModelScope</a> <br> Desktop Application | <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a> </p>

Showcases
| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
| Send a twitter with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |
News
- [2025-02-20] - Introduced the UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
- [2025-01-23] - We updated the Cloud Deployment section of the Chinese-language guide (GUI Model Deployment Tutorial) with new information about the ModelScope platform. You can now use ModelScope for deployment.
Features
- Natural language control powered by Vision-Language Model
- Screenshot and visual recognition support
- Precise mouse and keyboard control
- Cross-platform support (Windows/macOS)
- Real-time feedback and status display
- Private and secure - fully local processing
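
The features above amount to a perceive, decide, act loop: capture a screenshot, ask the vision-language model for the next action given the instruction, then carry that action out with mouse or keyboard input. The Python sketch below is only an illustration of that loop under stated assumptions. It is not the application's actual implementation, and every name in it (`Action`, `run_agent`, `fake_screenshot`, `scripted_model`) is hypothetical. The model and the screen are replaced by stubs so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action format; the real model output may differ."""
    kind: str       # "click", "type", or "finished"
    x: int = 0      # screen coordinates, used by "click"
    y: int = 0
    text: str = ""  # text to enter, used by "type"


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: screenshot -> model picks an action -> execute, until finished."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                    # perceive
        action = model(instruction, screenshot, history)  # decide
        history.append(action)
        if action.kind == "finished":
            break
        execute(action)                                   # act
    return history


# --- Stubs standing in for the real screen and the real model (assumptions) ---

def fake_screenshot() -> bytes:
    return b"fake-png-bytes"


def scripted_model(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    # A real vision-language model would inspect the screenshot; this stub
    # simply replays a fixed plan, one step per call.
    script = [
        Action("click", x=100, y=200),
        Action("type", text="hello world"),
        Action("finished"),
    ]
    return script[len(history)]


executed: List[Action] = []  # records what would have been sent to mouse/keyboard
history = run_agent(
    'Send a post with the content "hello world"',
    fake_screenshot,
    scripted_model,
    executed.append,
)
print([a.kind for a in history])  # ['click', 'type', 'finished']
```

Passing the screenshot, model, and executor in as callables keeps the loop itself independent of any particular platform or model endpoint, which is one plausible way to support more than one operating system.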
Quick Start
See Quick Start.
Deployment
See Deployment.
Contributing
See CONTRIBUTING.md.
SDK (Experimental)
See @ui-tars/sdk
License
UI-TARS Desktop is licensed under the Apache License 2.0.
Citation
If you find our paper and code useful in your research, please consider giving us a star :star: and citing our work :pencil:
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}