UI-TARS Desktop
by bytedance
UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language. It operates the browser by visually interpreting web pages, and it integrates with the command line and the file system.
What is UI-TARS Desktop?
UI-TARS Desktop is a desktop application that enables users to control their computer using natural language, powered by a Vision-Language Model. It provides a GUI agent for automating tasks through visual recognition and interaction.
How to use UI-TARS Desktop?
To use UI-TARS Desktop, refer to the Quick Start guide for initial setup and usage instructions. You can control your computer by providing natural language commands, which the application interprets to perform actions on your desktop.
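The flow described above (take a natural language command, look at the screen, act on it) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the UI-TARS Desktop implementation: the names Action, run_agent, take_screenshot, model, and execute are all hypothetical, and the model and screenshot are replaced by stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action record; the real application's format may differ."""
    kind: str          # "click", "type", or "done"
    x: int = 0         # screen coordinates, used by "click"
    y: int = 0
    text: str = ""     # text to enter, used by "type"


def run_agent(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: capture the screen, ask the model for one action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = model(instruction, screenshot, history)
        history.append(action)
        if action.kind == "done":
            break  # the model reports that the task is finished
        execute(action)
    return history


# Stubs standing in for a real screen capture and a real Vision-Language Model.
def fake_screenshot() -> bytes:
    return b""


def scripted_model(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    script = [
        Action("click", x=120, y=48),
        Action("type", text=instruction),
        Action("done"),
    ]
    return script[len(history)]


executed: List[Action] = []
history = run_agent("open settings", fake_screenshot, scripted_model, executed.append)
print([a.kind for a in history])  # ['click', 'type', 'done']
```

The max_steps cap is a design choice in this sketch: it keeps the loop from running indefinitely if the model never returns a "done" action.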
Key features of UI-TARS Desktop
Natural language control powered by Vision-Language Model
Screenshot and visual recognition support
Precise mouse and keyboard control
Cross-platform support (Windows/macOS)
Real-time feedback and status display
Private and secure - fully local processing
Use cases of UI-TARS Desktop
Automating web browsing tasks
Sending messages via social media
Managing files and folders
Interacting with desktop applications
Performing repetitive tasks with natural language commands
FAQ from UI-TARS Desktop
What platforms does UI-TARS Desktop support?
UI-TARS Desktop supports Windows and macOS.
Is UI-TARS Desktop secure?
Yes, UI-TARS Desktop is designed to be private and secure, with fully local processing.
Where can I find the Quick Start guide?
The Quick Start guide can be found in the Quick Start document.
Where can I find the SDK?
The SDK can be found in the @ui-tars/sdk package.
How can I contribute to UI-TARS Desktop?
See CONTRIBUTING.md.