omniparser-autogui-mcp
by NON906
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI. It is confirmed to work on Windows.
What is omniparser-autogui-mcp?
This is an MCP (Model Context Protocol) server that leverages OmniParser to analyze the screen and automate GUI operations.
How to use omniparser-autogui-mcp?
1. Clone the repository recursively and navigate to the directory.
2. Use uv sync to install dependencies.
3. Set the OCR_LANG environment variable.
4. Run download_models.py to download necessary models.
5. Add the server configuration to your claude_desktop_config.json file, adjusting the path to the cloned repository.
6. Configure environment variables as needed for specific use cases.
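The configuration step can be sketched in Python as a small helper that prints a fragment for claude_desktop_config.json. This is a minimal sketch, not the project's confirmed configuration: the server key name, the use of uv as the launch command, and the argument list are assumptions, and the path is a placeholder to replace with your own clone location. Only the mcpServers layout (command, args, env) and the OCR_LANG variable come from the client format and the steps above.

```python
import json


def build_server_entry(repo_dir: str, ocr_lang: str = "en") -> dict:
    """Build a claude_desktop_config.json fragment for this server.

    Assumptions (not confirmed by the text above): the key name
    "omniparser_autogui_mcp" and launching the server through
    `uv --directory <repo_dir> run omniparser-autogui-mcp`.
    """
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {  # hypothetical key name
                "command": "uv",  # assumed launcher
                "args": ["--directory", repo_dir, "run", "omniparser-autogui-mcp"],
                "env": {"OCR_LANG": ocr_lang},  # variable named in step 3
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path: replace with the directory you cloned into.
    entry = build_server_entry(r"C:\path\to\omniparser-autogui-mcp")
    print(json.dumps(entry, indent=2))
```

Merge the printed fragment into any existing mcpServers section rather than overwriting the file, so other configured servers are kept.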
Key features of omniparser-autogui-mcp
Screen analysis using OmniParser
Automated GUI operation
MCP server implementation
Configurable target window
Support for remote OmniParser server
SSE communication option
Use cases of omniparser-autogui-mcp
Automating browser tasks (e.g., searching)
Interacting with desktop applications
Creating automated workflows based on screen content
Integrating with other MCP-compatible clients
FAQ from omniparser-autogui-mcp
What is OmniParser?
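OmniParser is a screen-parsing tool from Microsoft that takes a screenshot and identifies the interactable UI elements in it, such as icons, buttons, and text. This server uses that analysis to operate the GUI.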
What is MCP?
MCP stands for Model Context Protocol. It is an open protocol that lets LLM applications, such as Claude Desktop, connect to external tools and data sources through servers like this one.
How do I specify a target window?
Set the TARGET_WINDOW_NAME environment variable to the name of the window you want to operate on.
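As a rough illustration, the variable can be added to the env block of the server's configuration entry. The helper below is a hypothetical sketch: only the TARGET_WINDOW_NAME variable name comes from the answer above, and the window title is an example value. Whether the server matches the name exactly or partially is not stated here, so check the project's documentation.

```python
def with_target_window(env: dict, window_name: str) -> dict:
    """Return a copy of an env mapping with TARGET_WINDOW_NAME set.

    `env` stands for the "env" block of the server's configuration
    entry; this helper is illustrative and not part of the project.
    """
    if not window_name.strip():
        raise ValueError("window name must not be empty")
    updated = dict(env)  # copy so the caller's mapping is left untouched
    updated["TARGET_WINDOW_NAME"] = window_name
    return updated


if __name__ == "__main__":
    # "Untitled - Notepad" is an example title, not a required value.
    print(with_target_window({"OCR_LANG": "en"}, "Untitled - Notepad"))
```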
How do I use a remote OmniParser server?
Set the OMNI_PARSER_SERVER environment variable to the address and port of the remote server.
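A small sketch of preparing that value is shown below. It assumes the variable takes a plain host:port string with no URL scheme, which the answer above implies but does not spell out. The function name and the example host and port are hypothetical.

```python
def omni_parser_server_value(host: str, port: int) -> str:
    """Format a value for the OMNI_PARSER_SERVER environment variable.

    Assumes a "host:port" format without a URL scheme; verify this
    against the project's documentation before relying on it.
    """
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be a non-empty string without whitespace")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"{host}:{port}"


if __name__ == "__main__":
    # Example values only: a remote OmniParser server on the local machine.
    env = {"OMNI_PARSER_SERVER": omni_parser_server_value("127.0.0.1", 8000)}
    print(env)
```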
What if it doesn't work with other clients like LibreChat?
Set the OMNI_PARSER_BACKEND_LOAD environment variable to 1.