omniparser-autogui-mcp

by NON906

This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI. It has been confirmed to work on Windows.

What is omniparser-autogui-mcp?

This is an MCP (Model Context Protocol) server that leverages OmniParser to analyze the screen and automate GUI operations.

How do I use omniparser-autogui-mcp?

  1. Clone the repository recursively (so that its submodules are included) and change into the cloned directory.

  2. Run uv sync to install the dependencies.

  3. Set the OCR_LANG environment variable.

  4. Run download_models.py to download the required models.

  5. Add the server configuration to your claude_desktop_config.json file, adjusting the path so that it points to the cloned repository.

  6. Set any further environment variables your use case needs (see the FAQ below).
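Step 5 ends with a server entry in claude_desktop_config.json. The Python sketch below builds such an entry and prints it; it is not code from the repository. It assumes the server is launched with "uv --directory <path> run omniparser-autogui-mcp" (check pyproject.toml and the project README for the actual script name), and the server key "omniparser_autogui_mcp", the clone path and the OCR_LANG value "en" are placeholders chosen for illustration.

```python
import json


def build_server_entry(repo_path: str, env: dict) -> dict:
    """Build one mcpServers entry that launches the server through uv.

    Assumption: the project exposes a script named "omniparser-autogui-mcp";
    check pyproject.toml in the cloned repository.
    """
    return {
        "command": "uv",
        # `uv --directory <path> run <script>` runs the script inside that project.
        "args": ["--directory", repo_path, "run", "omniparser-autogui-mcp"],
        # Environment values in the client config are strings.
        "env": {key: str(value) for key, value in env.items()},
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry stored under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Hypothetical clone location and example OCR language; adjust both.
    entry = build_server_entry(r"C:\path\to\omniparser-autogui-mcp", {"OCR_LANG": "en"})
    config = merge_server({}, "omniparser_autogui_mcp", entry)
    print(json.dumps(config, indent=2))
```

Merging into a copy rather than editing the loaded configuration in place leaves any other servers already listed under mcpServers untouched.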

Key features of omniparser-autogui-mcp

  • Screen analysis using OmniParser

  • Automated GUI operation

  • MCP server implementation

  • Configurable target window

  • Support for remote OmniParser server

  • SSE communication option

Use cases of omniparser-autogui-mcp

  • Automating browser tasks (e.g., searching)

  • Interacting with desktop applications

  • Creating automated workflows based on screen content

  • Integrating with other MCP-compatible clients

FAQ from omniparser-autogui-mcp

What is OmniParser?

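OmniParser is a screen-parsing tool from Microsoft that converts a screenshot of a GUI into structured elements (such as detected icons, buttons and text regions) that a model can reason about and act on.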

What is MCP?

MCP stands for Model Context Protocol. It is an open protocol that lets AI applications (MCP clients such as Claude Desktop) connect to tools and data exposed by MCP servers such as this one.
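MCP messages are JSON-RPC 2.0 objects. As a minimal illustration (not code from this project), the sketch below serializes the two requests a client typically sends to discover and invoke a server's tools, "tools/list" and "tools/call". The tool name "example_tool" and its arguments are hypothetical placeholders; real tool names come from the server's tools/list response, and a real session first performs an "initialize" handshake, which is omitted here.

```python
import json
from typing import Optional


def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps without indent emits a single line, which suits the
    # newline-delimited stdio transport.
    return json.dumps(message)


# Ask the server which tools it offers.
list_tools = jsonrpc_request(1, "tools/list")

# Invoke a tool. "example_tool" and its arguments are hypothetical placeholders.
call_tool = jsonrpc_request(
    2, "tools/call", {"name": "example_tool", "arguments": {"x": 100, "y": 200}}
)

print(list_tools)
print(call_tool)
```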

How do I specify a target window?

Set the TARGET_WINDOW_NAME environment variable to the name of the window you want to operate on.
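If you keep the client configuration as a Python dict (for example after loading claude_desktop_config.json with json.load), setting the target window is a one-key change in the server's env block. In this sketch the helper set_server_env, the server key "omniparser_autogui_mcp" and the window title "Untitled - Notepad" are hypothetical; use the key from your own configuration and the title of the window you want operated.

```python
import copy


def set_server_env(config: dict, server_name: str, key: str, value: str) -> dict:
    """Return a copy of an MCP client config with one env variable set for one server."""
    updated = copy.deepcopy(config)  # leave the caller's dict unchanged
    server = updated["mcpServers"][server_name]
    server.setdefault("env", {})[key] = value
    return updated


config = {
    "mcpServers": {
        "omniparser_autogui_mcp": {"command": "uv", "args": [], "env": {"OCR_LANG": "en"}}
    }
}
# Hypothetical window title used for illustration.
config = set_server_env(
    config, "omniparser_autogui_mcp", "TARGET_WINDOW_NAME", "Untitled - Notepad"
)
```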

How do I use a remote OmniParser server?

Set the OMNI_PARSER_SERVER environment variable to the address and port of the remote server.
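The FAQ does not spell out the exact value format. Assuming OMNI_PARSER_SERVER takes a bare "host:port" string such as "127.0.0.1:8000" with no "http://" prefix (confirm against the project README), a quick check like the hypothetical helper below can catch typos before the value goes into the client configuration.

```python
def parse_server_address(value: str) -> tuple:
    """Split a "host:port" string and check that the port is a valid TCP port."""
    host, sep, port_text = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    port = int(port_text)  # raises ValueError for a non-numeric port
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

For example, parse_server_address("192.168.0.10:8000") returns ("192.168.0.10", 8000), while a value without a port raises ValueError.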

What if it doesn't work with other clients like LibreChat?

Set the OMNI_PARSER_BACKEND_LOAD environment variable to 1.
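Clients such as LibreChat start stdio MCP servers themselves, so the variable normally goes into that client's server configuration. To confirm outside the client that the server starts with the setting, you can assemble the same command and environment by hand. The sketch below assumes the "uv --directory <path> run omniparser-autogui-mcp" launch command (verify the script name in the repository); the helper build_launch and the clone path are hypothetical.

```python
import os


def build_launch(repo_path: str, extra_env: dict) -> tuple:
    """Return (command, environment) for starting the server outside an MCP client.

    Assumption: the project's script is named "omniparser-autogui-mcp".
    """
    command = ["uv", "--directory", repo_path, "run", "omniparser-autogui-mcp"]
    # Inherit the current environment (PATH and so on), then apply the overrides.
    env = {**os.environ, **extra_env}
    return command, env


command, env = build_launch(
    r"C:\path\to\omniparser-autogui-mcp",  # hypothetical clone location
    {"OMNI_PARSER_BACKEND_LOAD": "1"},
)
# To start the server with these settings:
#   import subprocess; subprocess.run(command, env=env, check=True)
```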