MCP Image Recognition Server

by mario-andreschak

This is an MCP server that provides image recognition by calling the Anthropic and OpenAI vision APIs. It supports multiple image formats and lets you configure a primary and a fallback provider.

What is MCP Image Recognition Server?

The MCP Image Recognition Server is a tool that allows you to analyze images and generate detailed descriptions using either Anthropic's Claude Vision or OpenAI's GPT-4 Vision. It can be integrated into MCP systems to provide image understanding capabilities.

How to use MCP Image Recognition Server?

First, clone the repository and configure the .env file with your API keys and desired settings. Then build the project and start the server with either python -m image_recognition_server.server or run.bat server. Once the server is running, call describe_image for Base64-encoded images or describe_image_from_file for images stored on disk.
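Since describe_image expects Base64-encoded input, a client has to encode the image before calling the tool. The following is a minimal sketch of that step using only the Python standard library. The helper name encode_image and the argument keys image and mime_type are assumptions made for illustration; check the server's tool schema for the actual parameter names.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image(path: str) -> tuple[str, str]:
    """Return (base64_text, mime_type) for the image file at `path`.

    Hypothetical helper, not part of the server itself.
    """
    data = Path(path).read_bytes()
    # Guess the MIME type from the file extension (e.g. .png -> image/png).
    mime_type, _ = mimetypes.guess_type(path)
    encoded = base64.b64encode(data).decode("ascii")
    return encoded, mime_type or "application/octet-stream"


def build_describe_image_arguments(path: str) -> dict[str, str]:
    """Build an argument dict for describe_image.

    The key names below are assumptions; adjust them to the real schema.
    """
    encoded, mime_type = encode_image(path)
    return {"image": encoded, "mime_type": mime_type}
```

For images that already live on the machine running the server, describe_image_from_file avoids the encoding step entirely.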

Key features of MCP Image Recognition Server

Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
Support for multiple image formats (JPEG, PNG, GIF, WebP)
Configurable primary and fallback providers
Base64 and file-based image input support
Optional text extraction using Tesseract OCR
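The primary and fallback provider feature can be pictured as a simple try-then-retry pattern. The sketch below is an illustration of that idea only, not the server's actual implementation; the function describe_with_fallback and the Describer type are hypothetical names, and the real server may handle errors differently.

```python
from typing import Callable, Optional

# A describer takes raw image bytes and returns a text description.
# Hypothetical type for illustration; real providers would wrap API calls.
Describer = Callable[[bytes], str]


def describe_with_fallback(
    image: bytes,
    primary: Describer,
    fallback: Optional[Describer] = None,
) -> str:
    """Try the primary provider; on any error, try the fallback if one is set."""
    try:
        return primary(image)
    except Exception:
        if fallback is None:
            # No fallback configured: surface the primary provider's error.
            raise
        return fallback(image)
```

Keeping the providers behind one callable signature means the fallback logic does not need to know which vendor sits behind each one.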

Use cases of MCP Image Recognition Server

Automated image tagging and categorization
Content moderation and safety analysis
Image-based search and retrieval
Accessibility improvements through image descriptions

FAQ for MCP Image Recognition Server

What API keys do I need?

You need an Anthropic API key, an OpenAI API key, or both, depending on which providers you configure. If you want to route requests through OpenRouter, supply an OpenRouter API key in place of the OpenAI key.

How do I enable OCR?

Set ENABLE_OCR to true in your .env file and ensure that Tesseract OCR is installed on your system.
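Both conditions can be checked before starting the server. The sketch below is a hypothetical preflight check: the function name ocr_ready is invented for illustration, and it assumes the Tesseract executable is named tesseract and is discoverable on PATH, which may not match how the server itself locates it.

```python
import os
import shutil
from typing import Mapping, Optional


def ocr_ready(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if ENABLE_OCR is "true" and a tesseract binary is on PATH.

    Hypothetical helper; the server may parse the flag or find Tesseract
    differently.
    """
    env = os.environ if env is None else env
    enabled = env.get("ENABLE_OCR", "false").strip().lower() == "true"
    # shutil.which returns None when the executable cannot be found.
    return enabled and shutil.which("tesseract") is not None
```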

What is OpenRouter?

OpenRouter allows you to access various models using the OpenAI API format. You can use it by setting OPENAI_API_KEY and OPENAI_BASE_URL to the OpenRouter values, and VISION_PROVIDER to openai.
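As a rough illustration, the three settings can be generated as .env lines. The helper names below are hypothetical, the key value is a placeholder, and the base URL shown is OpenRouter's OpenAI-compatible endpoint as an assumption about what "the OpenRouter values" means here; confirm it against OpenRouter's documentation.

```python
def openrouter_settings(api_key: str) -> dict[str, str]:
    """Return the environment settings for using OpenRouter via the openai provider."""
    return {
        "VISION_PROVIDER": "openai",
        "OPENAI_API_KEY": api_key,
        # Assumed OpenRouter endpoint for OpenAI-format requests.
        "OPENAI_BASE_URL": "https://openrouter.ai/api/v1",
    }


def to_dotenv(settings: dict[str, str]) -> str:
    """Render settings as KEY=value lines suitable for a .env file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Placeholder key; replace with your own OpenRouter API key.
    print(to_dotenv(openrouter_settings("your-openrouter-api-key")))
```

You will likely also need to set the model name to one that OpenRouter serves, since the OpenAI default may not resolve there.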

What are the default models?

The default models are claude-3.5-sonnet-beta for Anthropic and gpt-4o-mini for OpenAI.

How do I run the tests?

Use the run.bat test command to run all tests, or run.bat test <suite> to run a specific test suite (e.g., run.bat test server).