Venice AI Image Generator MCP Server

by jhacksman

This project implements a Model Context Protocol (MCP) server that integrates with Venice AI for image generation with an approval/regeneration workflow.

What is MCP?

The Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). It acts as a "USB-C port for AI applications," allowing LLMs to connect to various data sources and tools in a standardized way.

For more information, visit the official MCP introduction page.

Project Overview

This MCP server provides a bridge between LLMs (like Claude) and Venice AI's image generation capabilities. It enables LLMs to generate images based on text prompts and implements an interactive approval workflow with thumbs up/down feedback.

Key Features

Image Generation with Approval Workflow

The core functionality of this server is to:

  1. Generate images using Venice AI based on text prompts
  2. Display the generated image to the user with clickable thumbs up/down icons overlaid directly on the image
  3. Allow users to approve the image (clicking thumbs up) or request a regeneration (clicking thumbs down)
  4. Regenerate images with the same parameters if requested

Technical Implementation

The server implements several MCP tools:

  • generate_venice_image: Creates an image from a text prompt and returns it with approval options
  • approve_image: Marks an image as approved when the user gives a thumbs up
  • regenerate_image: Creates a new image with the same parameters when the user gives a thumbs down
  • list_available_models: Provides information about available Venice AI models
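
A sketch of how these tools might be declared with the MCP Python SDK's FastMCP class (list_available_models is omitted for brevity; the venice_generate helper and IMAGE_CACHE store are hypothetical names, sketched in the sections that follow):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("venice-image-generator")

@mcp.tool()
def generate_venice_image(prompt: str, width: int = 1024, height: int = 1024) -> str:
    """Generate an image from a text prompt and return it with approval options."""
    # venice_generate and IMAGE_CACHE are sketched in later sections.
    image_id, url = venice_generate(prompt, width, height)
    IMAGE_CACHE[image_id] = {"prompt": prompt, "width": width,
                             "height": height, "url": url, "approved": False}
    return f"Image {image_id}: {url} (👍 to approve, 👎 to regenerate)"

@mcp.tool()
def approve_image(image_id: str) -> str:
    """Mark a cached image as approved after a thumbs up."""
    IMAGE_CACHE[image_id]["approved"] = True
    return f"Image {image_id} approved."

@mcp.tool()
def regenerate_image(image_id: str) -> str:
    """Create a new image with the same parameters after a thumbs down."""
    entry = IMAGE_CACHE[image_id]
    return generate_venice_image(entry["prompt"], entry["width"], entry["height"])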

User Experience

From the user's perspective, the interaction flow is:

  1. User provides a text prompt to generate an image
  2. LLM calls the MCP server to generate the image
  3. LLM displays the image with clickable thumbs up/down icons overlaid directly on the image
  4. User clicks the thumbs up icon on the image to approve, or the thumbs down icon to regenerate
  5. If thumbs down, the process repeats until the user approves an image

Architecture

The server follows the MCP client-server architecture:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│             │     │             │     │             │
│  LLM Host   │◄───►│  MCP Server │◄───►│  Venice AI  │
│(e.g. Claude)│     │             │     │     API     │
│             │     │             │     │             │
└─────────────┘     └─────────────┘     └─────────────┘

  1. LLM Host: The application running the LLM (e.g., Claude)
  2. MCP Server: Our server that implements the MCP protocol and connects to Venice AI
  3. Venice AI API: The external service that generates images

Implementation Details

MCP Server Components

The server consists of:

  1. FastMCP Server: The core server that handles MCP protocol communication
  2. Venice AI Integration: Code that interfaces with the Venice AI API
  3. Image Cache: In-memory storage for tracking generated images and their approval status
  4. Tool Definitions: Functions that LLMs can call to interact with the server
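
The image cache, for example, can be as simple as a process-local dict keyed by a generated ID; the field names here are illustrative rather than prescribed by the project:

import uuid

# image_id -> generation parameters and approval state.
# Process-local: entries are lost when the server restarts.
IMAGE_CACHE: dict[str, dict] = {}

def cache_image(prompt: str, width: int, height: int, url: str) -> str:
    image_id = uuid.uuid4().hex[:8]
    IMAGE_CACHE[image_id] = {"prompt": prompt, "width": width,
                             "height": height, "url": url, "approved": False}
    return image_id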

Data Flow

  1. LLM receives a prompt from the user
  2. LLM calls the generate_venice_image tool with the prompt
  3. Server sends a request to the Venice AI API (sketched after this list)
  4. Venice AI generates the image and returns a URL
  5. Server caches the image details and returns the URL with approval options
  6. LLM displays the image and approval options to the user
  7. User selects thumbs up or thumbs down
  8. LLM calls either approve_image or regenerate_image based on user selection
  9. If regenerating, the process repeats from step 3
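
Step 3, the Venice AI request, might look like the following; the endpoint path and the response field names are assumptions to verify against the Venice AI API reference:

import os
import requests

VENICE_API_URL = "https://api.venice.ai/api/v1/image/generate"  # assumed endpoint

def venice_generate(prompt: str, width: int = 1024, height: int = 1024) -> tuple[str, str]:
    """Request an image from Venice AI and return (image_id, url)."""
    response = requests.post(
        VENICE_API_URL,
        headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"},
        json={"prompt": prompt, "width": width, "height": height},
        timeout=60,
    )
    response.raise_for_status()
    data = response.json()
    return data["id"], data["url"]  # assumed response fields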

Example Usage

When connected to an LLM like Claude, the interaction would look like:

User: Generate an image of a futuristic city skyline
Claude: I'll generate that image for you using Venice AI.

[Image of futuristic city skyline with clickable 👍 and 👎 icons overlaid on the image]

User: 👎 (Thumbs down)
Claude: Let me generate a new version for you.

[New image of futuristic city skyline with clickable 👍 and 👎 icons overlaid on the image]

User: 👍 (Thumbs up)
Claude: Great! I've saved this approved image for you.

Gemini Integration for Multi-View Generation

After a user approves an image (by clicking the thumbs up icon), the system automatically processes the approved image through Google's Gemini API to generate multiple consistent views of the 3D object:

  1. The approved Venice AI image is used as input to the Gemini view generation scripts
  2. Four different views are generated sequentially:
    • Front view (0°) - Generated first
    • Right view (90°) - Generated after front view completes
    • Left view (270°) - Generated after right view completes
    • Back view (180°) - Generated after left view completes
  3. Each view is displayed in a 4-up layout as it becomes available
  4. Each script waits for the previous script to complete successfully before executing
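
One way to express that chaining is plain subprocess calls, where check=True both waits for each script and aborts the chain if a step fails; the script and output file names here are hypothetical:

import subprocess

# One script per view, in the generation order described above.
VIEW_SCRIPTS = [
    ("front", "generate_front_view.py"),  # 0°
    ("right", "generate_right_view.py"),  # 90°
    ("left", "generate_left_view.py"),    # 270°
    ("back", "generate_back_view.py"),    # 180°
]

def generate_views(approved_image_path: str) -> dict[str, str]:
    """Run the four view scripts sequentially and return view name -> output path."""
    views = {}
    for name, script in VIEW_SCRIPTS:
        subprocess.run(["python", script, approved_image_path], check=True)
        views[name] = f"{name}_view.png"  # assumed output naming
    return views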

4-Up View Approval Process

Each of the four generated views has its own thumbs up/down approval system:

  1. Each view in the 4-up display has thumbs up/down icons overlaid on the image
  2. If a user selects thumbs down for any specific view:
    • The corresponding Python script for that view is run again (see the sketch after this list)
    • The newly generated image replaces the rejected image in the 4-up display
    • This process repeats until the user approves the image with thumbs up
  3. Each view can be individually approved or regenerated
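
A thumbs-down on one panel thus maps back to a single script, leaving the other three panels untouched. A sketch, reusing the hypothetical VIEW_SCRIPTS table above:

import subprocess

def regenerate_view(view_name: str, approved_image_path: str) -> str:
    """Re-run only the rejected view's script and return its new output path."""
    script = dict(VIEW_SCRIPTS)[view_name]
    subprocess.run(["python", script, approved_image_path], check=True)
    return f"{view_name}_view.png"  # assumed output naming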

3D Model Generation

Once all four views are approved:

  1. The original Venice AI image and the four approved Gemini-generated views are processed using CUDA Multi-View Stereo
  2. This processing occurs on a dedicated Linux server on the network
  3. The CUDA Multi-View Stereo system converts the 2D images into a 3D model
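
Handing the five images to that server could be as simple as scp plus a remote command over ssh; the host name, paths, and job command below are placeholders, not part of the project:

import subprocess

def submit_for_reconstruction(image_paths: list[str], host: str = "mvs-server") -> None:
    """Copy the approved images to the CUDA Multi-View Stereo host and start a job."""
    subprocess.run(["scp", *image_paths, f"{host}:/data/mvs/input/"], check=True)
    subprocess.run(["ssh", host, "run_mvs_job", "/data/mvs/input", "/data/mvs/output"], check=True)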

This multi-view generation leverages Gemini's object consistency capabilities to create coherent representations of the 3D object from different angles while maintaining the same style, colors, and proportions as the original Venice AI image.

Future Enhancements

Potential future improvements include:

  1. Persistent Storage: Save approved images to a database
  2. Image Editing: Allow users to request specific modifications to generated images
  3. Multiple Image Generation: Generate several variations at once for the user to choose from
  4. Additional Views: Generate more angles beyond the four cardinal directions

Venice AI Integration

The server integrates with Venice AI's image generation API, which produces high-quality images. The API allows for:

  • Generating images from text prompts
  • Customizing image dimensions
  • Adjusting generation parameters
  • Using different models for different styles
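
In the request sketched earlier, those knobs would surface as extra payload fields; the parameter and model names below are assumptions to check against the Venice AI API reference:

payload = {
    "model": "fluently-xl",  # assumed model id; see list_available_models
    "prompt": "a futuristic city skyline at dusk",
    "width": 1024,
    "height": 768,
    "steps": 30,             # assumed sampling-steps parameter
    "cfg_scale": 7.5,        # assumed guidance parameter
}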

Getting Started

To implement this server, you would need to:

  1. Install the FastMCP library
  2. Set up Venice AI API credentials
  3. Implement the MCP tools as described
  4. Run the server and connect it to an LLM host
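
Putting it together, the entry point can be as small as this, assuming the tool definitions sketched earlier live in the same module and the API key comes from the VENICE_API_KEY environment variable:

import os

if __name__ == "__main__":
    # Fail fast if credentials are missing, then serve over stdio,
    # FastMCP's default transport.
    if "VENICE_API_KEY" not in os.environ:
        raise SystemExit("Set VENICE_API_KEY before starting the server")
    mcp.run()  # the FastMCP instance from the tool sketch above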

MCP Resources

For more information about the Model Context Protocol and how to build MCP servers, check out these resources:

  • Model Context Protocol introduction: https://modelcontextprotocol.io/introduction
  • MCP specification: https://spec.modelcontextprotocol.io
  • MCP Python SDK (includes FastMCP): https://github.com/modelcontextprotocol/python-sdk