Venice AI Image Generator MCP Server
by jhacksman
This project implements a Model Context Protocol (MCP) server that integrates with Venice AI for image generation with an approval/regeneration workflow.
What is MCP?
The Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). It acts as a "USB-C port for AI applications," allowing LLMs to connect to various data sources and tools in a standardized way.
For more information, visit the official MCP introduction page.
Project Overview
This MCP server provides a bridge between LLMs (like Claude) and Venice AI's image generation capabilities. It enables LLMs to generate images based on text prompts and implements an interactive approval workflow with thumbs up/down feedback.
Key Features
Image Generation with Approval Workflow
The core functionality of this server is to:
- Generate images using Venice AI based on text prompts
- Display the generated image to the user with clickable thumbs up/down icons overlaid directly on the image
- Allow users to approve the image (thumbs up) or request regeneration (thumbs down)
- Regenerate images with the same parameters if requested
Technical Implementation
The server implements several MCP tools:
- generate_venice_image: Creates an image from a text prompt and returns it with approval options
- approve_image: Marks an image as approved when the user gives a thumbs up
- regenerate_image: Creates a new image with the same parameters when the user gives a thumbs down
- list_available_models: Provides information about available Venice AI models
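As a sketch of how these tools might fit together, the following models three of them as plain Python functions over an in-memory cache. The function bodies, placeholder URL, and cache shape are illustrative assumptions; a real server would register these as FastMCP tools and call the Venice AI API instead of fabricating a URL:

```python
import uuid

# In-memory cache: image_id -> {prompt, params, url, approved}
# (a simplified stand-in for the server's Image Cache component)
IMAGE_CACHE: dict[str, dict] = {}

def generate_venice_image(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Create an image record; a real server would call the Venice AI API here."""
    image_id = str(uuid.uuid4())
    url = f"https://images.example/{image_id}.png"  # placeholder for the Venice-returned URL
    IMAGE_CACHE[image_id] = {
        "prompt": prompt,
        "params": {"width": width, "height": height},
        "url": url,
        "approved": False,
    }
    return {"image_id": image_id, "url": url}

def approve_image(image_id: str) -> dict:
    """Mark a cached image as approved (thumbs up)."""
    IMAGE_CACHE[image_id]["approved"] = True
    return {"image_id": image_id, "approved": True}

def regenerate_image(image_id: str) -> dict:
    """Generate a new image with the same prompt and parameters (thumbs down)."""
    old = IMAGE_CACHE[image_id]
    return generate_venice_image(old["prompt"], **old["params"])
```

Note that regeneration deliberately reuses the cached prompt and parameters, which is what keeps a thumbs-down retry consistent with the original request.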
User Experience
From the user's perspective, the interaction flow is:
- User provides a text prompt to generate an image
- LLM calls the MCP server to generate the image
- LLM displays the image with clickable thumbs up/down icons overlaid directly on the image
- User clicks the thumbs up icon to approve the image or the thumbs down icon to regenerate it
- If thumbs down, the process repeats until the user approves an image
Architecture
The server follows the MCP client-server architecture:
┌───────────────┐      ┌──────────────┐      ┌──────────────┐
│   LLM Host    │◄─────┤  MCP Server  │◄─────┤  Venice AI   │
│ (e.g. Claude) │      │              │      │     API      │
└───────────────┘      └──────────────┘      └──────────────┘
- LLM Host: The application running the LLM (e.g., Claude)
- MCP Server: Our server, which implements MCP and connects to Venice AI
- Venice AI API: The external service that generates images
Implementation Details
MCP Server Components
The server consists of:
- FastMCP Server: The core server that handles MCP protocol communication
- Venice AI Integration: Code that interfaces with the Venice AI API
- Image Cache: In-memory storage for tracking generated images and their approval status
- Tool Definitions: Functions that LLMs can call to interact with the server
Data Flow
1. LLM receives a prompt from the user
2. LLM calls the generate_venice_image tool with the prompt
3. Server sends a request to the Venice AI API
4. Venice AI generates the image and returns a URL
5. Server caches the image details and returns the URL with approval options
6. LLM displays the image and approval options to the user
7. User selects thumbs up or thumbs down
8. LLM calls either approve_image or regenerate_image based on the user's selection
9. If regenerating, the process repeats from step 3
Example Usage
When connected to an LLM like Claude, the interaction would look like:
User: Generate an image of a futuristic city skyline
Claude: I'll generate that image for you using Venice AI.
[Image of futuristic city skyline with clickable 👍 and 👎 icons overlaid on the image]
User: 👎 (Thumbs down)
Claude: Let me generate a new version for you.
[New image of futuristic city skyline with clickable 👍 and 👎 icons overlaid on the image]
User: 👍 (Thumbs up)
Claude: Great! I've saved this approved image for you.
Gemini Integration for Multi-View Generation
After a user approves an image (by clicking the thumbs up icon), the system automatically processes the approved image through Google's Gemini API to generate multiple consistent views of the 3D object:
- The approved Venice AI image is used as input to the Gemini view generation scripts
- Four different views are generated sequentially:
- Front view (0°) - Generated first
- Right view (90°) - Generated after front view completes
- Left view (270°) - Generated after right view completes
- Back view (180°) - Generated after left view completes
- Each view is displayed in a 4-up layout as it becomes available
- Each script waits for the previous script to complete successfully before executing
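This sequential chain can be sketched as follows. The per-view script names and their command-line interface (`script.py <input> <output>`) are hypothetical; `subprocess.run` is injectable here only so the logic can be exercised without the real scripts:

```python
import subprocess

# Hypothetical script names for the four views, in generation order.
VIEW_SCRIPTS = [
    ("front", "generate_front_view.py"),  # 0°, generated first
    ("right", "generate_right_view.py"),  # 90°
    ("left", "generate_left_view.py"),    # 270°
    ("back", "generate_back_view.py"),    # 180°, generated last
]

def generate_views(approved_image_path: str, run=subprocess.run) -> list[str]:
    """Run each view script in order.

    Each run() call blocks until the script finishes, so the right view never
    starts before the front view completes; check=True raises on failure,
    which stops the chain rather than generating later views from a bad state.
    """
    outputs = []
    for name, script in VIEW_SCRIPTS:
        out_path = f"{name}_view.png"
        run(["python", script, approved_image_path, out_path], check=True)
        outputs.append(out_path)
    return outputs
```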
4-Up View Approval Process
Each of the four generated views has its own thumbs up/down approval system:
- Each view in the 4-up display has thumbs up/down icons overlaid on the image
- If a user selects thumbs down for any specific view:
- The corresponding Python script for that view is run again
- The newly generated image replaces the rejected image in the 4-up display
- This process repeats until the user approves the image with thumbs up
- Each view can be individually approved or regenerated
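The per-view approval loop can be sketched as below, with the UI feedback and the script re-run abstracted as callbacks. Both callbacks and the path handling are illustrative assumptions, not the actual implementation:

```python
def approve_views(views, get_feedback, regenerate):
    """Loop until every view in the 4-up display is approved.

    views: mapping of view name -> current image path.
    get_feedback(name, path): returns True for thumbs up, False for thumbs down.
    regenerate(name): re-runs that view's script and returns the new image path.
    """
    approved = {}
    for name, path in views.items():
        # A thumbs down re-runs only this view's script and swaps the new
        # image into the 4-up display; the other three views are untouched.
        while not get_feedback(name, path):
            path = regenerate(name)
        approved[name] = path
    return approved
```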
3D Model Generation
Once all four views are approved:
- The original Venice AI image and the four approved Gemini-generated views are processed using CUDA Multi-View Stereo
- This processing occurs on a dedicated Linux server on the network
- The CUDA Multi-View Stereo system converts the 2D images into a 3D model
This multi-view generation leverages Gemini's object consistency capabilities to create coherent representations of the 3D object from different angles while maintaining the same style, colors, and proportions as the original Venice AI image.
Future Enhancements
Potential future improvements include:
- Persistent Storage: Save approved images to a database
- Image Editing: Allow users to request specific modifications to generated images
- Multiple Image Generation: Generate several variations at once for the user to choose from
- Additional Views: Generate more angles beyond the four cardinal directions
Venice AI Integration
The server integrates with Venice AI's image generation API, which provides high-quality image generation capabilities. The API allows for:
- Generating images from text prompts
- Customizing image dimensions
- Adjusting generation parameters
- Using different models for different styles
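A request to such an API might be assembled as follows. The endpoint path, payload field names, and default model id below are assumptions for illustration; consult the Venice AI API documentation for the exact schema:

```python
import json
import urllib.request

# Assumed endpoint; verify against the Venice AI API documentation.
VENICE_API_URL = "https://api.venice.ai/api/v1/image/generate"

def build_generation_request(api_key: str, prompt: str, *,
                             model: str = "example-model",  # placeholder model id
                             width: int = 1024, height: int = 1024):
    """Assemble an authenticated POST request for a text-to-image call."""
    payload = json.dumps({
        "prompt": prompt,
        "model": model,
        "width": width,
        "height": height,
    }).encode("utf-8")
    return urllib.request.Request(
        VENICE_API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The request would then be sent with urllib.request.urlopen(request)
# and the image URL read from the JSON response.
```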
Getting Started
To implement this server, you would need to:
- Install the FastMCP library
- Set up Venice AI API credentials
- Implement the MCP tools as described
- Run the server and connect it to an LLM host
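Once the server is implemented (say, as server.py), an LLM host such as Claude Desktop can be connected to it with an entry in its claude_desktop_config.json. The server name, file path, and environment variable below are placeholders:

```json
{
  "mcpServers": {
    "venice-image-generator": {
      "command": "python",
      "args": ["/path/to/server.py"],
      "env": {
        "VENICE_API_KEY": "your-venice-api-key"
      }
    }
  }
}
```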
MCP Resources
For more information about the Model Context Protocol and how to build MCP servers, check out these resources:
- MCP Introduction - Official introduction to the Model Context Protocol
- MCP SDKs - Official SDKs for Python, TypeScript, Java, and Kotlin
- MCP GitHub Repository - Official MCP implementation and examples
- Building MCP with LLMs - Tutorial on using LLMs to build MCP servers
- Example Servers - Gallery of official MCP server implementations
- MCP Inspector - Interactive debugging tool for MCP servers