MLC Bakery
by jettyio
MLC Bakery is a Python-based service for managing ML model provenance and lineage. It is built with FastAPI and SQLAlchemy.
Features
- Dataset management with collection support
- Entity tracking
- Activity logging
- Agent management
- Provenance relationships tracking
- RESTful API endpoints
Development Setup
Prerequisites
- Python 3.12+
- uv (Python package manager)
- PostgreSQL (running locally or via Docker)
Local Development Setup
- Clone the repository:
  git clone <your-repo-url> mlcbakery
  cd mlcbakery
- Install Dependencies: uv uses pyproject.toml to manage dependencies. It will automatically create a virtual environment if one doesn't exist.
  # Install main, dev, and webclient dependencies in editable mode
  uv pip install -e .[dev,webclient]
- Set up Environment Variables: Create a .env file in the project root by copying the example:
  cp .env.example .env # Ensure .env.example exists and is up-to-date
  Edit .env with your local PostgreSQL connection details. The key variable is DATABASE_URL. Example for a user 'devuser' with password 'devpass' connecting to database 'mlcbakery_dev':
  # .env
  DATABASE_URL=postgresql+asyncpg://devuser:devpass@localhost:5432/mlcbakery_dev
  (Ensure your PostgreSQL server is running, the specified database exists, and the user has permissions on it.)
- Run Database Migrations: Apply the latest database schema using Alembic. uv run executes commands within the project's managed environment.
  uv run alembic upgrade heads
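Connection problems during migrations often come down to a mistyped DATABASE_URL. The following is a minimal sketch, using only the Python standard library, that splits a URL of the form shown in the environment step into its parts so each one can be checked by eye. The function name describe_database_url is hypothetical and is not part of MLC Bakery.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a SQLAlchemy-style database URL into its parts, masking the password."""
    parts = urlsplit(url)
    # The scheme has the form "dialect+driver", e.g. "postgresql+asyncpg".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "user": parts.username,
        "password": "***" if parts.password else None,  # never echo the secret
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Example value taken from the .env example in this README.
example_url = "postgresql+asyncpg://devuser:devpass@localhost:5432/mlcbakery_dev"
print(describe_database_url(example_url))
```

If the printed host, port, or database name is not what you expect, fix the value in .env before running Alembic.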
Running the Server (Locally)
Start the FastAPI application using uvicorn:
# Make sure your .env file is present for the DATABASE_URL
uv run uvicorn mlcbakery.main:app --reload --host 0.0.0.0 --port 8000
The API will be available at http://localhost:8000 (or your machine's IP address).
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Running Tests
The tests are configured to run against a PostgreSQL database defined by the DATABASE_URL environment variable. You can use the same database as your development environment or configure a separate test database in your .env file if preferred (adjust connection string as needed).
# Ensure DATABASE_URL is set in your environment or .env file
uv run pytest
To run specific tests:
uv run pytest tests/test_activities.py -v
Project Structure
mlcbakery/
├── alembic/ # Database migrations (Alembic)
├── .github/ # GitHub Actions workflows
├── mlcbakery/ # Main application package
│ ├── models/ # SQLAlchemy models
│ ├── schemas/ # Pydantic schemas
│ ├── api/ # API routes (FastAPI)
│ └── main.py # FastAPI application entrypoint
├── tests/ # Test suite (pytest)
├── .env.example # Example environment variables
├── alembic.ini # Alembic configuration
├── pyproject.toml # Project metadata and dependencies (uv/Poetry)
└── README.md # This file
Database Schema
Managed by Alembic migrations in the alembic/versions directory. The main tables include:
- collections
- entities (polymorphic base for datasets, models, etc.)
- datasets
- trained_models
- activities
- agents
- activity_relationships (tracks provenance)
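The table list suggests that entities acts as a shared base table, with datasets and trained_models holding subtype-specific rows. One common way to lay this out is joined-table inheritance with a discriminator column. The sketch below illustrates that idea with the standard-library sqlite3 module rather than PostgreSQL and SQLAlchemy, purely so it runs without any setup. Every column name here (name, entity_type, data_path, model_path) is an assumption made for illustration; the real schema is defined by the migrations in alembic/versions and may differ.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        entity_type TEXT NOT NULL
    );
    CREATE TABLE datasets (
        id INTEGER PRIMARY KEY REFERENCES entities(id),
        data_path TEXT
    );
    CREATE TABLE trained_models (
        id INTEGER PRIMARY KEY REFERENCES entities(id),
        model_path TEXT
    );
    """
)


def add_dataset(name: str, data_path: str) -> int:
    """Insert the shared row first, then the subtype row with the same id."""
    cur = conn.execute(
        "INSERT INTO entities (name, entity_type) VALUES (?, 'dataset')", (name,)
    )
    conn.execute(
        "INSERT INTO datasets (id, data_path) VALUES (?, ?)", (cur.lastrowid, data_path)
    )
    return cur.lastrowid


def add_trained_model(name: str, model_path: str) -> int:
    """Same pattern as add_dataset, with a different discriminator value."""
    cur = conn.execute(
        "INSERT INTO entities (name, entity_type) VALUES (?, 'trained_model')", (name,)
    )
    conn.execute(
        "INSERT INTO trained_models (id, model_path) VALUES (?, ?)",
        (cur.lastrowid, model_path),
    )
    return cur.lastrowid


def list_entities() -> list:
    """All entities, regardless of subtype, come from the base table."""
    return conn.execute(
        "SELECT id, name, entity_type FROM entities ORDER BY id"
    ).fetchall()


def list_datasets() -> list:
    """Subtype-specific columns require a join back to the base table."""
    return conn.execute(
        "SELECT e.name, d.data_path FROM entities e JOIN datasets d ON d.id = e.id"
    ).fetchall()


add_dataset("example-dataset", "/data/example")
add_trained_model("example-model", "/models/example")
print(list_entities())
print(list_datasets())
```

The benefit of this layout is that relationship tables only need to reference entities.id to point at any kind of entity.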
Resetting the database (Local Development)
If using a local PostgreSQL instance, you can drop and recreate the database:
# Example commands using the PostgreSQL client utilities (dropdb/createdb)
# Connect as a superuser or the database owner
dropdb mlcbakery_dev
createdb mlcbakery_dev
# Re-run migrations
uv run alembic upgrade heads
Warning: This deletes all data in the development database.
Contributing
- Create a new branch for your feature (git checkout -b feature/my-new-feature)
- Make your changes
- Run tests to ensure everything passes (uv run pytest)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/my-new-feature)
- Submit a pull request
License
MIT
Deployment (Docker Compose)
This project includes a docker-compose.yml file for easier deployment of the API, database, Streamlit viewer, and Caddy reverse proxy.
Prerequisites
- Docker and Docker Compose installed.
- A Docker network named caddy-network, created with:
  docker network create caddy-network
Steps
- Configure Environment Variables: The docker-compose.yml file sets a default DATABASE_URL pointing to the db service within the Docker network. However, you must configure the ADMIN_AUTH_TOKEN for the api service. You can do this by:
  - Creating a .env file: Create a .env file in the project root and add the following line:
    ADMIN_AUTH_TOKEN=your_secure_admin_token_here
    Docker Compose automatically loads .env files.
  - Modifying docker-compose.yml: Directly add ADMIN_AUTH_TOKEN under the environment section of the api service (less secure for secrets).
  - Passing at runtime: Set the variable in the shell when invoking docker-compose up, e.g.:
    ADMIN_AUTH_TOKEN=your_secure_admin_token_here docker-compose up -d
    This takes effect only if docker-compose.yml passes ADMIN_AUTH_TOKEN through from the shell environment to the api service.
- Build and Run Services: Navigate to the project root directory and run:
  docker-compose up --build -d
  This will build the necessary images and start all services (api, mcp_server, streamlit, db, caddy) in the background.
- Database Migrations: Once the db and api containers are running, apply the database migrations by executing the alembic command inside the running api container:
  docker-compose exec api alembic upgrade head
  Note: You might need to wait a few seconds for the database service to fully initialize before running migrations.
- Accessing Services:
  - API: The API will be accessible via the Caddy reverse proxy, typically at http://localhost or http://<your-domain> if configured in Caddyfile. Direct access (bypassing Caddy) is usually on port 8000 if mapped. Swagger UI: http://localhost/docs (or /api/v1/docs depending on Caddy setup).
  - Streamlit Viewer: Accessible via Caddy, e.g., http://streamlit.localhost.
  - MCP Server: Accessible via Caddy, e.g., http://mcp.localhost.
  - Caddy: Handles reverse proxying based on Caddyfile. Modify Caddyfile and restart the caddy service (docker-compose restart caddy) to update domains or proxy configurations.
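ADMIN_AUTH_TOKEN in the steps above is a shared secret, so the placeholder value should be replaced with something hard to guess. Below is a minimal sketch, using Python's standard secrets module, of generating such a value and formatting it as a .env line. The helper names generate_admin_token and env_line are hypothetical, and this README does not prescribe a particular token format, so treat the length chosen here as an assumption.

```python
import secrets


def generate_admin_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


def env_line(token: str) -> str:
    """Format the token as a line suitable for pasting into a .env file."""
    return f"ADMIN_AUTH_TOKEN={token}"


print(env_line(generate_admin_token()))
```

Paste the printed line into .env in the project root and keep that file out of version control.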
Stopping Services
docker-compose down
To remove the volumes (including database data):
docker-compose down -v
Important Notes
- ADMIN_AUTH_TOKEN: This token is required for any mutating API operations (POST, PUT, PATCH, DELETE). Include it in requests as a Bearer token in the Authorization header (e.g., Authorization: Bearer your_secure_admin_token_here).
- DATABASE_URL: Ensure the api and streamlit services can reach the database specified by DATABASE_URL. The default in docker-compose.yml assumes the db service within the same Docker network.
- Caddyfile: Customize the Caddyfile for your specific domains and HTTPS setup. The provided file includes examples for local .localhost domains and placeholders like bakery.jetty.io. Remember to restart Caddy after changes.
- caddy-network: The services rely on the external Docker network caddy-network for inter-service communication and Caddy proxying. Ensure this network exists.
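To make the Bearer token requirement concrete, here is a minimal sketch that builds an authenticated request with Python's standard urllib. The endpoint path /api/v1/collections and the JSON payload are assumptions for illustration only; check the Swagger UI for the routes and request bodies the API actually exposes. The helper name build_admin_request is likewise hypothetical.

```python
import json
import urllib.request


def build_admin_request(base_url: str, path: str, token: str,
                        payload: dict, method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a JSON request carrying the admin Bearer token."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical endpoint and payload; only the header format comes from this README.
req = build_admin_request(
    "http://localhost:8000",
    "/api/v1/collections",
    "your_secure_admin_token_here",
    {"name": "example"},
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing req to urllib.request.urlopen would send it; that call is left out so the sketch runs without a server.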
Some useful commands
Drop and recreate the database:
docker compose exec db psql -U postgres -c "drop DATABASE mlcbakery;"
docker compose exec db psql -U postgres -c "create DATABASE mlcbakery;"
Once the api server is running, migrate the schema:
docker compose exec api alembic -c alembic.ini upgrade heads
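Migrations fail if they run before the target service accepts connections. Instead of sleeping for a fixed time, a script can poll the port until it opens. The following is a minimal sketch using only the standard library; the function name wait_for_port is hypothetical, and it assumes the port in question is reachable from where the script runs (for a Compose service, that means the port is published to the host). An open TCP port does not guarantee that PostgreSQL has finished initializing, so this is a coarse check only.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling wait_for_port("localhost", 5432) before invoking Alembic would block until the database port opens or 30 seconds pass, whichever comes first.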