# MLC Bakery

by jettyio

MLC Bakery is a Python-based service for managing ML model provenance and lineage, built with FastAPI and SQLAlchemy.
## Features
- Dataset management with collection support
- Entity tracking
- Activity logging
- Agent management
- Provenance relationships tracking
- RESTful API endpoints
## Development Setup

### Prerequisites
- Python 3.12+
- uv (Python package manager)
- PostgreSQL (running locally or via Docker)
### Local Development Setup
1. **Clone the repository:**

   ```bash
   git clone <your-repo-url> mlcbakery
   cd mlcbakery
   ```

2. **Install dependencies:** `uv` uses `pyproject.toml` to manage dependencies and will automatically create a virtual environment if one doesn't exist.

   ```bash
   # Install main, dev, and webclient dependencies in editable mode
   uv pip install -e .[dev,webclient]
   ```

3. **Set up environment variables:** Create a `.env` file in the project root by copying the example:

   ```bash
   cp .env.example .env  # Ensure .env.example exists and is up to date
   ```

   Edit `.env` with your local PostgreSQL connection details. The key variable is `DATABASE_URL`. Example for a user `devuser` with password `devpass` connecting to database `mlcbakery_dev`:

   ```bash
   # .env
   DATABASE_URL=postgresql+asyncpg://devuser:devpass@localhost:5432/mlcbakery_dev
   ```

   (Ensure your PostgreSQL server is running, the specified database exists, and the user has permissions.)

4. **Run database migrations:** Apply the latest database schema using Alembic. `uv run` executes commands within the project's managed environment.

   ```bash
   uv run alembic upgrade heads
   ```
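Before starting the server, it can help to sanity-check the `DATABASE_URL` you just configured. A minimal stdlib-only sketch (the helper functions are illustrative and not part of MLC Bakery):

```python
import os
from urllib.parse import urlsplit

def load_dotenv(path=".env"):
    """Naively read KEY=VALUE pairs from a .env file into os.environ (illustrative only)."""
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env file; rely on the existing environment

def check_database_url():
    """Return (scheme, host, dbname) parsed from DATABASE_URL, or raise with a hint."""
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; see .env.example")
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise RuntimeError(f"Expected a postgresql+asyncpg URL, got scheme {parts.scheme!r}")
    return parts.scheme, parts.hostname, parts.path.lstrip("/")
```

Running `check_database_url()` before `uvicorn` starts surfaces a missing or malformed URL immediately instead of as a connection error deep in SQLAlchemy.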
### Running the Server (Locally)
Start the FastAPI application using uvicorn:
```bash
# Make sure your .env file is present for the DATABASE_URL
uv run uvicorn mlcbakery.main:app --reload --host 0.0.0.0 --port 8000
```
The API will be available at http://localhost:8000 (or your machine's IP address).
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
### Running Tests
The tests are configured to run against a PostgreSQL database defined by the DATABASE_URL environment variable. You can use the same database as your development environment or configure a separate test database in your .env file if preferred (adjust connection string as needed).
```bash
# Ensure DATABASE_URL is set in your environment or .env file
uv run pytest
```
To run specific tests:
```bash
uv run pytest tests/test_activities.py -v
```
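If you prefer a separate test database rather than reusing the development one, the connection string only needs its database name swapped. A stdlib sketch of that idea (the helper and the `_test` naming convention are illustrative, not taken from the repo):

```python
import os

def use_test_database():
    """Rewrite DATABASE_URL to target a dedicated *_test database (illustrative)."""
    url = os.environ.get(
        "DATABASE_URL",
        "postgresql+asyncpg://devuser:devpass@localhost:5432/mlcbakery_dev",
    )
    # Swap the database name at the end of the URL for a _test variant.
    base, _, dbname = url.rpartition("/")
    test_url = url if dbname.endswith("_test") else f"{base}/{dbname}_test"
    os.environ["DATABASE_URL"] = test_url
    return test_url
```

A call like this could live in a `conftest.py` so the suite never touches development data; remember the `_test` database must exist and be migrated first.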
## Project Structure
```
mlcbakery/
├── alembic/          # Database migrations (Alembic)
├── .github/          # GitHub Actions workflows
├── mlcbakery/        # Main application package
│   ├── models/       # SQLAlchemy models
│   ├── schemas/      # Pydantic schemas
│   ├── api/          # API routes (FastAPI)
│   └── main.py       # FastAPI application entrypoint
├── tests/            # Test suite (pytest)
├── .env.example      # Example environment variables
├── alembic.ini       # Alembic configuration
├── pyproject.toml    # Project metadata and dependencies (uv/Poetry)
└── README.md         # This file
```
## Database Schema
Managed by Alembic migrations in the alembic/versions directory. The main tables include:
- `collections`
- `entities` (polymorphic base for datasets, models, etc.)
- `datasets`
- `trained_models`
- `activities`
- `agents`
- `activity_relationships` (tracks provenance)
### Resetting the database (Local Development)
If using a local PostgreSQL instance, you can drop and recreate the database:
```bash
# Example commands using psql
# Connect as a superuser or the database owner
dropdb mlcbakery_dev
createdb mlcbakery_dev

# Re-run migrations
uv run alembic upgrade heads
```
**Warning:** This deletes all data in the development database.
## Contributing
1. Create a new branch for your feature (`git checkout -b feature/my-new-feature`)
2. Make your changes
3. Run tests to ensure everything passes (`uv run pytest`)
4. Commit your changes (`git commit -am 'Add some feature'`)
5. Push to the branch (`git push origin feature/my-new-feature`)
6. Submit a pull request
## License
MIT
## Deployment (Docker Compose)

This project includes a `docker-compose.yml` file for easier deployment of the API, database, Streamlit viewer, and Caddy reverse proxy.
### Prerequisites
- Docker and Docker Compose installed.
- A Docker network named `caddy-network` created:

  ```bash
  docker network create caddy-network
  ```
### Steps
1. **Configure environment variables:** The `docker-compose.yml` file sets a default `DATABASE_URL` pointing to the `db` service within the Docker network. However, you must configure the `ADMIN_AUTH_TOKEN` for the `api` service. You can do this by:
   - **Creating a `.env` file:** Create a `.env` file in the project root and add the following line (Docker Compose automatically loads `.env` files):

     ```bash
     ADMIN_AUTH_TOKEN=your_secure_admin_token_here
     ```

   - **Modifying `docker-compose.yml`:** Directly add `ADMIN_AUTH_TOKEN` under the `environment` section of the `api` service (less secure for secrets).
   - **Passing at runtime:** Set the variable inline when starting the stack, e.g., `ADMIN_AUTH_TOKEN=your_secure_admin_token_here docker-compose up -d`.
2. **Build and run services:** Navigate to the project root directory and run:

   ```bash
   docker-compose up --build -d
   ```

   This builds the necessary images and starts all services (api, mcp_server, streamlit, db, caddy) in the background.
3. **Database migrations:** Once the `db` and `api` containers are running, apply the database migrations by executing the `alembic` command inside the running `api` container:

   ```bash
   docker-compose exec api alembic upgrade head
   ```

   Note: you might need to wait a few seconds for the database service to fully initialize before running migrations.
4. **Accessing services:**
   - **API:** Accessible via the Caddy reverse proxy, typically at `http://localhost` or `http://<your-domain>` if configured in the `Caddyfile`. Direct access (bypassing Caddy) is usually on port 8000 if mapped. Swagger UI: `http://localhost/docs` (or `/api/v1/docs` depending on the Caddy setup).
   - **Streamlit viewer:** Accessible via Caddy, e.g., `http://streamlit.localhost`.
   - **MCP server:** Accessible via Caddy, e.g., `http://mcp.localhost`.
   - **Caddy:** Handles reverse proxying based on the `Caddyfile`. Modify the `Caddyfile` and restart the `caddy` service (`docker-compose restart caddy`) to update domains or proxy configurations.
### Stopping Services

```bash
docker-compose down
```
To remove the volumes (including database data):
```bash
docker-compose down -v
```
### Important Notes
- `ADMIN_AUTH_TOKEN`: This token is required for any mutable API operations (POST, PUT, PATCH, DELETE). Include it in requests as a Bearer token in the `Authorization` header (e.g., `Authorization: Bearer your_secure_admin_token_here`).
- `DATABASE_URL`: Ensure the `api` and `streamlit` services can reach the database specified by `DATABASE_URL`. The default in `docker-compose.yml` assumes the `db` service within the same Docker network.
- `Caddyfile`: Customize the `Caddyfile` for your specific domains and HTTPS setup. The provided file includes examples for local `.localhost` domains and placeholders like `bakery.jetty.io`. Remember to restart Caddy after changes.
- `caddy-network`: The services rely on the external Docker network `caddy-network` for inter-service communication and Caddy proxying. Ensure this network exists.
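Attaching the Bearer token from Python can be done with nothing but the stdlib. A sketch that only constructs the request, without sending it (the `/api/v1/datasets` path is a hypothetical endpoint for illustration, not confirmed from this document):

```python
import json
import os
import urllib.request

def build_admin_request(url: str, payload: dict) -> urllib.request.Request:
    """Construct a POST request carrying ADMIN_AUTH_TOKEN as a Bearer token."""
    token = os.environ.get("ADMIN_AUTH_TOKEN", "your_secure_admin_token_here")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Sending it is then just urllib.request.urlopen(req) once the stack is up.
```

Without the `Authorization` header, any POST/PUT/PATCH/DELETE call should be rejected by the API, so failures here are a quick way to confirm the token wiring.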
### Some useful commands
Add / drop the database:
```bash
docker compose exec db psql -U postgres -c "DROP DATABASE mlcbakery;"
docker compose exec db psql -U postgres -c "CREATE DATABASE mlcbakery;"
```
Once the api server is running, migrate the schema:
```bash
docker compose exec api alembic -c alembic.ini upgrade heads
```