Databricks MCP Server

by RafaelCartenet

This is a Model Context Protocol (MCP) server for executing SQL queries against Databricks using the Statement Execution API. It retrieves data by issuing SQL statements through the Databricks REST API, and when used in Agent mode it can chain multiple requests to carry out complex tasks.


What is Databricks MCP Server?

The Databricks MCP Server is a tool that executes SQL queries against Databricks using the Statement Execution API. It acts as a bridge between MCP clients (such as Agent Composer or Cursor) and Databricks, letting you retrieve data from and run operations against your Databricks workspace.
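
Under the hood, the server wraps the Databricks Statement Execution API. The Python sketch below shows the kind of call involved; the environment-variable names and sample table are illustrative assumptions, not necessarily what the repository uses:

```python
import os
import requests

# Illustrative variable names; the server's own configuration may differ.
host = os.environ["DATABRICKS_HOST"]                      # workspace hostname
token = os.environ["DATABRICKS_TOKEN"]                    # personal access token
warehouse_id = os.environ["DATABRICKS_SQL_WAREHOUSE_ID"]

resp = requests.post(
    f"https://{host}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "warehouse_id": warehouse_id,
        "statement": "SELECT * FROM samples.nyctaxi.trips LIMIT 5",
        "wait_timeout": "30s",  # wait synchronously up to 30s, then poll
    },
)
resp.raise_for_status()
data = resp.json()
print(data["status"]["state"])       # e.g. "SUCCEEDED"
print(data["result"]["data_array"])  # rows, present only on success
```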

How to use Databricks MCP Server?

To use the server, install the required dependencies, configure your Databricks credentials (through a .env file or environment variables), and run the server. You can run it standalone or integrate it with tools like Cursor by adding an entry to the mcp.json file that points at the server.
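
For example, a .env file might look like the following. These variable names are assumptions based on common Databricks tooling; check the project's README for the exact names it expects:

```
DATABRICKS_HOST=dbc-xxxx.cloud.databricks.com
DATABRICKS_TOKEN=dapiXXXXXXXXXXXXXXXX
DATABRICKS_SQL_WAREHOUSE_ID=1234567890abcdef
```

A Cursor mcp.json entry could then look like this sketch, where the server name, script name, and directory path are placeholders to adapt to your checkout:

```json
{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["--directory", "/path/to/mcp-databricks-server", "run", "main.py"]
    }
  }
}
```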

Key features of Databricks MCP Server

  • Execute SQL queries on Databricks

  • List available schemas in a catalog

  • List tables in a schema

  • Describe table schemas (SQL equivalents are sketched after this list)
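
Each of these corresponds to an ordinary SQL statement run on your warehouse. The sketch below shows standard Databricks SQL equivalents; the catalog, schema, and table names are placeholders, and the server's actual implementation may phrase its queries differently:

```python
# Illustrative Databricks SQL behind each feature; names are placeholders.
LIST_SCHEMAS = "SHOW SCHEMAS IN my_catalog"
LIST_TABLES = "SHOW TABLES IN my_catalog.my_schema"
DESCRIBE_TABLE = "DESCRIBE TABLE my_catalog.my_schema.my_table"
```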

Use cases of Databricks MCP Server

  • Executing SQL queries against Databricks from Agent Composer

  • Retrieving data from Databricks using Cursor's AI assistant

  • Automating data extraction and transformation tasks

  • Integrating Databricks data into other applications via MCP

FAQ from Databricks MCP Server

What is the Model Context Protocol (MCP)?

MCP is an open protocol that standardizes how AI applications connect to external tools and data sources. Clients such as Cursor or Agent Composer use it to discover and call the tools this server exposes, sharing context and data with the model.

How do I find my Databricks SQL Warehouse ID?

In the Databricks UI, go to SQL Warehouses and open your warehouse; the warehouse ID is shown on its overview page and also appears in the warehouse's URL.

What permissions are required to use this server?

The user associated with the provided token must have appropriate permissions to access the specified SQL warehouse, catalogs, schemas, and tables. It's recommended to use a dedicated token with read-only permissions where possible.

How do I handle long-running queries?

The server is designed to handle long-running queries by polling the Databricks API until the query completes or times out. The default timeout is 10 minutes, which can be adjusted in the dbapi.py file.
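
A minimal sketch of such a polling loop is below. The GET endpoint and terminal states come from the Statement Execution API; the 600-second default mirrors the 10-minute timeout above, but the actual loop in dbapi.py may differ in detail:

```python
import time
import requests

def wait_for_statement(host, token, statement_id, timeout_s=600, poll_s=5):
    """Poll a statement until it reaches a terminal state or times out."""
    url = f"https://{host}/api/2.0/sql/statements/{statement_id}"
    headers = {"Authorization": f"Bearer {token}"}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = requests.get(url, headers=headers).json()
        if data["status"]["state"] in ("SUCCEEDED", "FAILED", "CANCELED", "CLOSED"):
            return data
        time.sleep(poll_s)
    raise TimeoutError(f"statement {statement_id} still running after {timeout_s}s")
```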

What if I am not using uv?

If you're not using uv, set the command in your mcp.json entry to python (or the full path to your Python interpreter) instead.
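
For example (the script name and path are placeholders):

```json
{
  "mcpServers": {
    "databricks": {
      "command": "python",
      "args": ["/path/to/mcp-databricks-server/main.py"]
    }
  }
}
```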