MarkItDown

by Microsoft

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. It focuses on preserving important document structure and content as Markdown.

What is MarkItDown?

MarkItDown is a Python utility that converts files in many formats into Markdown for use with Large Language Models (LLMs) and text analysis pipelines, preserving document structure such as headings, lists, tables, and links along with the content.

How do I use MarkItDown?

MarkItDown can be used from the command line or through its Python API. The command-line interface converts a file directly to Markdown, writing to standard output by default, with an option to write to a named output file and support for piped input. The Python API gives programmatic control for integration into existing Python workflows, including options to enable plugins, use Azure Document Intelligence, and generate image descriptions with an LLM.
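A minimal sketch of switching on the optional Python API features together. The constructor keywords `enable_plugins`, `docintel_endpoint`, `llm_client`, and `llm_model` follow the project README; the helpers `markitdown_kwargs` and `build_markitdown` are hypothetical, and the endpoint and model values are placeholders you would supply yourself.

```python
from typing import Any, Dict, Optional


def markitdown_kwargs(
    enable_plugins: bool = False,
    docintel_endpoint: Optional[str] = None,
    llm_client: Any = None,
    llm_model: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect MarkItDown constructor options, including only the features requested."""
    kwargs: Dict[str, Any] = {"enable_plugins": enable_plugins}
    if docintel_endpoint:
        # Send conversions through an Azure Document Intelligence endpoint.
        kwargs["docintel_endpoint"] = docintel_endpoint
    if llm_client is not None:
        # Hypothetical design choice: this sketch insists on a model name whenever
        # an LLM client is given, since the README always passes the two together.
        if not llm_model:
            raise ValueError("llm_model is required when llm_client is set")
        kwargs["llm_client"] = llm_client
        kwargs["llm_model"] = llm_model
    return kwargs


def build_markitdown(**options: Any):
    """Create a MarkItDown instance (requires `pip install 'markitdown[all]'`)."""
    # Imported inside the function so markitdown_kwargs can be used without the library.
    from markitdown import MarkItDown

    return MarkItDown(**markitdown_kwargs(**options))
```

Following the README's image example, you might call `build_markitdown(llm_client=OpenAI(), llm_model="gpt-4o")` (with `from openai import OpenAI`) and then `.convert("example.jpg")` on the returned instance.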

Key features of MarkItDown

  • Converts many file formats to Markdown: PDF, PowerPoint, Word, Excel, images, audio, HTML, text-based formats, ZIP files, YouTube URLs, and EPUBs

  • Preserves document structure (headings, lists, tables, links)

  • Lets you install optional dependencies only for the file formats you need

  • Supports third-party plugins (disabled by default)

  • Integrates with Azure Document Intelligence for enhanced conversion

  • Leverages LLMs for image descriptions

Use cases of MarkItDown

  • Preparing documents for ingestion into LLMs

  • Automating the conversion of documents for text analysis

  • Extracting structured content from various file formats

  • Building text analysis pipelines that leverage Markdown as an intermediate format
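As a sketch of the pipeline use case, the function below converts every file under a source directory and writes the Markdown to a mirrored tree, naming each output after the full input file name (`report.pdf` becomes `report.pdf.md`) so files that differ only by extension do not overwrite each other. It accepts any object whose `convert(path)` returns a result with a `text_content` string, which is how the README reads MarkItDown results; the function name, output layout, and skip-on-error policy are assumptions.

```python
from pathlib import Path
from typing import Any, Dict, List


def convert_directory(src: Path, dst: Path, converter: Any) -> Dict[str, List[str]]:
    """Convert every file under `src` to Markdown, mirroring the tree under `dst`.

    `converter` is expected to behave like markitdown.MarkItDown: convert(path)
    returns an object whose `text_content` attribute holds the Markdown.
    """
    report: Dict[str, List[str]] = {"converted": [], "failed": []}
    # Materialize the file list first so outputs written under `src` are not revisited.
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        out = dst / rel.parent / (rel.name + ".md")
        try:
            text = converter.convert(str(path)).text_content
        except Exception as exc:  # record the failure and keep processing the batch
            report["failed"].append(f"{rel}: {exc}")
            continue
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(text, encoding="utf-8")
        report["converted"].append(str(rel))
    return report
```

With the library installed, a call might look like `convert_directory(Path("docs"), Path("markdown"), MarkItDown())`; passing the converter in also makes the function easy to test with a stand-in object.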

FAQ about MarkItDown

What file formats does MarkItDown support?

MarkItDown supports a wide range of file formats, including PDF, PowerPoint, Word, Excel, images, audio, HTML, text-based formats (CSV, JSON, XML), ZIP files, YouTube URLs, and EPUBs.

How do I install MarkItDown?

Install MarkItDown with pip: pip install 'markitdown[all]'. The [all] extra pulls in the optional dependencies for every supported format. Alternatively, you can install it from source.
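If you only need a few formats, the optional dependency groups can be installed individually instead of [all]; the project README gives pip install 'markitdown[pdf, docx, pptx]' as an example. The sketch below picks extras from the file extensions in a corpus and builds the matching pip command. The extension-to-extra table is an assumption covering only a handful of groups, so check the README for the full, current list before relying on it.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Assumed mapping from file extension to a markitdown optional-dependency group.
EXTRA_FOR_SUFFIX = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".xls": "xls",
}


def pip_install_command(paths: Iterable[str]) -> List[str]:
    """Build a pip command that installs only the extras these files appear to need."""
    suffixes = (Path(p).suffix.lower() for p in paths)
    extras = sorted({EXTRA_FOR_SUFFIX[s] for s in suffixes if s in EXTRA_FOR_SUFFIX})
    # Fall back to the base package when no listed extra is needed.
    spec = f"markitdown[{','.join(extras)}]" if extras else "markitdown"
    # Use the running interpreter's pip so the install lands in the same environment.
    return [sys.executable, "-m", "pip", "install", spec]
```

To actually install, pass the result to subprocess.run(..., check=True).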

How do I use MarkItDown from the command line?

Run the markitdown command followed by the path of the file to convert. The Markdown is written to standard output by default; use the -o option to write it to a file instead, or pipe a file's contents into markitdown on standard input.
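The README documents the command-line forms markitdown path-to-file.pdf -o document.md, redirecting standard output with > document.md, and piping input with cat path-to-file.pdf | markitdown. A minimal sketch of driving the first form from Python with subprocess; the wrapper names are hypothetical, and the sketch assumes the markitdown executable is on PATH.

```python
import subprocess
from typing import List, Optional


def markitdown_argv(input_path: str, output_path: Optional[str] = None) -> List[str]:
    """Build the argument list for the markitdown CLI."""
    argv = ["markitdown", input_path]
    if output_path:
        argv += ["-o", output_path]  # write to a file instead of standard output
    return argv


def run_markitdown(input_path: str, output_path: Optional[str] = None) -> str:
    """Run the CLI and return its captured standard output.

    The returned string holds the Markdown only when no output file is given.
    """
    result = subprocess.run(
        markitdown_argv(input_path, output_path),
        check=True,  # raise CalledProcessError on a non-zero exit status
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    return result.stdout
```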

How do I use MarkItDown in Python?

Import the MarkItDown class, create an instance, and call its convert() method on a file path; the returned result holds the Markdown in its text_content attribute.
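The README's basic example creates MarkItDown(), calls convert() on a path, and reads result.text_content. The sketch below wraps that in a helper that can also save the output, mirroring the CLI's -o option. The name convert_and_save is hypothetical, and the optional converter parameter is an assumption added only so the helper can be exercised without the library installed.

```python
from pathlib import Path
from typing import Any, Optional


def convert_and_save(src: str, dst: Optional[str] = None, converter: Any = None) -> str:
    """Convert `src` to Markdown, optionally write it to `dst`, and return the text."""
    if converter is None:
        # Requires: pip install 'markitdown[all]'
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(src)
    markdown = result.text_content  # the converted Markdown string
    if dst is not None:
        Path(dst).write_text(markdown, encoding="utf-8")
    return markdown
```

Called as convert_and_save("test.xlsx"), it returns the Markdown without writing a file; adding a second argument also saves it to that path.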

How do I enable plugins?

You can enable plugins by passing enable_plugins=True to the MarkItDown constructor or by using the --use-plugins command-line option.