Convert files such as PDFs to Markdown

Python
Author

Tony D

Published

July 24, 2025

MarkItDown is a lightweight Python utility that converts various file formats to Markdown for use with LLMs and related text analysis pipelines.

install markitdown

git clone git@github.com:microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
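
After installing, it can help to confirm that the package is importable from the interpreter you plan to use. The snippet below is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for illustration and is not part of MarkItDown.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the editable install above has succeeded.
print("markitdown importable:", is_installed("markitdown"))
```

If this prints False, the install most likely went into a different virtual environment than the one running the script.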

convert xlsx to md

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False) # Set to True to enable plugins
result = md.convert("weight.xlsx")
print(result.text_content)
# Write the Markdown to disk; UTF-8 avoids encoding errors on non-ASCII text
with open("weight.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)
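
The same convert-then-write pattern can be applied to a whole folder. The sketch below is one possible way to do it, not something MarkItDown provides: `convert_folder` and its parameters are hypothetical names, and it only assumes that the converter object has a `convert(path)` method whose result exposes `text_content`, as in the example above.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, converter, patterns=("*.xlsx", "*.pdf")):
    """Convert every matching file in src_dir and write one .md file per input.

    `converter` is any object with a .convert(path) method whose result has a
    .text_content attribute, for example a MarkItDown instance.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pattern in patterns:
        for path in sorted(src.glob(pattern)):
            result = converter.convert(str(path))
            target = out / (path.stem + ".md")  # report.xlsx -> report.md
            target.write_text(result.text_content, encoding="utf-8")
            written.append(target)
    return written
```

With `md = MarkItDown(enable_plugins=False)`, calling `convert_folder("docs", "docs_md", md)` would write one Markdown file per spreadsheet or PDF found in a placeholder `docs` folder. Two inputs that share a stem (for example `report.xlsx` and `report.pdf`) would overwrite each other, so a real script may want to keep the original extension in the output name.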

convert pdf to md

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False) # Set to True to enable plugins
result = md.convert("Modern_intro_probability_statistics.pdf")
# print(result.text_content)  # uncomment to preview; output can be very long for a large PDF
# Write the Markdown to disk; UTF-8 avoids encoding errors on non-ASCII text
with open("Modern_intro_probability_statistics.md", "w", encoding="utf-8") as f:
    f.write(result.text_content)
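
Printing the full text of a large PDF floods the terminal, which is presumably why the `print` call above is commented out. A small helper can show just the first few non-empty lines. This is a plain-Python sketch; `preview` is a hypothetical name, not a MarkItDown function.

```python
def preview(text, max_lines=20):
    """Return the first `max_lines` non-empty lines of a long Markdown string."""
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n".join(lines[:max_lines])


sample = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
print(preview(sample, max_lines=2))
```

In the PDF example, `print(preview(result.text_content))` would give a quick sanity check that text was actually extracted before writing the file.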

convert image to md with an LLM (currently only OpenAI is supported)

https://github.com/microsoft/markitdown/issues/1129

from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("example.jpg")
print(result.text_content)
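
Because `OpenAI()` takes its key from the `OPENAI_API_KEY` environment variable by default, a missing key only surfaces once the client is created or used. The sketch below checks for it up front with a clearer message; `require_env` is a hypothetical helper, not part of the `openai` or `markitdown` packages.

```python
import os


def require_env(name="OPENAI_API_KEY", env=None):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the OpenAI client")
    return value
```

Calling `require_env()` just before `client = OpenAI()` makes the script stop early with an explicit message instead of failing partway through the conversion.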

reference:

https://github.com/microsoft/markitdown