
Getting Started with Docling: PDF to Structured Data
Docling is an open-source document conversion tool from IBM Research. It takes PDFs and converts them into clean, structured output such as Markdown, HTML, JSON, or plain text. It handles layout analysis, table extraction, image embedding, and OCR, and it even offers a vision-based pipeline for complex documents. This guide walks through installation, the core conversion options, and the advanced flags worth knowing.

Installation

Use a virtual environment:

    python -m venv .venv
    source .venv/bin/activate    # Windows: .venv\Scripts\activate
    pip install docling

Verify the installation:

    docling --version
    # Should output: Docling version: 2.xx.x

Basic Conversion

Docling accepts both local file paths and remote URLs:

    docling https://example.com/document.pdf
    docling ./my-report.pdf

The default output is Markdown, written to your current directory. For a typical document, expect conversion to take around two minutes with minimal resource usage.

Output Formats

Markdown (default)

    docling file.pdf
    # or explicitly
    docling file.pdf --to md

Text, headings, t
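The basic conversion and `--to` commands above extend naturally to batch jobs. The following is a minimal sketch, not an official Docling script. It assumes `docling` is on your PATH, that the CLI's `--to` flag can be repeated to request several formats at once, and that `--output` selects the output directory (as in recent 2.x releases; check `docling --help` for your version). The function name `convert_dir` and the directory paths are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: batch-convert every PDF in a directory with the docling CLI.
# Assumptions: docling is installed and on PATH; --to is repeatable and
# --output sets the output directory. convert_dir is a hypothetical helper.

# convert_dir SRC_DIR OUT_DIR
# Converts each *.pdf in SRC_DIR to Markdown and JSON, writing results to OUT_DIR.
convert_dir() {
  local src_dir="$1" out_dir="$2"
  mkdir -p "$out_dir"
  for pdf in "$src_dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it instead of
    # passing a non-existent "*.pdf" path to docling.
    [ -e "$pdf" ] || continue
    echo "Converting $pdf"
    docling "$pdf" --to md --to json --output "$out_dir"
  done
}
```

Usage: `convert_dir ./pdfs ./converted`. Looping per file keeps progress visible and makes it easy to see which document failed. Docling's CLI also accepts several input sources in a single invocation, which may be faster for large batches because the models are loaded only once per process.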
Continue reading on Dev.to



