✓ Verified 💻 Development ✓ Enhanced Data

Docling

Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling

Rating
4.3 (157 reviews)
Downloads
11,620 downloads
Version
1.0.0

Overview

Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI with GPU.

Complete Documentation

View Source →

Docling - Document & Web Content Extraction

CLI tool for parsing documents and web pages into clean, structured text. Uses GPU acceleration for OCR and ML models.

Prerequisites

  • docling CLI must be installed (e.g., via pipx install docling)
  • For GPU support: NVIDIA GPU with CUDA drivers

When to Use

  • Extract content from a URL → Use docling (not web_fetch)
  • Search for information → Use web_search (Brave)
  • Parse PDFs, DOCX, PPTX → Use docling
  • OCR on images → Use docling

Quick Commands

Web Page → Markdown (default)

bash
docling "<URL>" --from html --to md
Output: creates a .md file in current directory (or use --output)

Web Page → Plain Text

bash
docling "<URL>" --from html --to text --output /tmp/docling_out

PDF with OCR

bash
docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out

Key Options

OptionValuesDescription
--fromhtml, pdf, docx, pptx, image, md, csv, xlsxInput format
--tomd, text, json, yaml, htmlOutput format
--deviceauto, cuda, cpuAccelerator (default: auto)
--outputpathOutput directory (recommended: use controlled temp dir)
--ocrflagEnable OCR for images/scanned PDFs
--tablesflagExtract tables (default: on)

Security Notes

⚠️ Avoid these flags unless you trust the source:

  • --enable-remote-services - can send data to remote endpoints
  • --allow-external-plugins - loads third-party code
  • Custom --headers with untrusted values - can redirect requests

Workflow

  • For web content extraction: Use docling "" --from html --to text --output /tmp/docling_out
  • Read the output file from the specified output directory
  • Clean up the output directory after reading

GPU Support

Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:

bash
python -c "import torch; print(torch.cuda.is_available())"

Full CLI Reference

See references/cli-reference.md for complete option list.

Installation

Terminal bash

openclaw install docling
    
Copied!

Tags

#web_and-frontend-development #cli #web

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author er3mit4
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install docling