Docling
- Rating: 4.3 (157 reviews)
- Downloads: 11,620
- Version: 1.0.0
Overview
Extract and parse content from web pages, PDFs, documents (docx, pptx), and images using the docling CLI, with optional GPU acceleration.
Complete Documentation
Docling - Document & Web Content Extraction
CLI tool for parsing documents and web pages into clean, structured text. Uses GPU acceleration for OCR and ML models.
Prerequisites
- `docling` CLI must be installed (e.g., via `pipx install docling`)
- For GPU support: NVIDIA GPU with CUDA drivers
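Before shelling out, a caller can check the first prerequisite programmatically. The sketch below uses only the Python standard library; `find_docling` and `require_docling` are hypothetical helper names, and the check only confirms that an executable named `docling` is on `PATH`, not that it runs correctly or that CUDA is set up.

```python
import shutil
from typing import Optional


def find_docling() -> Optional[str]:
    """Return the path of the docling executable on PATH, or None if it is missing."""
    return shutil.which("docling")


def require_docling() -> str:
    """Return the docling path, or raise with the install hint from the prerequisites."""
    path = find_docling()
    if path is None:
        raise RuntimeError(
            "docling CLI not found on PATH; install it first, e.g. `pipx install docling`"
        )
    return path
```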
When to Use
- Extract content from a URL → Use docling (not web_fetch)
- Search for information → Use web_search (Brave)
- Parse PDFs, DOCX, PPTX → Use docling
- OCR on images → Use docling
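When the input is a path or URL, the routing in this list can be partly automated. The following is a minimal sketch, assuming a hypothetical `infer_from_format` helper and an extension mapping chosen for illustration (limited to the `--from` values in the Key Options table). docling may detect formats on its own, so treat the result as a hint rather than a requirement.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension -> --from mapping, restricted to formats in the Key Options table.
EXTENSION_TO_FORMAT = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".tif": "image",
    ".tiff": "image",
    ".md": "md",
    ".csv": "csv",
    ".xlsx": "xlsx",
    ".html": "html",
    ".htm": "html",
}


def infer_from_format(source: str) -> str:
    """Guess a docling --from value for a URL or a local file path."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # A URL ending in a known document extension (e.g. .pdf) keeps that format;
        # anything else is treated as a web page.
        return EXTENSION_TO_FORMAT.get(PurePosixPath(parsed.path).suffix.lower(), "html")
    suffix = PurePosixPath(source).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"no docling --from value known for {source!r}")
    return EXTENSION_TO_FORMAT[suffix]
```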
Quick Commands
Web Page → Markdown (default)
```bash
docling "<URL>" --from html --to md
```
Writes a `.md` file to the current directory (or use `--output` to choose another directory).
Web Page → Plain Text
```bash
docling "<URL>" --from html --to text --output /tmp/docling_out
```
PDF with OCR
```bash
docling "/path/to/file.pdf" --ocr --device cuda --output /tmp/docling_out
```
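When these commands are generated from code, building the argument list explicitly avoids shell-quoting problems with URLs. A minimal sketch, assuming a hypothetical `build_docling_command` helper that emits flags in the same order as the Quick Commands:

```python
from typing import List, Optional


def build_docling_command(
    source: str,
    from_fmt: Optional[str] = None,
    to_fmt: Optional[str] = None,
    ocr: bool = False,
    device: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble a docling argv list; options left as None/False are omitted."""
    cmd = ["docling", source]
    if from_fmt is not None:
        cmd += ["--from", from_fmt]
    if to_fmt is not None:
        cmd += ["--to", to_fmt]
    if ocr:
        cmd.append("--ocr")
    if device is not None:
        cmd += ["--device", device]
    if output_dir is not None:
        cmd += ["--output", output_dir]
    return cmd
```

Passing the resulting list to `subprocess.run(cmd, check=True)` (without `shell=True`) keeps characters such as `&` or `?` in a URL from being interpreted by a shell.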
Key Options
| Option | Values | Description |
|---|---|---|
| --from | html, pdf, docx, pptx, image, md, csv, xlsx | Input format |
| --to | md, text, json, yaml, html | Output format |
| --device | auto, cuda, cpu | Accelerator (default: auto) |
| --output | path | Output directory (recommended: use controlled temp dir) |
| --ocr | flag | OCR for images/scanned PDFs (default: on; disable with --no-ocr) |
| --tables | flag | Extract tables (default: on) |
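The value lists in this table can double as a pre-flight check. This sketch hard-codes only the values shown in the table, so the installed docling version may accept options it rejects; `invalid_options` is a hypothetical name.

```python
from typing import List, Optional

# Values copied from the Key Options table; newer docling releases may accept more.
ALLOWED_FROM = {"html", "pdf", "docx", "pptx", "image", "md", "csv", "xlsx"}
ALLOWED_TO = {"md", "text", "json", "yaml", "html"}
ALLOWED_DEVICE = {"auto", "cuda", "cpu"}


def invalid_options(from_fmt: Optional[str], to_fmt: str, device: str = "auto") -> List[str]:
    """Return the names of options whose values are not in the table (empty if all OK)."""
    problems: List[str] = []
    if from_fmt is not None and from_fmt not in ALLOWED_FROM:
        problems.append("--from")
    if to_fmt not in ALLOWED_TO:
        problems.append("--to")
    if device not in ALLOWED_DEVICE:
        problems.append("--device")
    return problems
```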
Security Notes
⚠️ Avoid these flags unless you trust the source:
- `--enable-remote-services`: can send data to remote endpoints
- `--allow-external-plugins`: loads third-party code
- Custom `--headers` with untrusted values: can redirect requests
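To make this rule enforceable in an automated pipeline, a wrapper can scan the argument list before running docling. A minimal sketch; `risky_flags`, `guard`, and the `trusted` parameter are hypothetical, and the check only looks at flag names, not at what a `--headers` value contains.

```python
from typing import List, Sequence

# Flags named in the Security Notes; extend this tuple if your policy covers more.
RISKY_FLAGS = ("--enable-remote-services", "--allow-external-plugins", "--headers")


def risky_flags(argv: Sequence[str]) -> List[str]:
    """Return the risky flags present in a docling argv list, without duplicates."""
    found: List[str] = []
    for arg in argv:
        # Accept both "--flag value" and "--flag=value" spellings.
        name = arg.split("=", 1)[0]
        if name in RISKY_FLAGS and name not in found:
            found.append(name)
    return found


def guard(argv: Sequence[str], trusted: bool) -> None:
    """Raise PermissionError if risky flags are used without an explicit trust decision."""
    hits = risky_flags(argv)
    if hits and not trusted:
        raise PermissionError(
            "refusing to run docling with " + ", ".join(hits) + " on an untrusted source"
        )
```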
Workflow
- For web content extraction, run `docling "<URL>" --from html --to text --output /tmp/docling_out`
- Read the output file from the specified output directory
- Clean up the output directory after reading
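The three workflow steps can be sketched in Python as follows. This is a variant, not the documented procedure: it swaps the fixed `/tmp/docling_out` for a fresh `tempfile.TemporaryDirectory`, so concurrent runs do not collide and clean-up happens automatically. `extract_text` and its injectable `run` parameter are hypothetical, and because the output filename is not specified here, the sketch reads every file docling writes to the directory.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List


def extract_text(url: str, run: Callable[..., object] = subprocess.run) -> str:
    """Run docling into a throwaway directory, read the result, then delete the directory.

    `run` defaults to subprocess.run; it is injectable so the flow can be tested without docling.
    """
    # Leaving the with-block removes out_dir, which covers the clean-up step.
    with tempfile.TemporaryDirectory(prefix="docling_out_") as out_dir:
        cmd: List[str] = ["docling", url, "--from", "html", "--to", "text", "--output", out_dir]
        # check=True raises CalledProcessError if docling exits with a non-zero status.
        run(cmd, check=True, capture_output=True, text=True)
        # Assumption: every top-level file written here is UTF-8 text, which should hold
        # for --to text but is not guaranteed by this document.
        outputs = sorted(p for p in Path(out_dir).iterdir() if p.is_file())
        if not outputs:
            raise FileNotFoundError(f"docling produced no output in {out_dir}")
        return "\n".join(p.read_text(encoding="utf-8") for p in outputs)
```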
GPU Support
Docling supports GPU acceleration via CUDA (NVIDIA). Verify CUDA is available:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```
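The same `torch.cuda.is_available()` check can be used to choose a `--device` value. A minimal sketch with a hypothetical `pick_device` helper; it falls back to `cpu` when PyTorch is not importable. Since `--device auto` is the default, an explicit check is mainly useful for logging which device was chosen or for deciding up front whether to pass `--device cuda`.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports an available CUDA device, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works where torch is absent
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"passing --device {pick_device()} to docling")
```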
Full CLI Reference
See references/cli-reference.md for the complete list of options.
Installation
```bash
openclaw install docling
```
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.