Smart Web Scraper
Extract structured data from any web page.
- Rating: 4.1 (432 reviews)
- Downloads: 11,479
- Version: 1.0.0
Extract structured data from web pages into clean JSON or CSV.
## Quick Start
```bash
# Scrape a page, extract all text content
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com"
# Extract specific elements with CSS selector
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com/products" -s ".product-card"
# Auto-detect and extract tables
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py tables "https://example.com/pricing"
# Extract all links from a page
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py links "https://example.com"
# Extract structured data (title, meta, headings, links)
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py structure "https://example.com"
# Output as JSON
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s ".item" -f json
# Output as CSV
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s "table tr" -f csv
# Save to file
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://example.com" -s ".product" -f json -o products.json
# Multi-page scrape (follow pagination)
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py crawl "https://example.com/page/1" --pages 5 -s ".article"
```
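## Commands
| Command | Args | Description |
|---------|------|-------------|
| `extract` | `<url> [-s selector] [-f format] [-o file]` | Extract content, optionally filtered by CSS selector |
| `tables` | `<url> [-f format] [-o file]` | Auto-detect and extract all HTML tables |
| `links` | `<url> [--external] [--internal]` | Extract all links (href + text) |
| `structure` | `<url>` | Extract page structure: title, meta, headings, images, links |
| `crawl` | `<url> --pages N [-s selector] [-f format] [-o file]` | Follow pagination links, extract from multiple pages |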
## Output Formats
| Format | Flag | Description |
|--------|------|-------------|
| Text | `-f text` | Plain text (default) |
| JSON | `-f json` | Structured JSON array |
| CSV | `-f csv` | Comma-separated values |
| Markdown | `-f md` | Markdown-formatted |
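To make the formats above concrete, here is a minimal sketch of how a list of extracted records could be serialized for each `-f` value. The `render` function and the Markdown bullet layout are assumptions for illustration; the actual `scripts/scraper.py` may structure its output differently.

```python
import csv
import io
import json


def render(records, fmt="text"):
    """Serialize a list of dict records (hypothetical helper, not the script's own code)."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # Use the keys of the first record as the CSV header.
        fields = list(records[0].keys()) if records else []
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "md":
        # Assumed layout: one bullet per record.
        return "\n".join(f"- {r['text']}" for r in records)
    # Default: plain text, one record per line.
    return "\n".join(r["text"] for r in records)


records = [{"text": "Widget Pro - $29.99", "tag": "div", "class": "product"}]
print(render(records, "csv"))
```

Keeping serialization separate from extraction means the same records can be written in any format without re-fetching the page.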
## Examples
### Extract product listings
```bash
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py extract "https://shop.example.com" -s ".product" -f json
```
```json
[
{"text": "Widget Pro - $29.99", "tag": "div", "class": "product"},
{"text": "Widget Max - $49.99", "tag": "div", "class": "product"}
]
```
### Extract pricing table
```bash
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py tables "https://example.com/pricing" -f csv
```
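As a rough illustration of what table extraction involves, the sketch below collects each `<table>` into a list of rows using only the standard library's `html.parser`. This is an assumption-laden stand-in: the real script relies on `beautifulsoup4` and `lxml`, and the `TableParser` class here is hypothetical. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every table as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


parser = TableParser()
parser.feed("<table><tr><th>Plan</th><th>Price</th></tr>"
            "<tr><td>Basic</td><td>$9</td></tr></table>")
print(parser.tables)
```

Each inner list maps directly onto one CSV row, which is why `-f csv` is a natural fit for the `tables` command.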
### Get all external links
```bash
uv run --with beautifulsoup4 --with lxml python scripts/scraper.py links "https://example.com" --external
```
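One plausible way to decide whether a link is external is to resolve it against the page URL and compare hostnames. The sketch below does this with the standard library; `LinkCollector` and `external_links` are hypothetical names, and the script's own rule (for example, how it treats subdomains) is not documented here and may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect anchors as {"href", "text"} dicts with hrefs made absolute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current = None  # anchor currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current = {"href": urljoin(self.base_url, href), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def external_links(html, base_url):
    """Return links whose host differs from the page's host (assumed definition)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [link for link in collector.links
            if urlparse(link["href"]).netloc not in ("", host)]


page = '<a href="/about">About</a> <a href="https://other.org/x">Other</a>'
print(external_links(page, "https://example.com"))
```

Resolving relative hrefs first matters: without `urljoin`, a link such as `/about` has no host at all and cannot be classified.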
## Rate Limiting
- Default: 1 request per second (respectful crawling)
- Override with `--delay 0.5` (seconds between requests)
- Respects `robots.txt` by default (override with `--ignore-robots`)
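The two behaviors above, a fixed delay between requests and a `robots.txt` check, can be sketched with the standard library as follows. `Throttle` and `is_allowed` are hypothetical helpers written for illustration, not the script's implementation; the robots rules are passed in as text so the example runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser


class Throttle:
    """Block so that consecutive calls to wait() are at least `delay` seconds apart."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self._last = None  # time of the previous request, if any

    def wait(self):
        if self._last is not None:
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/"
throttle = Throttle(delay=0.5)
for url in ("https://example.com/public", "https://example.com/private/x"):
    if is_allowed(robots, url):
        throttle.wait()  # a real crawler would fetch the page after this call
        print("fetch", url)
    else:
        print("skip", url)
```

In a real crawl the `robots.txt` text would be fetched once per host and reused for every URL on that host.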
## Notes
- Requires `beautifulsoup4` and `lxml` (auto-installed by `uv run --with`)
- Uses a standard browser User-Agent to avoid blocks
- Handles redirects, encoding detection, and error pages gracefully
- No JavaScript rendering (use for static HTML pages)
## Installation
```bash
openclaw install smart-web-scraper
```
## Tags
#web_and-frontend-development
#data
#web
## Quick Info
- Category: Development
- Model: Claude 3.5
- Complexity: One-Click
- Author: mariusfit
- Last Updated: 3/10/2026