Url Fetcher
Simple web content fetching without API keys or external dependencies.
- Rating: 4.3 (128 reviews)
- Downloads: 1,056
- Version: 1.0.0
Overview
Simple web content fetching without API keys or external dependencies.
✨ Key Features
No dependencies - Uses Python stdlib (urllib) only
No API keys - Completely free to use
URL validation - Blocks localhost/internal networks
Basic markdown conversion - Extract content from HTML
Path validation - Safe file writes only (workspace, home, /tmp)
Error handling - Timeout and network error handling
Complete Documentation
URL Fetcher
Fetch web content without API keys or external dependencies. Uses Python standard library only.
Quick Start
url_fetcher.py fetch <url>
url_fetcher.py fetch --markdown <url> [output_file]
Examples:
# Fetch and preview
url_fetcher.py fetch https://example.com
# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html
# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md
Features
- No dependencies - Uses Python stdlib (urllib) only
- No API keys - Completely free to use
- URL validation - Blocks localhost/internal networks
- Basic markdown conversion - Extract content from HTML
- Path validation - Safe file writes only (workspace, home, /tmp)
- Error handling - Timeout and network error handling
When to Use
- Content aggregation - Collect pages for processing
- Research collection - Save articles/pages locally
- Simple scraping - Extract text from web pages
- Markdown conversion - Basic HTML to text/markdown
- No-API alternatives - When you can't use paid APIs
Limitations
- Basic markdown - Simple regex-based conversion, not a full parser (see the sketch after this list)
- No JavaScript - Only fetches static HTML
- Rate limiting - No built-in rate limiting (add your own if needed)
- Bot detection - Some sites may block the default User-Agent
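To make the "basic regex-based conversion" limitation concrete, here is a rough sketch of this kind of stdlib-only transform. It is an illustration of the approach, not the skill's actual code, and it will mishandle anything a real HTML parser would catch:

import re
from html import unescape

def html_to_basic_markdown(html: str) -> str:
    """Very rough HTML -> text/markdown conversion using regex only (illustrative)."""
    # Drop script/style blocks entirely
    html = re.sub(r"(?is)<(script|style).*?</\1>", "", html)
    # Convert headings: <h2>Title</h2> -> ## Title
    html = re.sub(
        r"(?is)<h([1-6])[^>]*>(.*?)</h\1>",
        lambda m: "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n",
        html,
    )
    # Convert links: <a href="url">text</a> -> [text](url)
    html = re.sub(r'(?is)<a[^>]*href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", html)
    # Paragraph and line breaks become blank lines
    html = re.sub(r"(?i)</p>|<br\s*/?>", "\n\n", html)
    # Strip all remaining tags, unescape entities, collapse blank lines
    text = re.sub(r"(?s)<[^>]+>", "", html)
    text = unescape(text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()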
Security Features
URL Validation
- ✅ Allows: http/https URLs
- ❌ Blocks: file://, data://, javascript: URLs
- ❌ Blocks: localhost, 127.0.0.1, ::1 (internal networks)
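The check above can be approximated in a few lines of stdlib Python. This is a hedged sketch of the policy, not the skill's actual implementation:

from urllib.parse import urlparse

BLOCKED_HOSTS = {"localhost", "127.0.0.1", "::1"}  # internal targets to reject

def is_allowed_url(url: str) -> bool:
    """Allow only http/https URLs that do not point at local/internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):  # rejects file://, data:, javascript:
        return False
    host = (parsed.hostname or "").lower()
    return host not in BLOCKED_HOSTS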
File Path Validation
- ✅ Allows: workspace, home directory, /tmp
- ❌ Blocks: system paths (/etc, /usr, /var, etc.)
- ❌ Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
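The path policy can be sketched the same way. The allow-list below mirrors the roots named above and is an assumption, not the skill's code (requires Python 3.9+ for is_relative_to):

from pathlib import Path

ALLOWED_ROOTS = [Path.home() / "workspace", Path.home(), Path("/tmp")]  # assumed allow-list

def is_safe_output_path(path: str) -> bool:
    """Allow writes only under the allowed roots; block dotfiles such as ~/.ssh."""
    resolved = Path(path).expanduser().resolve()
    if any(part.startswith(".") for part in resolved.parts):  # block hidden dirs/files
        return False
    return any(resolved.is_relative_to(root.resolve()) for root in ALLOWED_ROOTS)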
Error Handling
- Timeout after 10 seconds
- HTTP error handling
- Network error handling
- Character encoding handling
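Fetching with these safeguards needs only urllib. A minimal sketch, assuming the 10-second timeout and lenient UTF-8 decoding described above:

import urllib.request
import urllib.error

def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with a timeout and decode leniently; raise on HTTP/network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="ignore")  # lenient decoding
    except urllib.error.HTTPError as e:
        raise RuntimeError(f"HTTP {e.code}: {e.reason}") from e
    except urllib.error.URLError as e:
        raise RuntimeError(f"Network error: {e.reason}") from e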
Usage Patterns
Collecting Research
# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md
# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md
Content Aggregation
# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html
# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md
Quick Preview
# Just preview content (no file save)
url_fetcher.py fetch https://example.com
Advanced Usage
Batch Fetching
#!/bin/bash
# batch_fetch.sh
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)
OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
  # Derive a safe filename from the URL (strip the scheme, replace separators)
  filename=$(echo "$url" | sed 's|https\?://||; s|[/:]|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1  # Be nice to servers
done
Integration with Other Skills
Combine with research-assistant:
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings
Combine with task-runner:
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md
Troubleshooting
Connection Timeout
Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.
HTTP 403/429 Errors
Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Try:
- Add a delay between requests
- Use a different User-Agent (modify the source, or see the sketch below)
- Respect robots.txt
- Consider using an API if available
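If editing the source is not convenient, the same request can be made directly with urllib and a custom header. A sketch; the User-Agent string is purely illustrative:

import urllib.request

def fetch_with_user_agent(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL while presenting a browser-like User-Agent (illustrative value)."""
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; url-fetcher-example)"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="ignore")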
Encoding Issues
Error with special characters
Solution: The tool decodes as UTF-8 and ignores errors, so some characters may be lost.
Markdown Quality
Note: Basic markdown extraction
Solution: The tool uses simple regex for HTML→Markdown conversion. For better results:
- Use a dedicated markdown parser
- Or post-process the output
- Or use a paid API with better parsing
Best Practices
- Be respectful - Add delays between requests (don't hammer servers)
- Check robots.txt - Respect the site's crawling policies (see the sketch after this list)
- Rate limit yourself - Don't fetch too fast
- Validate URLs - Only fetch from trusted sources
- Save safely - Always use path-validated outputs
- Preview first - Use preview mode before saving
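For the robots.txt practice above, the standard library already ships a parser. A minimal sketch; the fallback behavior on an unreachable robots.txt is a design choice, not a rule:

from urllib import robotparser
from urllib.parse import urlparse

def allowed_by_robots(url: str, user_agent: str = "*") -> bool:
    """Check the target site's robots.txt before fetching (stdlib only)."""
    parsed = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # robots.txt unreachable; proceed with normal caution
    return rp.can_fetch(user_agent, url)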
Integration Examples
Python Integration
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch URL and process"""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)
    # Fetch (check=True raises if the fetch fails, so we never read a stale file)
    subprocess.run([
        "python3",
        "/path/to/url_fetcher.py",
        "fetch",
        "--markdown",
        url,
        str(output),
    ], check=True)
    # Process content
    content = output.read_text()
    return content
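Example call (the URL and preview length are illustrative):

content = fetch_and_process("https://example.com")
print(content[:500])  # preview the first part of the converted page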
Bash Integration
# Function for fetching
fetch_content() {
  local url="$1"
  local output="$2"
  python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
    fetch --markdown "$url" "$output"
}
# Usage
fetch_content "https://example.com" ~/workspace/example.md
Alternatives
When You Need More Features
For full-featured scraping:
- Use requests + beautifulsoup4 (requires pip install)
- Or use the scrapy framework (requires pip install)
- Or use paid APIs (Firecrawl, Apify)
For better markdown conversion:
- Use the markdownify library (requires pip install)
- Or use AI-based parsing (OpenAI, Anthropic APIs)
For JavaScript-rendered sites:
- Use browser automation (OpenClaw browser tool)
- Use headless Chrome (Puppeteer, Playwright)
- Or use scraping APIs (Zyte, ScraperAPI)
Zero-Cost Advantage
This skill requires:
- ✅ Python 3 (included with OpenClaw)
- ✅ No API keys
- ✅ No external packages
- ✅ No paid services
- ✅ No rate limiting (other than what you add)
Contributing
If you improve this skill, please:
- Test with security-checker
- Document new features
- Publish to ClawHub with credit
License
Use freely in your OpenClaw skills and workflows.
Installation
openclaw install url-fetcher
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.