
URL Fetcher

Simple web content fetching without API keys or external dependencies.

Rating: 4.3 (128 reviews)
Downloads: 1,056
Version: 1.0.0

Key Features

1. No dependencies - Uses Python stdlib (urllib) only
2. No API keys - Completely free to use
3. URL validation - Blocks localhost/internal networks
4. Basic markdown conversion - Extract content from HTML
5. Path validation - Safe file writes only (workspace, home, /tmp)
6. Error handling - Timeout and network error handling

Complete Documentation

URL Fetcher

Fetch web content without API keys or external dependencies. Uses Python standard library only.

Quick Start

bash
url_fetcher.py fetch <url>
url_fetcher.py fetch --markdown <url> [output_file]

Examples:

bash
# Fetch and preview
url_fetcher.py fetch https://example.com

# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html

# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md

Features

  • No dependencies - Uses Python stdlib (urllib) only
  • No API keys - Completely free to use
  • URL validation - Blocks localhost/internal networks
  • Basic markdown conversion - Extract content from HTML
  • Path validation - Safe file writes only (workspace, home, /tmp)
  • Error handling - Timeout and network error handling

When to Use

  • Content aggregation - Collect pages for processing
  • Research collection - Save articles/pages locally
  • Simple scraping - Extract text from web pages
  • Markdown conversion - Basic HTML to text/markdown
  • No-API alternatives - When you can't use paid APIs

Limitations

  • Basic markdown - Simple regex-based conversion (not a full parser)
  • No JavaScript - Only fetches static HTML
  • Rate limiting - No built-in rate limiting (add your own if needed)
  • Bot detection - Some sites may block the default User-Agent
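To make the "simple regex-based conversion" limitation concrete, here is a minimal sketch of what such a converter can look like. It is not taken from the tool's source; the function name `html_to_basic_markdown` and the set of handled tags are assumptions chosen for illustration.

```python
import html
import re


def html_to_basic_markdown(source: str) -> str:
    """Very rough HTML-to-markdown conversion using regular expressions."""
    # Drop script and style blocks entirely
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", source)
    # Headings h1..h6 become #-prefixed lines
    text = re.sub(
        r"(?is)<h([1-6])\b[^>]*>(.*?)</h\1>",
        lambda m: "\n" + "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n",
        text,
    )
    # Links become [label](href); only double-quoted href values are matched
    text = re.sub(r'(?is)<a\b[^>]*href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", text)
    # List items become "- item" lines
    text = re.sub(r"(?is)<li\b[^>]*>(.*?)</li>", r"- \1\n", text)
    # Line breaks and paragraph ends become newlines
    text = re.sub(r"(?i)<br\s*/?>|</p>", "\n", text)
    # Strip whatever tags remain, then decode entities such as &amp;
    text = re.sub(r"(?s)<[^>]+>", "", text)
    text = html.unescape(text)
    # Collapse runs of blank lines
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A converter like this handles flat, well-formed markup but breaks on nested lists, tables, and unusual attribute quoting, which is why a dedicated parser gives better results.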

Security Features

URL Validation

  • ✅ Allows: http/https URLs
  • ❌ Blocks: file://, data://, javascript: URLs
  • ❌ Blocks: localhost, 127.0.0.1, ::1 (internal networks)
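The rules above can be approximated with the standard library alone. The sketch below is an assumption about how such a check might be written, not the tool's actual code; `is_url_allowed`, `ALLOWED_SCHEMES`, and `BLOCKED_HOSTNAMES` are hypothetical names, and rejecting private and link-local address literals goes slightly beyond the three hosts listed.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}
BLOCKED_HOSTNAMES = {"localhost"}


def is_url_allowed(url: str) -> bool:
    """Return True if the URL uses http/https and does not target a local host."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False  # rejects file:, data:, javascript:, and scheme-less input
    host = parsed.hostname  # lowercased, with IPv6 brackets removed
    if not host or host in BLOCKED_HOSTNAMES:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # a regular hostname rather than an IP literal
    # Reject loopback, private, and link-local address literals
    return not (addr.is_loopback or addr.is_private or addr.is_link_local)
```

This only inspects the URL text. A hostname that resolves to an internal address through DNS would still pass, so a stricter check would also need to resolve the name before fetching.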

File Path Validation

  • ✅ Allows: workspace, home directory, /tmp
  • ❌ Blocks: system paths (/etc, /usr, /var, etc.)
  • ❌ Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
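A path check along these lines can be sketched with `pathlib` (Python 3.9+ for `is_relative_to`). This is an illustration under assumptions, not the tool's implementation: `is_path_allowed` and `BLOCKED_HOME_ENTRIES` are hypothetical, the blocked list is only a sample, and the workspace is assumed to live under the home directory.

```python
from pathlib import Path

# Illustrative sample only; the tool's real deny list is not shown in this document
BLOCKED_HOME_ENTRIES = {".ssh", ".bashrc", ".gnupg"}


def is_path_allowed(path, home=None, tmp=Path("/tmp")):
    """Return True if path resolves inside home or tmp and avoids blocked dotfiles."""
    home = (home or Path.home()).resolve()
    # resolve() collapses ".." segments and follows symlinks before checking
    target = Path(path).expanduser().resolve()
    if target.is_relative_to(home):
        first = target.relative_to(home).parts[:1]
        # Reject ~/.ssh, ~/.bashrc, and anything beneath them
        return not (first and first[0] in BLOCKED_HOME_ENTRIES)
    # Everything outside home is rejected unless it sits under the temp directory
    return target.is_relative_to(Path(tmp).resolve())
```

Resolving the path first matters: a target such as `~/workspace/../.ssh/key` looks harmless as a string but lands inside a blocked directory.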

Error Handling

  • Timeout after 10 seconds
  • HTTP error handling
  • Network error handling
  • Character encoding handling
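The four behaviours listed above map onto a small amount of `urllib` code. The following is a sketch under assumptions rather than the tool's source: `fetch_text` and `TIMEOUT_SECONDS` are hypothetical names, and converting every failure into a `RuntimeError` is just one possible design.

```python
import socket
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 10  # matches the timeout listed above


def fetch_text(url: str, timeout: float = TIMEOUT_SECONDS) -> str:
    """Fetch a URL and return its body as text, raising RuntimeError on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it must be caught first
        raise RuntimeError(f"HTTP {exc.code}: {exc.reason}") from exc
    except urllib.error.URLError as exc:
        # Connection timeouts arrive wrapped inside URLError.reason
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            raise RuntimeError(f"Request timeout after {timeout}s") from exc
        raise RuntimeError(f"Network error: {exc.reason}") from exc
    except (socket.timeout, TimeoutError) as exc:
        # Timeouts while reading the body are raised directly
        raise RuntimeError(f"Request timeout after {timeout}s") from exc
    # Undecodable bytes are dropped instead of raising UnicodeDecodeError
    return raw.decode("utf-8", errors="ignore")
```

Note that `urlopen` on its own accepts non-HTTP schemes, so a function like this should only be called after the URL has passed validation.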

Usage Patterns

Collecting Research

bash
# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md

# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md

Content Aggregation

bash
# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html

# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md

Quick Preview

bash
# Just preview content (no file save)
url_fetcher.py fetch https://example.com

Advanced Usage

Batch Fetching

bash
#!/bin/bash
# batch_fetch.sh

URLS=(
    "https://example.com/page1"
    "https://example.com/page2"
    "https://example.com/page3"
)

OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
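    # Build a filesystem-safe name: drop the scheme, replace other characters with "_"
    filename=$(printf '%s' "$url" | sed -E -e 's|^https?://||' -e 's|[^A-Za-z0-9._-]+|_|g')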
    url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
    sleep 1  # Be nice to servers
done
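When the batch loop is driven from Python instead of bash, the output filename can be derived from the URL in the same spirit. This is a sketch only; `url_to_filename` is a hypothetical helper and is not part of `url_fetcher.py`.

```python
import re
from urllib.parse import urlparse


def url_to_filename(url: str, suffix: str = ".md") -> str:
    """Turn a URL into a filesystem-safe filename based on its host and path."""
    parsed = urlparse(url)
    # Keep letters, digits, dot, underscore, and hyphen; replace everything else
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", parsed.netloc + parsed.path).strip("_")
    return (stem or "index") + suffix
```

The query string is ignored here, so two URLs that differ only in their parameters would map to the same file; include `parsed.query` in the stem if that matters for your batch.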

Integration with Other Skills

Combine with research-assistant:

bash
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md

# Extract key points
# Then use research-assistant to organize findings

Combine with task-runner:

bash
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"

# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md

Troubleshooting

Connection Timeout

text
Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.

HTTP 403/429 Errors

text
Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Try:
  • Add delay between requests
  • Use a different User-Agent (modify source)
  • Respect robots.txt
  • Consider using an API if available
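If you do modify the source to send a different User-Agent, `urllib.request.Request` accepts custom headers. The sketch below shows the general shape; `build_request` is a hypothetical helper, and where it would slot into `url_fetcher.py` is not specified in this document.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Example usage (performs a network call, so it is left commented out):
# request = build_request("https://example.com", "my-research-bot/1.0")
# with urllib.request.urlopen(request, timeout=10) as response:
#     body = response.read()
```

An honest, identifiable User-Agent string is generally a better choice than imitating a browser, and it does not replace adding delays or respecting robots.txt.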

Encoding Issues

text
Error with special characters
Solution: The tool decodes responses as UTF-8 and ignores bytes it cannot decode, so some characters may be lost on pages served in another encoding.
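The effect of decoding with ignored errors is easy to reproduce. The helper below is a hypothetical illustration, not the tool's code: it tries a declared charset first and falls back to lossy UTF-8, which is one way to reduce character loss when post-processing raw bytes yourself.

```python
def decode_body(raw: bytes, charset: str = "utf-8") -> str:
    """Decode bytes with the given charset, falling back to lossy UTF-8."""
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers unknown charset names; bad bytes are dropped
        return raw.decode("utf-8", errors="ignore")


latin1_bytes = "café".encode("latin-1")       # b'caf\xe9' is not valid UTF-8
print(decode_body(latin1_bytes))              # the accented character is dropped
print(decode_body(latin1_bytes, "latin-1"))   # decodes fully with the right charset
```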

Markdown Quality

text
Note: Basic markdown extraction
Solution: This tool uses simple regex for HTML→MD conversion. For better results:
  • Use dedicated markdown parsers
  • Or post-process the output
  • Or use a paid API with better parsing

Best Practices

  • Be respectful - Add delays between requests (don't hammer servers)
  • Check robots.txt - Respect site's crawling policies
  • Rate limit yourself - Don't fetch too fast
  • Validate URLs - Only fetch from trusted sources
  • Save safely - Always use path-validated outputs
  • Preview first - Run fetch without an output file to inspect the content before saving
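Checking robots.txt can also be done with the standard library through `urllib.robotparser`. The sketch below assumes the robots.txt text has already been downloaded; `can_fetch` and the `my-bot` agent name are hypothetical, and the tool itself is not stated to perform this check.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that have already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n"
print(can_fetch(rules, "my-bot", "https://example.com/private/page"))
print(can_fetch(rules, "my-bot", "https://example.com/articles/1"))
```

In a batch job, fetch each site's robots.txt once, cache the parsed rules, and skip any URL for which the check returns False.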

Integration Examples

Python Integration

python
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch a URL as markdown and return the saved text."""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)

    # Fetch; check=True raises CalledProcessError if the fetcher exits non-zero
    subprocess.run(
        [
            "python3",
            "/path/to/url_fetcher.py",
            "fetch",
            "--markdown",
            url,
            str(output),
        ],
        check=True,
    )

    # Read the saved markdown for further processing
    return output.read_text(encoding="utf-8")

Bash Integration

bash
# Function for fetching
fetch_content() {
    local url="$1"
    local output="$2"
    python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
        fetch --markdown "$url" "$output"
}

# Usage
fetch_content "https://example.com" ~/workspace/example.md

Alternatives

When You Need More Features

For full-featured scraping:

  • Use requests + beautifulsoup4 (requires pip install)
  • Or use scrapy framework (requires pip install)
  • Or use paid APIs (Firecrawl, Apify)

For better markdown:

  • markdownify library (requires pip install)
  • Or use AI-based parsing (OpenAI, Anthropic APIs)

For complex workflows:

  • Browser automation (OpenClaw browser tool)
  • Headless Chrome (Puppeteer, Playwright)
  • Or use scraping APIs (Zyte, ScraperAPI)

Zero-Cost Advantage

This skill requires only Python 3 (included with OpenClaw). It does not need:

  • API keys
  • External packages
  • Paid services

It also imposes no rate limiting beyond what you add yourself.

Perfect for autonomous agents with budget constraints.

Contributing

If you improve this skill, please:

  • Test with security-checker
  • Document new features
  • Publish to ClawHub with credit

License

Use freely in your OpenClaw skills and workflows.

Installation

bash
openclaw install url-fetcher


Tags

#web_and-frontend-development #api #web

Quick Info

Category: Development
Model: Claude 3.5
Complexity: One-Click
Author: johstracke
Last Updated: 3/10/2026