Url Fetcher
Simple web content fetching without API keys or external dependencies.
- Rating: 4.3 (128 reviews)
- Downloads: 1,056
- Version: 1.0.0
Overview
Simple web content fetching without API keys or external dependencies.
✨ Key Features
No dependencies - Uses Python stdlib (urllib) only
No API keys - Completely free to use
URL validation - Blocks localhost/internal networks
Basic markdown conversion - Extract content from HTML
Path validation - Safe file writes only (workspace, home, /tmp)
Error handling - Timeout and network error handling
Complete Documentation
URL Fetcher
Fetch web content without API keys or external dependencies. Uses Python standard library only.
Quick Start
url_fetcher.py fetch <url>
url_fetcher.py fetch --markdown <url> [output_file]
Examples:
# Fetch and preview
url_fetcher.py fetch https://example.com
# Fetch and save HTML
url_fetcher.py fetch https://example.com ~/workspace/page.html
# Fetch and convert to basic markdown
url_fetcher.py fetch --markdown https://example.com ~/workspace/page.md
Features
- No dependencies - Uses Python stdlib (urllib) only
- No API keys - Completely free to use
- URL validation - Blocks localhost/internal networks
- Basic markdown conversion - Extract content from HTML
- Path validation - Safe file writes only (workspace, home, /tmp)
- Error handling - Timeout and network error handling
When to Use
- Content aggregation - Collect pages for processing
- Research collection - Save articles/pages locally
- Simple scraping - Extract text from web pages
- Markdown conversion - Basic HTML to text/markdown
- No-API alternatives - When you can't use paid APIs
Limitations
- Basic markdown - Simple regex-based conversion, not a full parser (see the sketch after this list)
- No JavaScript - Only fetches static HTML
- Rate limiting - No built-in rate limiting (add your own if needed)
- Bot detection - Some sites may block the default User-Agent
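To make the "basic regex-based conversion" limitation concrete, here is a rough sketch of this kind of stdlib-only transform. It is an illustration of the approach, not the skill's actual code, and it will mishandle anything a real HTML parser would catch:

import re
from html import unescape

def html_to_basic_markdown(html: str) -> str:
    """Very rough HTML -> text/markdown conversion using regex only (illustrative)."""
    # Drop script/style blocks entirely
    html = re.sub(r"(?is)<(script|style).*?</\1>", "", html)
    # Convert headings: <h2>Title</h2> -> ## Title
    html = re.sub(
        r"(?is)<h([1-6])[^>]*>(.*?)</h\1>",
        lambda m: "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n",
        html,
    )
    # Convert links: <a href="url">text</a> -> [text](url)
    html = re.sub(r'(?is)<a[^>]*href="([^"]*)"[^>]*>(.*?)</a>', r"[\2](\1)", html)
    # Paragraph and line breaks become blank lines
    html = re.sub(r"(?i)</p>|<br\s*/?>", "\n\n", html)
    # Strip all remaining tags, unescape entities, collapse blank lines
    text = re.sub(r"(?s)<[^>]+>", "", html)
    text = unescape(text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()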
Security Features
URL Validation
- ✅ Allows: http/https URLs
- ❌ Blocks: file://, data://, javascript: URLs
- ❌ Blocks: localhost, 127.0.0.1, ::1 (internal networks)
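The check above can be approximated in a few lines of stdlib Python. This is a hedged sketch of the policy, not the skill's actual implementation:

from urllib.parse import urlparse

BLOCKED_HOSTS = {"localhost", "127.0.0.1", "::1"}  # internal targets to reject

def is_allowed_url(url: str) -> bool:
    """Allow only http/https URLs that do not point at local/internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):  # rejects file://, data:, javascript:
        return False
    host = (parsed.hostname or "").lower()
    return host not in BLOCKED_HOSTS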
File Path Validation
- ✅ Allows: workspace, home directory, /tmp
- ❌ Blocks: system paths (/etc, /usr, /var, etc.)
- ❌ Blocks: sensitive dotfiles (~/.ssh, ~/.bashrc, etc.)
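The path policy can be sketched the same way. The allow-list below mirrors the roots named above and is an assumption, not the skill's code (requires Python 3.9+ for is_relative_to):

from pathlib import Path

ALLOWED_ROOTS = [Path.home() / "workspace", Path.home(), Path("/tmp")]  # assumed allow-list

def is_safe_output_path(path: str) -> bool:
    """Allow writes only under the allowed roots; block dotfiles such as ~/.ssh."""
    resolved = Path(path).expanduser().resolve()
    if any(part.startswith(".") for part in resolved.parts):  # block hidden dirs/files
        return False
    return any(resolved.is_relative_to(root.resolve()) for root in ALLOWED_ROOTS)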
Error Handling
- Timeout after 10 seconds
- HTTP error handling
- Network error handling
- Character encoding handling
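Fetching with these safeguards needs only urllib. A minimal sketch, assuming the 10-second timeout and lenient UTF-8 decoding described above:

import urllib.request
import urllib.error

def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL with a timeout and decode leniently; raise on HTTP/network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="ignore")  # lenient decoding
    except urllib.error.HTTPError as e:
        raise RuntimeError(f"HTTP {e.code}: {e.reason}") from e
    except urllib.error.URLError as e:
        raise RuntimeError(f"Network error: {e.reason}") from e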
Usage Patterns
Collecting Research
# Fetch multiple articles
url_fetcher.py fetch https://example.com/article1.md ~/workspace/research/article1.md
url_fetcher.py fetch https://example.com/article2.md ~/workspace/research/article2.md
# Convert to markdown for reading
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/research/article.md
Content Aggregation
# Fetch pages for processing
url_fetcher.py fetch https://news.example.com ~/workspace/content/latest.html
# Extract text
url_fetcher.py fetch --markdown https://blog.example.com ~/workspace/content/post.md
Quick Preview
# Just preview content (no file save)
url_fetcher.py fetch https://example.com
Advanced Usage
Batch Fetching
#!/bin/bash
# batch_fetch.sh
URLS=(
  "https://example.com/page1"
  "https://example.com/page2"
  "https://example.com/page3"
)
OUTPUT_DIR="$HOME/workspace/fetched"
mkdir -p "$OUTPUT_DIR"

for url in "${URLS[@]}"; do
  # Derive a safe filename from the URL (strip the scheme, replace separators)
  filename=$(echo "$url" | sed 's|https\?://||; s|[/:]|_|g')
  url_fetcher.py fetch --markdown "$url" "$OUTPUT_DIR/$filename.md"
  sleep 1  # Be nice to servers
done
Integration with Other Skills
Combine with research-assistant:
# Fetch article
url_fetcher.py fetch --markdown https://example.com/article.md ~/workspace/article.md
# Extract key points
# Then use research-assistant to organize findings
Combine with task-runner:
# Add task to fetch content
task_runner.py add "Fetch article on topic X" "research"
# Fetch when ready
url_fetcher.py fetch https://example.com/topic-x.md ~/workspace/research/topic-x.md
Troubleshooting
Connection Timeout
Error: Request timeout after 10s
Solution: The server is slow or unreachable. Try again later or check the URL.
HTTP 403/429 Errors
Error: HTTP 403: Forbidden
Solution: The site blocks automated requests. Try:
- Add a delay between requests
- Use a different User-Agent (modify the source, or see the sketch below)
- Respect robots.txt
- Consider using an API if available
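If editing the source is not convenient, the same request can be made directly with urllib and a custom header. A sketch; the User-Agent string is purely illustrative:

import urllib.request

def fetch_with_user_agent(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL while presenting a browser-like User-Agent (illustrative value)."""
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; url-fetcher-example)"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="ignore")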
Encoding Issues
Error with special characters
Solution: The tool decodes as UTF-8 and ignores errors, so some characters may be lost.
Markdown Quality
Note: Basic markdown extraction
Solution: The tool uses simple regex for HTML→Markdown conversion. For better results:
- Use a dedicated markdown parser
- Or post-process the output
- Or use a paid API with better parsing
Best Practices
- Be respectful - Add delays between requests (don't hammer servers)
- Check robots.txt - Respect the site's crawling policies (see the sketch after this list)
- Rate limit yourself - Don't fetch too fast
- Validate URLs - Only fetch from trusted sources
- Save safely - Always use path-validated outputs
- Preview first - Use preview mode before saving
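For the robots.txt practice above, the standard library already ships a parser. A minimal sketch; the fallback behavior on an unreachable robots.txt is a design choice, not a rule:

from urllib import robotparser
from urllib.parse import urlparse

def allowed_by_robots(url: str, user_agent: str = "*") -> bool:
    """Check the target site's robots.txt before fetching (stdlib only)."""
    parsed = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return True  # robots.txt unreachable; proceed with normal caution
    return rp.can_fetch(user_agent, url)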
Integration Examples
Python Integration
from pathlib import Path
import subprocess

def fetch_and_process(url):
    """Fetch URL and process"""
    output = Path.home() / "workspace" / "fetched" / "page.md"
    output.parent.mkdir(parents=True, exist_ok=True)
    # Fetch (check=True raises if the fetch fails, so we never read a stale file)
    subprocess.run([
        "python3",
        "/path/to/url_fetcher.py",
        "fetch",
        "--markdown",
        url,
        str(output),
    ], check=True)
    # Process content
    content = output.read_text()
    return content
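Example call (the URL and preview length are illustrative):

content = fetch_and_process("https://example.com")
print(content[:500])  # preview the first part of the converted page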
Bash Integration
# Function for fetching
fetch_content() {
  local url="$1"
  local output="$2"
  python3 ~/workspace/skills/url-fetcher/scripts/url_fetcher.py \
    fetch --markdown "$url" "$output"
}
# Usage
fetch_content "https://example.com" ~/workspace/example.md
Alternatives
When You Need More Features
For full-featured scraping:
- Use requests + beautifulsoup4 (requires pip install)
- Or use the scrapy framework (requires pip install)
- Or use paid APIs (Firecrawl, Apify)
For better markdown conversion:
- Use the markdownify library (requires pip install)
- Or use AI-based parsing (OpenAI, Anthropic APIs)
For JavaScript-rendered sites:
- Use browser automation (OpenClaw browser tool)
- Use headless Chrome (Puppeteer, Playwright)
- Or use scraping APIs (Zyte, ScraperAPI)
Zero-Cost Advantage
This skill requires:
- ✅ Python 3 (included with OpenClaw)
- ✅ No API keys
- ✅ No external packages
- ✅ No paid services
- ✅ No rate limiting (other than what you add)
Contributing
If you improve this skill, please:
- Test with security-checker
- Document new features
- Publish to ClawHub with credit
License
Use freely in your OpenClaw skills and workflows.
Installation
openclaw install url-fetcher
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.