✓ Verified 💻 Development ✓ Enhanced Data

Desearch Crawl

Crawl/scrape and extract content from any webpage URL.

Rating
4.4 (136 reviews)
Downloads
1,435 downloads
Version
1.0.0

Overview

Crawl/scrape and extract content from any webpage URL.

Complete Documentation

View Source →

Crawl Webpage By Desearch

Extract content from any webpage URL. Returns clean text or raw HTML.

Quick Start

  • Get an API key from https://console.desearch.ai
  • Set environment variable: export DESEARCH_API_KEY='your-key-here'

Usage

bash
# Crawl a webpage (returns clean text by default)
scripts/desearch.py crawl "https://en.wikipedia.org/wiki/Artificial_intelligence"

# Get raw HTML
scripts/desearch.py crawl "https://example.com" --crawl-format html

Options

OptionDescription
--crawl-formatOutput content format: text (default) or html

Examples

Read a documentation page

bash
scripts/desearch.py crawl "https://docs.python.org/3/tutorial/index.html"

Get raw HTML for analysis

bash
scripts/desearch.py crawl "https://example.com/page" --crawl-format html

Response

Example (format=text, truncated, default)

text
Artificial intelligence (AI) is the capability of computational systems to perform tasks that typically require human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making...

Example (format=html, truncated)

html
<!DOCTYPE html>
<html>
  <head><title>Artificial intelligence - Wikipedia</title></head>
  <body>
    <p>Artificial intelligence (AI) is the capability of computational systems...</p>
  </body>
</html>

Notes

  • Response is plain text or raw HTML — not JSON.
  • Default format is text. Use --crawl-format html only when you need to inspect page structure.
  • Prefer text format to avoid bloating the agent context with markup.

Errors

Status 401, Unauthorized (e.g., missing/invalid API key)
json
{
  "detail": "Invalid or missing API key"
}

Status 402, Payment Required (e.g., balance depleted)

json
{
  "detail": "Insufficient balance, please add funds to your account to continue using the service."
}

Resources

Installation

Terminal bash

openclaw install desearch-crawl
    
Copied!

💻Code Examples

scripts/desearch.py crawl "https://example.com" --crawl-format html

scriptsdesearchpy-crawl-httpsexamplecom---crawl-format-html.txt
## Options

| Option | Description |
|--------|-------------|
| `--crawl-format` | Output content format: `text` (default) or `html` |

## Examples

### Read a documentation page

scripts/desearch.py crawl "https://example.com/page" --crawl-format html

scriptsdesearchpy-crawl-httpsexamplecompage---crawl-format-html.txt
## Response

### Example (`format=text`, truncated, default)

</html>

html.txt
### Notes
- Response is plain text or raw HTML — not JSON.
- Default format is `text`. Use `--crawl-format html` only when you need to inspect page structure.
- Prefer `text` format to avoid bloating the agent context with markup.

### Errors
Status 401, Unauthorized (e.g., missing/invalid API key)
example.sh
# Crawl a webpage (returns clean text by default)
scripts/desearch.py crawl "https://en.wikipedia.org/wiki/Artificial_intelligence"

# Get raw HTML
scripts/desearch.py crawl "https://example.com" --crawl-format html
example.html
<!DOCTYPE html>
<html>
  <head><title>Artificial intelligence - Wikipedia</title></head>
  <body>
    <p>Artificial intelligence (AI) is the capability of computational systems...</p>
  </body>
</html>
example.json
{
  "detail": "Invalid or missing API key"
}
example.json
{
  "detail": "Insufficient balance, please add funds to your account to continue using the service."
}

Tags

#web_and-frontend-development #web

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author okradze
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install desearch-crawl