✓ Verified 🌐 Web Scrapers ✓ Enhanced Data

Anycrawl

AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web.

Rating
4 (471 reviews)
Downloads
2,660 downloads
Version
1.0.0

Overview

AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web.

Complete Documentation

View Source →

AnyCrawl Skill

AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.

Setup

Method 1: Environment variable (Recommended)

bash
export ANYCRAWL_API_KEY="your-api-key"

Make it permanent by adding to ~/.bashrc or ~/.zshrc:

bash
echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc

Get your API key at: https://anycrawl.dev

Method 2: OpenClaw gateway config

bash
openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"

Functions

1. anycrawl_scrape

Scrape a single URL and convert to LLM-ready structured data.

Parameters:

  • url (string, required): URL to scrape
  • engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
  • formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
  • timeout (number, optional): Timeout in milliseconds (default: 30000)
  • wait_for (number, optional): Delay before extraction in ms (browser engines only)
  • wait_for_selector (string/object/array, optional): Wait for CSS selectors
  • include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
  • exclude_tags (array, optional): Exclude these HTML tags
  • proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
  • json_options (object, optional): JSON extraction with schema/prompt
  • extract_source (string, optional): "markdown" (default) or "html"
Examples:

javascript
// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({ 
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})

2. anycrawl_search

Search Google and return structured results.

Parameters:

  • query (string, required): Search query
  • engine (string, optional): Search engine - "google" (default)
  • limit (number, optional): Max results per page (default: 10)
  • offset (number, optional): Number of results to skip (default: 0)
  • pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
  • lang (string, optional): Language locale (e.g., "en", "zh", "vi")
  • safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
  • scrape_options (object, optional): Scrape each result URL with these options
Examples:

javascript
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({ 
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})

3. anycrawl_crawl_start

Start crawling an entire website (async job).

Parameters:

  • url (string, required): Seed URL to start crawling
  • engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
  • strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
  • max_depth (number, optional): Max depth from seed URL (default: 10)
  • limit (number, optional): Max pages to crawl (default: 100)
  • include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
  • exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
  • scrape_paths (array, optional): Only scrape URLs matching these patterns
  • scrape_options (object, optional): Per-page scrape options
Examples:

javascript
// Crawl entire website
anycrawl_crawl_start({ 
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})

// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})

4. anycrawl_crawl_status

Check crawl job status.

Parameters:

  • job_id (string, required): Crawl job ID
Example:
javascript
anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })

5. anycrawl_crawl_results

Get crawl results (paginated).

Parameters:

  • job_id (string, required): Crawl job ID
  • skip (number, optional): Number of results to skip (default: 0)
Example:
javascript
// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })

// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })

6. anycrawl_crawl_cancel

Cancel a running crawl job.

Parameters:

  • job_id (string, required): Crawl job ID

7. anycrawl_search_and_scrape

Quick helper: Search Google then scrape top results.

Parameters:

  • query (string, required): Search query
  • max_results (number, optional): Max results to scrape (default: 3)
  • scrape_engine (string, optional): Engine for scraping (default: "cheerio")
  • formats (array, optional): Output formats (default: ["markdown"])
  • lang (string, optional): Search language
Example:
javascript
anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})

Engine Selection Guide

EngineBest ForSpeedJS Rendering
cheerioStatic HTML, news, blogs⚡ Fastest❌ No
playwrightSPAs, complex web apps🐢 Slower✅ Yes
puppeteerChrome-specific, metrics🐢 Slower✅ Yes

Response Format

All responses follow this structure:

json
{
  "success": true,
  "data": { ... },
  "message": "Optional message"
}

Error response:

json
{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}

Common Error Codes

  • 400 - Bad Request (validation errors)
  • 401 - Unauthorized (invalid API key)
  • 402 - Payment Required (insufficient credits)
  • 404 - Not Found
  • 429 - Rate limit exceeded
  • 500 - Internal server error

API Limits

  • Rate limits apply based on your plan
  • Crawl jobs expire after 24 hours
  • Max crawl limit: depends on credits

Links

  • API Docs: https://docs.anycrawl.dev
  • Website: https://anycrawl.dev
  • Playground: https://anycrawl.dev/playground

Installation

Terminal bash

openclaw install anycrawl
    
Copied!

💻Code Examples

source ~/.bashrc

source-bashrc.txt
Get your API key at: https://anycrawl.dev

### Method 2: OpenClaw gateway config

openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"

openclaw-configpatch---set-anycrawlapikeyyour-api-key.txt
## Functions

### 1. anycrawl_scrape

Scrape a single URL and convert to LLM-ready structured data.

**Parameters:**
- `url` (string, required): URL to scrape
- `engine` (string, optional): Scraping engine - `"cheerio"` (default), `"playwright"`, `"puppeteer"`
- `formats` (array, optional): Output formats - `["markdown"]`, `["html"]`, `["text"]`, `["json"]`, `["screenshot"]`
- `timeout` (number, optional): Timeout in milliseconds (default: 30000)
- `wait_for` (number, optional): Delay before extraction in ms (browser engines only)
- `wait_for_selector` (string/object/array, optional): Wait for CSS selectors
- `include_tags` (array, optional): Include only these HTML tags (e.g., `["h1", "p", "article"]`)
- `exclude_tags` (array, optional): Exclude these HTML tags
- `proxy` (string, optional): Proxy URL (e.g., `"http://proxy:port"`)
- `json_options` (object, optional): JSON extraction with schema/prompt
- `extract_source` (string, optional): `"markdown"` (default) or `"html"`

**Examples:**

})

.txt
### 2. anycrawl_search

Search Google and return structured results.

**Parameters:**
- `query` (string, required): Search query
- `engine` (string, optional): Search engine - `"google"` (default)
- `limit` (number, optional): Max results per page (default: 10)
- `offset` (number, optional): Number of results to skip (default: 0)
- `pages` (number, optional): Number of pages to retrieve (default: 1, max: 20)
- `lang` (string, optional): Language locale (e.g., `"en"`, `"zh"`, `"vi"`)
- `safe_search` (number, optional): 0 (off), 1 (medium), 2 (high)
- `scrape_options` (object, optional): Scrape each result URL with these options

**Examples:**

})

.txt
### 3. anycrawl_crawl_start

Start crawling an entire website (async job).

**Parameters:**
- `url` (string, required): Seed URL to start crawling
- `engine` (string, optional): `"cheerio"` (default), `"playwright"`, `"puppeteer"`
- `strategy` (string, optional): `"all"`, `"same-domain"` (default), `"same-hostname"`, `"same-origin"`
- `max_depth` (number, optional): Max depth from seed URL (default: 10)
- `limit` (number, optional): Max pages to crawl (default: 100)
- `include_paths` (array, optional): Path patterns to include (e.g., `["/blog/*"]`)
- `exclude_paths` (array, optional): Path patterns to exclude (e.g., `["/admin/*"]`)
- `scrape_paths` (array, optional): Only scrape URLs matching these patterns
- `scrape_options` (object, optional): Per-page scrape options

**Examples:**

})

.txt
### 4. anycrawl_crawl_status

Check crawl job status.

**Parameters:**
- `job_id` (string, required): Crawl job ID

**Example:**

anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })

anycrawlcrawlstatus-jobid-7a2e165d-8f81-4be6-9ef7-23222330a396-.txt
### 5. anycrawl_crawl_results

Get crawl results (paginated).

**Parameters:**
- `job_id` (string, required): Crawl job ID
- `skip` (number, optional): Number of results to skip (default: 0)

**Example:**

anycrawl_crawl_results({ job_id: "xxx", skip: 100 })

anycrawlcrawlresults-jobid-xxx-skip-100-.txt
### 6. anycrawl_crawl_cancel

Cancel a running crawl job.

**Parameters:**
- `job_id` (string, required): Crawl job ID

### 7. anycrawl_search_and_scrape

Quick helper: Search Google then scrape top results.

**Parameters:**
- `query` (string, required): Search query
- `max_results` (number, optional): Max results to scrape (default: 3)
- `scrape_engine` (string, optional): Engine for scraping (default: `"cheerio"`)
- `formats` (array, optional): Output formats (default: `["markdown"]`)
- `lang` (string, optional): Search language

**Example:**

})

.txt
## Engine Selection Guide

| Engine | Best For | Speed | JS Rendering |
|--------|----------|-------|--------------|
| `cheerio` | Static HTML, news, blogs | ⚡ Fastest | ❌ No |
| `playwright` | SPAs, complex web apps | 🐢 Slower | ✅ Yes |
| `puppeteer` | Chrome-specific, metrics | 🐢 Slower | ✅ Yes |

## Response Format

All responses follow this structure:
example.js
// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({ 
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})
example.js
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({ 
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})

Tags

#browser_and-automation #api #integration #web

Quick Info

Category Web Scrapers
Model Claude 3.5
Complexity One-Click
Author techlaai
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install anycrawl