Anycrawl
AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web.
- Rating: 4 (471 reviews)
- Downloads: 2,660
- Version: 1.0.0
Overview
AnyCrawl Skill
AnyCrawl API integration for OpenClaw - Scrape, Crawl, and Search web content with high-performance multi-threaded crawling.
Setup
Method 1: Environment variable (Recommended)
export ANYCRAWL_API_KEY="your-api-key"
Make it permanent by adding it to ~/.bashrc or ~/.zshrc:
echo 'export ANYCRAWL_API_KEY="your-api-key"' >> ~/.bashrc
source ~/.bashrc
Get your API key at: https://anycrawl.dev
Method 2: OpenClaw gateway config
openclaw config.patch --set ANYCRAWL_API_KEY="your-api-key"
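A script that wraps these calls can fail early when the key is missing. The following is a minimal sketch, assuming only that the key is exposed through the ANYCRAWL_API_KEY environment variable described above; requireApiKey is a hypothetical helper name and is not part of the skill.

```javascript
// Hypothetical helper (not part of the AnyCrawl skill): read the API key
// from an environment-like object and fail with a clear message if absent.
function requireApiKey(env = process.env) {
  const key = (env.ANYCRAWL_API_KEY || "").trim();
  if (key === "") {
    throw new Error(
      "ANYCRAWL_API_KEY is not set; export it or add it to the OpenClaw config"
    );
  }
  return key;
}
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real shell environment.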
Functions
1. anycrawl_scrape
Scrape a single URL and convert to LLM-ready structured data.
Parameters:
- url (string, required): URL to scrape
- engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer"
- formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"]
- timeout (number, optional): Timeout in milliseconds (default: 30000)
- wait_for (number, optional): Delay before extraction in ms (browser engines only)
- wait_for_selector (string/object/array, optional): Wait for CSS selectors
- include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"])
- exclude_tags (array, optional): Exclude these HTML tags
- proxy (string, optional): Proxy URL (e.g., "http://proxy:port")
- json_options (object, optional): JSON extraction with schema/prompt
- extract_source (string, optional): "markdown" (default) or "html"
// Basic scrape with default cheerio
anycrawl_scrape({ url: "https://example.com" })

// Scrape SPA with Playwright
anycrawl_scrape({
  url: "https://spa-example.com",
  engine: "playwright",
  formats: ["markdown", "screenshot"]
})

// Extract structured JSON
anycrawl_scrape({
  url: "https://product-page.com",
  engine: "cheerio",
  json_options: {
    schema: {
      type: "object",
      properties: {
        product_name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" }
      },
      required: ["product_name", "price"]
    },
    user_prompt: "Extract product details from this page"
  }
})
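Because several of the scrape parameters accept only specific values, it can help to check a parameter object before sending it. The sketch below is illustrative only: checkScrapeParams is a hypothetical helper, and the allowed values are copied from the parameter list above rather than from the AnyCrawl API itself.

```javascript
// Engines and formats as listed in the parameter list above.
const ENGINES = ["cheerio", "playwright", "puppeteer"];
const FORMATS = ["markdown", "html", "text", "json", "screenshot"];

// Hypothetical helper: returns a list of problems. An empty list means the
// parameters look consistent with the documented options.
function checkScrapeParams(params) {
  const problems = [];
  const engine = params.engine ?? "cheerio"; // documented default
  if (typeof params.url !== "string" || params.url === "") {
    problems.push("url is required");
  }
  if (!ENGINES.includes(engine)) {
    problems.push(`unknown engine: ${engine}`);
  }
  for (const format of params.formats ?? []) {
    if (!FORMATS.includes(format)) problems.push(`unknown format: ${format}`);
  }
  // wait_for is documented as applying to browser engines only.
  if (params.wait_for !== undefined && engine === "cheerio") {
    problems.push("wait_for applies to browser engines only");
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every issue at once.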
2. anycrawl_search
Search Google and return structured results.
Parameters:
- query (string, required): Search query
- engine (string, optional): Search engine - "google" (default)
- limit (number, optional): Max results per page (default: 10)
- offset (number, optional): Number of results to skip (default: 0)
- pages (number, optional): Number of pages to retrieve (default: 1, max: 20)
- lang (string, optional): Language locale (e.g., "en", "zh", "vi")
- safe_search (number, optional): 0 (off), 1 (medium), 2 (high)
- scrape_options (object, optional): Scrape each result URL with these options
// Basic search
anycrawl_search({ query: "OpenAI ChatGPT" })

// Multi-page search in Vietnamese
anycrawl_search({
  query: "hướng dẫn Node.js",
  pages: 3,
  lang: "vi"
})

// Search and auto-scrape results
anycrawl_search({
  query: "best AI tools 2026",
  limit: 5,
  scrape_options: {
    engine: "cheerio",
    formats: ["markdown"]
  }
})
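The search parameters carry two documented bounds: pages ranges from 1 to 20, and safe_search takes 0, 1, or 2. A small builder can enforce them before a call is made. This is a sketch under those stated bounds; buildSearchParams is a hypothetical name, and how the service itself reacts to out-of-range values is not described in this document.

```javascript
// Hypothetical builder: assembles an anycrawl_search parameter object and
// rejects values outside the documented ranges.
function buildSearchParams(query, options = {}) {
  const { pages = 1, safe_search, ...rest } = options;
  if (!Number.isInteger(pages) || pages < 1 || pages > 20) {
    throw new RangeError("pages must be an integer from 1 to 20");
  }
  if (safe_search !== undefined && ![0, 1, 2].includes(safe_search)) {
    throw new RangeError("safe_search must be 0, 1, or 2");
  }
  const params = { query, pages, ...rest };
  if (safe_search !== undefined) params.safe_search = safe_search;
  return params;
}
```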
3. anycrawl_crawl_start
Start crawling an entire website (async job).
Parameters:
- url (string, required): Seed URL to start crawling
- engine (string, optional): "cheerio" (default), "playwright", "puppeteer"
- strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin"
- max_depth (number, optional): Max depth from seed URL (default: 10)
- limit (number, optional): Max pages to crawl (default: 100)
- include_paths (array, optional): Path patterns to include (e.g., ["/blog/*"])
- exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/*"])
- scrape_paths (array, optional): Only scrape URLs matching these patterns
- scrape_options (object, optional): Per-page scrape options
// Crawl entire website
anycrawl_crawl_start({
  url: "https://docs.example.com",
  engine: "cheerio",
  max_depth: 5,
  limit: 50
})

// Crawl only blog posts
anycrawl_crawl_start({
  url: "https://example.com",
  strategy: "same-domain",
  include_paths: ["/blog/*"],
  exclude_paths: ["/blog/tags/*"],
  scrape_options: {
    formats: ["markdown"]
  }
})

// Crawl product pages only
anycrawl_crawl_start({
  url: "https://shop.example.com",
  strategy: "same-domain",
  scrape_paths: ["/products/*"],
  limit: 200
})
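To reason about which pages a combination of include_paths and exclude_paths would select, the filtering can be modeled locally. The sketch below rests on two assumptions that this document does not confirm: that "*" matches any run of characters, and that an exclude match wins over an include match. patternToRegExp and shouldCrawl are hypothetical helpers, not AnyCrawl functions.

```javascript
// Assumption: "*" in a path pattern matches any run of characters.
// Every other character is matched literally.
function patternToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Assumption: excludes take priority, and an empty include list allows all.
function shouldCrawl(path, { include_paths = [], exclude_paths = [] } = {}) {
  const matches = (patterns) =>
    patterns.some((pattern) => patternToRegExp(pattern).test(path));
  if (matches(exclude_paths)) return false;
  return include_paths.length === 0 || matches(include_paths);
}
```

With the blog example above, this model keeps /blog/post-1 and drops both /blog/tags/js and /about.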
4. anycrawl_crawl_status
Check crawl job status.
Parameters:
job_id(string, required): Crawl job ID
anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" })
5. anycrawl_crawl_results
Get crawl results (paginated).
Parameters:
- job_id (string, required): Crawl job ID
- skip (number, optional): Number of results to skip (default: 0)
// Get first 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 0 })
// Get next 100 results
anycrawl_crawl_results({ job_id: "xxx", skip: 100 })
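Collecting every page of results amounts to advancing skip until a page comes back empty. The sketch below shows that loop; fetchPage is a hypothetical callback standing in for a call to anycrawl_crawl_results, and it is assumed to resolve to a plain array of results, since the exact response shape is not shown in this document.

```javascript
// fetchPage is a hypothetical async function (jobId, skip) => array of
// results. The loop advances skip by the number of items actually received.
async function collectCrawlResults(fetchPage, jobId) {
  const all = [];
  let skip = 0;
  for (;;) {
    const page = await fetchPage(jobId, skip);
    if (page.length === 0) break; // no more results
    all.push(...page);
    skip += page.length;
  }
  return all;
}
```

Advancing by the received page length avoids hard-coding a page size of 100.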
6. anycrawl_crawl_cancel
Cancel a running crawl job.
Parameters:
job_id(string, required): Crawl job ID
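A common use of cancellation is to give up on a crawl that has not finished after a bounded number of status checks. The following sketch takes the status and cancel calls as injected callbacks. getStatus and cancel are hypothetical wrappers around anycrawl_crawl_status and anycrawl_crawl_cancel, and the "completed" status string is an assumption, because this document does not list the possible status values.

```javascript
// getStatus and cancel are hypothetical async callbacks. The "completed"
// status string is an assumption, not a documented value.
async function waitOrCancel(jobId, options) {
  const { getStatus, cancel, maxPolls = 10, delayMs = 1000 } = options;
  for (let attempt = 1; attempt <= maxPolls; attempt++) {
    const status = await getStatus(jobId);
    if (status === "completed") return "completed";
    if (attempt < maxPolls) {
      // Pause between polls, but not after the final one.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  await cancel(jobId);
  return "cancelled";
}
```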
7. anycrawl_search_and_scrape
Quick helper: Search Google then scrape top results.
Parameters:
- query (string, required): Search query
- max_results (number, optional): Max results to scrape (default: 3)
- scrape_engine (string, optional): Engine for scraping (default: "cheerio")
- formats (array, optional): Output formats (default: ["markdown"])
- lang (string, optional): Search language
anycrawl_search_and_scrape({
  query: "latest AI news",
  max_results: 5,
  formats: ["markdown"]
})
Engine Selection Guide
| Engine | Best For | Speed | JS Rendering |
|---|---|---|---|
| cheerio | Static HTML, news, blogs | ⚡ Fastest | ❌ No |
| playwright | SPAs, complex web apps | 🐢 Slower | ✅ Yes |
| puppeteer | Chrome-specific, metrics | 🐢 Slower | ✅ Yes |
Response Format
All responses follow this structure:
{
  "success": true,
  "data": { ... },
  "message": "Optional message"
}
Error response:
{
  "success": false,
  "error": "Error type",
  "message": "Human-readable message"
}
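Since both shapes share the success flag, calling code can funnel every response through one function that returns data on success and throws otherwise. This is a minimal sketch based only on the two structures shown above; unwrap is a hypothetical helper name.

```javascript
// Hypothetical helper: return response.data on success, otherwise throw an
// Error that carries the "error" and "message" fields of the envelope.
function unwrap(response) {
  if (response && response.success === true) {
    return response.data;
  }
  const type = response?.error ?? "Unknown error";
  const detail = response?.message ?? "no message provided";
  const err = new Error(`${type}: ${detail}`);
  err.type = type;
  throw err;
}
```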
Common Error Codes
- 400 - Bad Request (validation errors)
- 401 - Unauthorized (invalid API key)
- 402 - Payment Required (insufficient credits)
- 404 - Not Found
- 429 - Rate limit exceeded
- 500 - Internal server error
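These codes fall into two groups for a caller: ones where trying again later may help, and ones that need a change to the request or account. The sketch below treats 429 and 500 as retryable, which is a common convention and not a policy stated by AnyCrawl. isRetryable and backoffDelayMs are hypothetical helpers, and the delay values are arbitrary defaults.

```javascript
// Assumption: only rate limiting (429) and server errors (500) are worth
// retrying; 400, 401, 402, and 404 need a fix on the caller's side.
const RETRYABLE_CODES = new Set([429, 500]);

function isRetryable(statusCode) {
  return RETRYABLE_CODES.has(statusCode);
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at capMs.
// attempt is zero-based.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```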
API Limits
- Rate limits apply based on your plan
- Crawl jobs expire after 24 hours
- Max crawl limit: depends on credits
Links
- API Docs: https://docs.anycrawl.dev
- Website: https://anycrawl.dev
- Playground: https://anycrawl.dev/playground
Installation
openclaw install anycrawl