✓ Verified 💻 Development ✓ Enhanced Data

Playwright Scraper Skill

Playwright-based web scraping OpenClaw Skill with anti-bot protection.

Rating
3.9 (163 reviews)
Downloads
4,029 downloads
Version
1.0.0

Overview

Playwright-based web scraping OpenClaw Skill with anti-bot protection.

Complete Documentation

View Source →

Playwright Scraper Skill

A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.


🎯 Use Case Matrix

Target WebsiteAnti-Bot LevelRecommended MethodScript
Regular SitesLowweb_fetch toolN/A (built-in)
Dynamic SitesMediumPlaywright Simplescripts/playwright-simple.js
Cloudflare ProtectedHighPlaywright Stealthscripts/playwright-stealth.js
YouTubeSpecialdeep-scraperInstall separately
RedditSpecialreddit-scraperInstall separately

📦 Installation

bash
cd playwright-scraper-skill
npm install
npx playwright install chromium


🚀 Quick Start

1️⃣ Simple Sites (No Anti-Bot)

Use OpenClaw's built-in web_fetch tool:

bash
# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com


2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

bash
node scripts/playwright-simple.js "https://example.com"

Example output:

json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}


3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

bash
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:

  • Hide automation markers (navigator.webdriver = false)
  • Realistic User-Agent (iPhone, Android)
  • Random delays to mimic human behavior
  • Screenshot and HTML saving support

4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately):

bash
# Install deep-scraper skill
npx clawhub install deep-scraper

# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"


📖 Script Descriptions

scripts/playwright-simple.js

  • Use Case: Regular dynamic websites
  • Speed: Fast (3-5 seconds)
  • Anti-Bot: None
  • Output: JSON (title, content, URL)

scripts/playwright-stealth.js

  • Use Case: Sites with Cloudflare or anti-bot protection
  • Speed: Medium (5-20 seconds)
  • Anti-Bot: Medium-High (hides automation, realistic UA)
  • Output: JSON + Screenshot + HTML file
  • Verified: 100% success on Discuss.com.hk

🎓 Best Practices

1. Try web_fetch First

If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool—it's fastest.

2. Need JavaScript? Use Playwright Simple

If you need to wait for JavaScript rendering, use playwright-simple.js.

3. Getting Blocked? Use Stealth

If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.

4. Special Sites Need Specialized Skills

  • YouTube → deep-scraper
  • Reddit → reddit-scraper
  • Twitter → bird skill

🔧 Customization

All scripts support environment variables:

bash
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL

# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL


📊 Performance Comparison

MethodSpeedAnti-BotSuccess Rate (Discuss.com.hk)
web_fetch⚡ Fastest❌ None0%
Playwright Simple🚀 Fast⚠️ Low20%
Playwright Stealth⏱️ Medium✅ Medium100%
Puppeteer Stealth⏱️ Medium✅ Medium-High~80%
Crawlee (deep-scraper)🐢 Slow❌ Detected0%
Chaser (Rust)⏱️ Medium❌ Detected0%

🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

✅ Effective Anti-Bot Measures

  • Hide navigator.webdriver — Essential
  • Realistic User-Agent — Use real devices (iPhone, Android)
  • Mimic Human Behavior — Random delays, scrolling
  • Avoid Framework Signatures — Crawlee, Selenium are easily detected
  • Use addInitScript (Playwright) — Inject before page load

❌ Ineffective Anti-Bot Measures

  • Only changing User-Agent — Not enough
  • Using high-level frameworks (Crawlee) — More easily detected
  • Docker isolation — Doesn't help with Cloudflare

🔍 Troubleshooting

Issue: 403 Forbidden

Solution: Use playwright-stealth.js

Issue: Cloudflare Challenge Page

Solution:
  • Increase wait time (10-15 seconds)
  • Try headless: false (headful mode sometimes has higher success rate)
  • Consider using proxy IPs

Issue: Blank Page

Solution:
  • Increase waitForTimeout
  • Use waitUntil: 'networkidle' or 'domcontentloaded'
  • Check if login is required

📝 Memory & Experience

2026-02-07 Discuss.com.hk Test Conclusions

  • Pure Playwright + Stealth succeeded (5s, 200 OK)
  • ❌ Crawlee (deep-scraper) failed (403)
  • ❌ Chaser (Rust) failed (Cloudflare)
  • ❌ Puppeteer standard failed (403)
Best Solution: Pure Playwright + anti-bot techniques (framework-independent)


🚧 Future Improvements

  • [ ] Add proxy IP rotation
  • [ ] Implement cookie management (maintain login state)
  • [ ] Add CAPTCHA handling (2captcha / Anti-Captcha)
  • [ ] Batch scraping (parallel URLs)
  • [ ] Integration with OpenClaw's browser tool

📚 References

Installation

Terminal bash

openclaw install playwright-scraper-skill
    
Copied!

💻Code Examples

npx playwright install chromium

npx-playwright-install-chromium.txt
---

## 🚀 Quick Start

### 1️⃣ Simple Sites (No Anti-Bot)

Use OpenClaw's built-in `web_fetch` tool:

Hey, fetch me the content from https://example.com

hey-fetch-me-the-content-from-httpsexamplecom.txt
---

### 2️⃣ Dynamic Sites (Requires JavaScript)

Use **Playwright Simple**:

}

.txt
---

### 3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use **Playwright Stealth**:

node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

node-scriptsplaywright-stealthjs-httpsmdiscusscomhkhot.txt
**Features:**
- Hide automation markers (`navigator.webdriver = false`)
- Realistic User-Agent (iPhone, Android)
- Random delays to mimic human behavior
- Screenshot and HTML saving support

---

### 4️⃣ YouTube Video Transcripts

Use **deep-scraper** (install separately):

node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"

node-assetsyoutubehandlerjs-httpswwwyoutubecomwatchvvideoid.txt
---

## 📖 Script Descriptions

### `scripts/playwright-simple.js`
- **Use Case:** Regular dynamic websites
- **Speed:** Fast (3-5 seconds)
- **Anti-Bot:** None
- **Output:** JSON (title, content, URL)

### `scripts/playwright-stealth.js` ⭐
- **Use Case:** Sites with Cloudflare or anti-bot protection
- **Speed:** Medium (5-20 seconds)
- **Anti-Bot:** Medium-High (hides automation, realistic UA)
- **Output:** JSON + Screenshot + HTML file
- **Verified:** 100% success on Discuss.com.hk

---

## 🎓 Best Practices

### 1. Try web_fetch First
If the site doesn't have dynamic loading, use OpenClaw's `web_fetch` tool—it's fastest.

### 2. Need JavaScript? Use Playwright Simple
If you need to wait for JavaScript rendering, use `playwright-simple.js`.

### 3. Getting Blocked? Use Stealth
If you encounter 403 or Cloudflare challenges, use `playwright-stealth.js`.

### 4. Special Sites Need Specialized Skills
- YouTube → deep-scraper
- Reddit → reddit-scraper
- Twitter → bird skill

---

## 🔧 Customization

All scripts support environment variables:
example.sh
cd playwright-scraper-skill
npm install
npx playwright install chromium
example.json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}
example.sh
# Install deep-scraper skill
npx clawhub install deep-scraper

# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
example.sh
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL

# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL

Tags

#web_and-frontend-development #api #bot #scraping #web

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author waisimon
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install playwright-scraper-skill