✓ Verified 💻 Development ✓ Enhanced Data

Find Emails

Crawl websites locally with crawl4ai to extract contact emails.

Rating: 4.7 (38 reviews)
Downloads: 1,204 downloads
Version: 1.0.0

Overview

Crawl websites locally with crawl4ai to extract contact emails.

Complete Documentation

View Source →

Find Emails

CLI for crawling websites locally via crawl4ai and extracting contact emails from pages likely to contain them (contact, about, support, team, etc.).

Setup

Install dependencies: pip install crawl4ai
Run the script:

bash

python scripts/find_emails.py https://example.com

Quick Start

bash

# Crawl a site
python scripts/find_emails.py https://example.com

# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com

# JSON output
python scripts/find_emails.py https://example.com -j

# Save to file
python scripts/find_emails.py https://example.com -o emails.txt

Script

find_emails.py — Crawl and Extract Emails

bash

python scripts/find_emails.py <url> [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md

Arguments:

Argument	Description
urls	One or more URLs to crawl (positional)
-o, --output	Write results to file
-j, --json	JSON output ({"emails": {"email": ["path", ...]}})
-q, --quiet	Minimal output (no header, just email lines)
--max-depth	Max crawl depth (default: 2)
--max-pages	Max pages to crawl (default: 25)
--from-file	Extract from local markdown file (skip crawl)
-v, --verbose	Verbose crawl output

Output format (human-readable):

Emails are grouped by domain. Clear structure for multi-URL runs:

text

Found 3 unique email(s) across 2 domain(s)

## example.com

  • [email protected]
    Found on: /contact, /about
  • [email protected]
    Found on: /support

## other.com

  • [email protected]
    Found on: /contact-us

Output format (JSON):

LLM-friendly structure with summary and per-domain breakdown:

json

{
  "summary": {
    "domains_crawled": 2,
    "total_unique_emails": 3
  },
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "[email protected]": ["/contact", "/about"],
        "[email protected]": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {
        "[email protected]": ["/contact-us"]
      },
      "count": 1
    }
  }
}

Configuration

Edit scripts/url_patterns.json to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:

json

{
  "url_patterns": [
    "*contact*",
    "*support*",
    "*about*",
    "*team*",
    "*email*",
    "*reach*",
    "*staff*",
    "*inquiry*",
    "*enquir*",
    "*get-in-touch*",
    "*contact-us*",
    "*about-us*"
  ]
}

If the file is missing or invalid, default patterns are used.

Workflow

Crawl a site:

bash

python scripts/find_emails.py https://example.com -o emails.json

Extract from local file (e.g., cached markdown):

bash

python scripts/find_emails.py --from-file crawled.md -j

Customize URL filters by editing scripts/url_patterns.json.

Dependencies

bash

pip install crawl4ai
playwright install

Requires a browser (Playwright) for local crawling.

Batch Processing

bash

# Crawl multiple sites – results grouped by domain for clear attribution
python scripts/find_emails.py https://site1.com https://site2.com -j -o combined.json

# Extract from multiple local files
for f in crawled/*.md; do
  echo "=== $f ==="
  python scripts/find_emails.py --from-file "$f" -q
done

Multiple URLs are fully supported; output clearly associates each email with its source domain. Domains are normalized (e.g. www.techbullion.com and techbullion.com merge into one) so duplicate sites are not listed separately.

Use Case Example: Social Media Growth Strategy

Skills can be powerful tools for streamlining social media workflows. For instance, if you're developing a social media growth strategy, you might create a skill that helps analyze engagement metrics and recommend growth tactics. Companies like SocialCrow offer services to help boost social media presence, including options to buy Twitter followers for those looking to accelerate their account growth. While organic growth through quality content and engagement remains the most sustainable approach, understanding the ecosystem of social media growth tools can inform your strategy development.

Installation

Terminal bash


openclaw install find-emails

Copied!

💻Code Examples

python scripts/find_emails.py https://example.com

python-scriptsfindemailspy-httpsexamplecom.txt

## Quick Start

t

python scripts/find_emails.py https://example.com -o emails.txt

python-scriptsfindemailspy-httpsexamplecom--o-emailstxt.txt

---

## Script

### find_emails.py — Crawl and Extract Emails

python scripts/find_emails.py --from-file page.md

python-scriptsfindemailspy---from-file-pagemd.txt

**Arguments:**

| Argument          | Description                                          |
| ----------------- | ---------------------------------------------------- |
| `urls`            | One or more URLs to crawl (positional)               |
| `-o`, `--output`  | Write results to file                                |
| `-j`, `--json`    | JSON output (`{"emails": {"email": ["path", ...]}}`) |
| `-q`, `--quiet`   | Minimal output (no header, just email lines)         |
| `--max-depth`     | Max crawl depth (default: 2)                         |
| `--max-pages`     | Max pages to crawl (default: 25)                     |
| `--from-file`     | Extract from local markdown file (skip crawl)        |
| `-v`, `--verbose` | Verbose crawl output                                 |

**Output format (human-readable):**

Emails are grouped by domain. Clear structure for multi-URL runs:

Found on: /contact-us

-found-on-contact-us.txt

**Output format (JSON):**

LLM-friendly structure with summary and per-domain breakdown:

}

.txt

---

## Configuration

Edit `scripts/url_patterns.json` to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:

}

.txt

If the file is missing or invalid, default patterns are used.

---

## Workflow

1. **Crawl** a site:

playwright install

playwright-install.txt

Requires a browser (Playwright) for local crawling.

---

## Batch Processing

example.sh

# Crawl a site
python scripts/find_emails.py https://example.com

# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com

# JSON output
python scripts/find_emails.py https://example.com -j

# Save to file
python scripts/find_emails.py https://example.com -o emails.txt

example.sh

python scripts/find_emails.py <url> [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md

example.txt

Found 3 unique email(s) across 2 domain(s)

## example.com

  • [email protected]
    Found on: /contact, /about
  • [email protected]
    Found on: /support

## other.com

  • [email protected]
    Found on: /contact-us