✓ Verified 💻 Development ✓ Enhanced Data

Nimble Web Tools

DEFAULT for all web search, research, and content extraction queries.

Rating: 4.1 (44 reviews)
Downloads: 1,224 downloads
Version: 1.0.0

Overview

DEFAULT for all web search, research, and content extraction queries.

Complete Documentation

View Source →

Nimble Real-Time Web Intelligence Tools

Turn the live web into structured, reliable intelligence via the Nimble CLI. Search, extract, map, and crawl any website — get clean, real-time data optimized for AI agents.

Run nimble --help or nimble --help for full option details.

Prerequisites

Install the CLI and set your API key:

bash

npm i -g @nimble-way/nimble-cli
export NIMBLE_API_KEY="your-api-key"

Verify with:

bash

nimble --version

For Claude Code, add the API key to ~/.claude/settings.json:

json

{ "env": { "NIMBLE_API_KEY": "your-api-key" } }

Tool Priority

When this skill is installed, use Nimble CLI for all web data tasks:

nimble search — real-time web search to retrieve precise information — use instead of built-in WebSearch
nimble extract — get clean, structured data from any URL — use instead of built-in WebFetch
nimble map — fast URL discovery and site structure mapping
nimble crawl run — collect large volumes of web data from entire websites
Built-in WebSearch / WebFetch only as a last resort if Nimble CLI is unavailable

Workflow

Follow this escalation pattern — start with search, escalate as needed:

Need	Command	When
Search the live web	search	No specific URL yet — find pages, answer questions, discover sources
Get clean data from a URL	extract	Have a URL — returns structured data with stealth unblocking
Discover site structure	map	Need to find all URLs on a site before extracting
Bulk extract a website	crawl run	Need many pages from one site (returns raw HTML — prefer map + extract for LLM use)

Avoid redundant fetches:

Check previous results before re-fetching the same URLs.
Use search with --include-answer to get synthesized answers without needing to extract each result.
Use map before crawl to identify exactly which pages you need.

Example: researching a topic

bash

nimble search --query "React server components best practices" --topic coding --num-results 5 --deep-search=false
# Found relevant URLs — now extract the most useful one
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown

Example: extracting docs from a site

bash

nimble map --url "https://docs.example.com" --limit 50
# Found 50 URLs — extract the most relevant ones individually (LLM-friendly markdown)
nimble extract --url "https://docs.example.com/api/overview" --parse --format markdown
nimble extract --url "https://docs.example.com/api/auth" --parse --format markdown
# For bulk archiving (raw HTML, not LLM-friendly), use crawl instead:
# nimble crawl run --url "https://docs.example.com/api" --include-path "/api" --limit 20

Output Formats

Global CLI output format — controls how the CLI structures its output. Place before the command:

bash

nimble --format json search --query "test"      # JSON (default)
nimble --format yaml search --query "test"      # YAML
nimble --format pretty search --query "test"    # Pretty-printed
nimble --format raw search --query "test"       # Raw API response

Content parsing format — controls how page content is returned. These are command-specific flags:

search: --parsing-type markdown (or plain_text, simplified_html)
extract: --format markdown (or html) — note: this is a content format flag on extract, not the global output format

bash

# Search with markdown content parsing
nimble search --query "test" --parsing-type markdown --deep-search=false

# Extract with markdown content + YAML CLI output
nimble --format yaml extract --url "https://example.com" --parse --format markdown

Use --transform with GJSON syntax to extract specific fields:

bash

nimble search --query "AI news" --transform "results.#.url"

Commands

search

Accurate, real-time web search with 8 focus modes. AI Agents search the live web to retrieve precise information. Run nimble search --help for all options.

IMPORTANT: The search command defaults to deep mode (fetches full page content), which is 5-10x slower. Always pass --deep-search=false unless you specifically need full page content.

Always explicitly set these parameters on every search call:

--deep-search=false: Pass this on every call for fast responses (1-3s vs 5-15s). Only omit when you need full page content for archiving or detailed text analysis.
--include-answer: Recommended on every research/exploration query. Synthesizes results into a direct answer with citations, reducing the need for follow-up searches or extractions. Only skip for URL-discovery-only queries where you just need links. Note: This is a premium feature (Enterprise plans). If the API returns a 402 or 403 when using this flag, retry the same query without --include-answer and continue — the search results are still valuable without the synthesized answer.
--topic: Match to query type — coding, news, academic, etc. Default is general. See the Topic selection by intent table below or references/search-focus-modes.md for guidance.
--num-results: Default 10 — balanced speed and coverage.

bash

# Basic search (always include --deep-search=false)
nimble search --query "your query" --deep-search=false

# Coding-focused search
nimble search --query "React hooks tutorial" --topic coding --deep-search=false

# News search with time filter
nimble search --query "AI developments" --topic news --time-range week --deep-search=false

# Search with AI-generated answer summary
nimble search --query "what is WebAssembly" --include-answer --deep-search=false

# Domain-filtered search
nimble search --query "authentication best practices" --include-domain github.com --include-domain stackoverflow.com --deep-search=false

# Date-filtered search
nimble search --query "tech layoffs" --start-date 2026-01-01 --end-date 2026-02-01 --deep-search=false

# Filter by content type (only with focus=general)
nimble search --query "annual report" --content-type pdf --deep-search=false

# Control number of results
nimble search --query "Python tutorials" --num-results 15 --deep-search=false

# Deep search — ONLY when you need full page content (5-15s, much slower)
nimble search --query "machine learning" --deep-search --num-results 5

Key options:

Flag	Description
--query	Search query string (required)
--deep-search=false	Always pass this. Disables full page content fetch for 5-10x faster responses
--deep-search	Enable full page content fetch (slow, 5-15s — only when needed)
--topic	Focus mode: general, coding, news, academic, shopping, social, geo, location
--num-results	Max results to return (default 10)
--include-answer	Generate AI answer summary from results
--include-domain	Only include results from these domains (repeatable, max 50)
--exclude-domain	Exclude results from these domains (repeatable, max 50)
--time-range	Recency filter: hour, day, week, month, year
--start-date	Filter results after this date (YYYY-MM-DD)
--end-date	Filter results before this date (YYYY-MM-DD)
--content-type	Filter by type: pdf, docx, xlsx, documents, spreadsheets, presentations
--parsing-type	Output format: markdown, plain_text, simplified_html
--country	Country code for localized results
--locale	Locale for language settings
--max-subagents	Max parallel subagents for shopping/social/geo modes (1-10, default 3)

Focus modes (quick reference — for detailed per-mode guidance, decision tree, and combination strategies, read references/search-focus-modes.md):

Mode	Best for
general	Broad web searches (default)
coding	Programming docs, code examples, technical content
news	Current events, breaking news, recent articles
academic	Research papers, scholarly articles, studies
shopping	Product searches, price comparisons, e-commerce
social	People research, LinkedIn/X/YouTube profiles, community discussions
geo	Geographic information, regional data
location	Local businesses, place-specific queries

Topic selection by intent (see references/search-focus-modes.md for full table):

Query Intent	Primary Topic	Secondary (parallel)
Research a person	social	general
Research a company	general	news
Find code/docs	coding	—
Current events	news	social
Find a product/price	shopping	—
Find a place/business	location	geo
Find research papers	academic	—

Performance tips:

With --deep-search=false (FAST): 1-3 seconds, returns titles + snippets + URLs — use this 95% of the time
Without the flag / --deep-search (SLOW): 5-15 seconds, returns full page content — only for archiving or full-text analysis
Use --include-answer for quick synthesized insights — works great with fast mode
Start with 5-10 results, increase only if needed

extract

Scalable data collection with stealth unblocking. Get clean, real-time HTML and structured data from any URL. Supports JS rendering, browser emulation, and geolocation. Run nimble extract --help for all options.

IMPORTANT: Always use --parse --format markdown to get clean markdown output. Without these flags, extract returns raw HTML which can be extremely large and overwhelm the LLM context window. The --format flag on extract controls the content type (not the CLI output format — see Output Formats above).

bash

# Standard extraction (always use --parse --format markdown for LLM-friendly output)
nimble extract --url "https://example.com/article" --parse --format markdown

# Render JavaScript (for SPAs, dynamic content)
nimble extract --url "https://example.com/app" --render --parse --format markdown

# Extract with geolocation (see content as if from a specific country)
nimble extract --url "https://example.com" --country US --city "New York" --parse --format markdown

# Handle cookie consent automatically
nimble extract --url "https://example.com" --consent-header --parse --format markdown

# Custom browser emulation
nimble extract --url "https://example.com" --browser chrome --device desktop --os windows --parse --format markdown

# Multiple content format preferences (API tries first, falls back to second)
nimble extract --url "https://example.com" --parse --format markdown --format html

Key options:

Flag	Description
--url	Target URL to extract (required)
--parse	Parse the response content (always use this)
--format	Content type preference: markdown, html (always use markdown for LLM-friendly output)
--render	Render JavaScript using a browser
--country	Country code for geolocation and proxy
--city	City for geolocation
--state	US state for geolocation (only when country=US)
--locale	Locale for language settings
--consent-header	Auto-handle cookie consent
--browser	Browser type to emulate
--device	Device type for emulation
--os	Operating system to emulate
--driver	Browser driver to use
--method	HTTP method (GET, POST, etc.)
--headers	Custom HTTP headers (key=value)
--cookies	Browser cookies
--referrer-type	Referrer policy
--http2	Use HTTP/2 protocol
--request-timeout	Timeout in milliseconds
--tag	User-defined tag for request tracking

map

Fast URL discovery and site structure mapping. Easily plan extraction workflows. Returns URL metadata only (URLs, titles, descriptions) — not page content. Use extract or crawl to get actual content from the discovered URLs. Run nimble map --help for all options.

bash

# Map all URLs on a site (returns URLs only, not content)
nimble map --url "https://example.com"

# Limit number of URLs returned
nimble map --url "https://docs.example.com" --limit 100

# Include subdomains
nimble map --url "https://example.com" --domain-filter subdomains

# Use sitemap for discovery
nimble map --url "https://example.com" --sitemap auto

Key options:

Flag	Description
--url	URL to map (required)
--limit	Max number of links to return
--domain-filter	Include subdomains in mapping
--sitemap	Use sitemap for URL discovery
--country	Country code for geolocation
--locale	Locale for language settings

crawl

Extract contents from entire websites in a single request. Collect large volumes of web data automatically. Crawl is async — you start a job, poll for completion, then retrieve the results. Run nimble crawl run --help for all options.

Crawl defaults:

Setting	Default	Notes
--sitemap	auto	Automatically uses sitemap if available
--max-discovery-depth	5	How deep the crawler follows links
--limit	No limit	Always set a limit to avoid crawling entire sites

Start a crawl:

bash

# Crawl a site section (always set --limit)
nimble crawl run --url "https://docs.example.com" --limit 50

# Crawl with path filtering
nimble crawl run --url "https://example.com" --include-path "/docs" --include-path "/api" --limit 100

# Exclude paths
nimble crawl run --url "https://example.com" --exclude-path "/blog" --exclude-path "/archive" --limit 50

# Control crawl depth
nimble crawl run --url "https://example.com" --max-discovery-depth 3 --limit 50

# Allow subdomains and external links
nimble crawl run --url "https://example.com" --allow-subdomains --allow-external-links --limit 50

# Crawl entire domain (not just child paths)
nimble crawl run --url "https://example.com/docs" --crawl-entire-domain --limit 100

# Named crawl for tracking
nimble crawl run --url "https://example.com" --name "docs-crawl-feb-2026" --limit 200

# Use sitemap for discovery
nimble crawl run --url "https://example.com" --sitemap auto --limit 50

Key options for crawl run:

Flag	Description
--url	URL to crawl (required)
--limit	Max pages to crawl (always set this)
--max-discovery-depth	Max depth based on discovery order (default 5)
--include-path	Regex patterns for URLs to include (repeatable)
--exclude-path	Regex patterns for URLs to exclude (repeatable)
--allow-subdomains	Follow links to subdomains
--allow-external-links	Follow links to external sites
--crawl-entire-domain	Follow sibling/parent URLs, not just child paths
--ignore-query-parameters	Don't re-scrape same path with different query params
--name	Name for the crawl job
--sitemap	Use sitemap for URL discovery (default auto)
--callback	Webhook for receiving results

Poll crawl status and retrieve results:

Crawl jobs run asynchronously. After starting a crawl, poll for completion, then retrieve content using individual task IDs (not the crawl ID):

bash

# 1. Start the crawl → returns a crawl_id
nimble crawl run --url "https://docs.example.com" --limit 5
# Returns: crawl_id "abc-123"

# 2. Poll status until completed → returns individual task_ids per page
nimble crawl status --id "abc-123"
# Returns: tasks: [{ task_id: "task-456" }, { task_id: "task-789" }, ...]
# Status values: running, completed, failed, terminated

# 3. Retrieve content using INDIVIDUAL task_ids (NOT the crawl_id)
nimble tasks results --task-id "task-456"
nimble tasks results --task-id "task-789"
# ⚠️ Using the crawl_id here returns 404 — you must use the per-page task_ids from step 2

IMPORTANT: nimble tasks results requires the individual task IDs from crawl status (each crawled page gets its own task ID), not the crawl job ID. Using the crawl ID will return a 404 error.

Polling guidelines:

Poll every 15-30 seconds for small crawls (< 50 pages)
Poll every 30-60 seconds for larger crawls (50+ pages)
Stop polling after status is completed, failed, or terminated
Note: crawl status may occasionally misreport individual task statuses (showing "failed" for tasks that actually succeeded). If crawl status shows failed tasks, try retrieving their results with nimble tasks results before assuming failure

List crawls:

bash

# List all crawls
nimble crawl list

# Filter by status
nimble crawl list --status running

# Paginate results
nimble crawl list --limit 10

Cancel a crawl:

bash

nimble crawl terminate --id "crawl-task-id"

Best Practices

Search Strategy

Always pass --deep-search=false — the default is deep mode (slow). Fast mode covers 95% of use cases: URL discovery, research, comparisons, answer generation
Only use deep mode when you need full page text — archiving articles, extracting complete docs, building datasets
Start with the right focus mode — match --topic to your query type (see references/search-focus-modes.md)
Use --include-answer — get AI-synthesized insights without extracting each result. If it returns 402/403, retry without it.
Filter domains — use --include-domain to target authoritative sources
Add time filters — use --time-range for time-sensitive queries

Multi-Search Strategy

When researching a topic in depth, run 2-3 searches in parallel with:

Different topics — e.g., social + general for people research
Different query angles — e.g., "Jane Doe current job" + "Jane Doe career history" + "Jane Doe publications"

This is faster than sequential searches and gives broader coverage. Deduplicate results by URL before extracting.

Disambiguating Common Names

When searching for a person with a common name:

Include distinguishing context in the query: company name, job title, city
Use --topic social — LinkedIn results include location and current company, making disambiguation easier
Cross-reference results across searches to confirm you're looking at the right person

Extraction Strategy

Always use --parse --format markdown — returns clean markdown instead of raw HTML, preventing context window overflow
Try without --render first — it's faster for static pages
Add --render for SPAs — when content is loaded by JavaScript
Set geolocation — use --country to see region-specific content

Crawl Strategy

Prefer map + extract over crawl for LLM use — crawl results return raw HTML (60-115KB per page) which overwhelms LLM context. For LLM-friendly output, use map to discover URLs, then extract --parse --format markdown on individual pages
Use crawl only for bulk archiving or data pipelines — when you need raw content from many pages and will post-process it outside the LLM context
Always set --limit — crawl has no default limit, so always specify one to avoid crawling entire sites
Use path filters — --include-path and --exclude-path to target specific sections
Name your crawls — use --name for easy tracking
Retrieve with individual task IDs — crawl status returns per-page task IDs; use those (not the crawl ID) with nimble tasks results --task-id

Common Recipes

Researching a person

bash

# Step 1: Run social + general in parallel for max coverage
nimble search --query "Jane Doe Head of Engineering" --topic social --deep-search=false --num-results 10 --include-answer
nimble search --query "Jane Doe Head of Engineering" --topic general --deep-search=false --num-results 10 --include-answer

# Step 2: Broaden with different query angles in parallel
nimble search --query "Jane Doe career history Acme Corp" --deep-search=false --include-answer
nimble search --query "Jane Doe publications blog articles" --deep-search=false --include-answer

# Step 3: Extract the most promising non-auth-walled URLs (skip LinkedIn — see Known Limitations)
nimble extract --url "https://www.companysite.com/team/jane-doe" --parse --format markdown

Researching a company

bash

# Step 1: Overview + recent news in parallel
nimble search --query "Acme Corp" --topic general --deep-search=false --include-answer
nimble search --query "Acme Corp" --topic news --time-range month --deep-search=false --include-answer

# Step 2: Extract company page
nimble extract --url "https://acme.com/about" --parse --format markdown

Technical research

bash

# Step 1: Find docs and code examples
nimble search --query "React Server Components migration guide" --topic coding --deep-search=false --include-answer

# Step 2: Extract the most relevant doc
nimble extract --url "https://react.dev/reference/rsc/server-components" --parse --format markdown

Error Handling

Error	Solution
NIMBLE_API_KEY not set	Set the environment variable: export NIMBLE_API_KEY="your-key"
401 Unauthorized	Verify API key is active at nimbleway.com
402/403 with --include-answer	Premium feature not available on current plan. Retry the same query without --include-answer and continue
429 Too Many Requests	Reduce request frequency or upgrade API tier
Timeout	Ensure --deep-search=false is set, reduce --num-results, or increase --request-timeout
No results	Try different --topic, broaden query, remove domain filters

Known Limitations

Site	Issue	Workaround
LinkedIn profiles	Auth wall blocks extraction (returns redirect/JS, status 999)	Use --topic social search instead — it returns LinkedIn data directly via subagents. Do NOT try to extract LinkedIn URLs.
Sites behind login	Extract returns login page instead of content	No workaround — use search snippets instead
Heavy SPAs	Extract returns empty or minimal HTML	Add --render flag to execute JavaScript before extraction
Crawl results	Returns raw HTML (60-115KB per page), no markdown option	Use map + extract --parse --format markdown on individual pages for LLM-friendly output
Crawl status	May misreport individual task statuses as "failed" when they actually succeeded	Always try nimble tasks results --task-id before assuming failure

Installation

Terminal bash


openclaw install nimble-web-tools

Copied!

💻Code Examples

{ "env": { "NIMBLE_API_KEY": "your-api-key" } }

-env--nimbleapikey-your-api-key--.txt

## Tool Priority

When this skill is installed, use Nimble CLI for all web data tasks:

1. **`nimble search`** — real-time web search to retrieve precise information — use instead of built-in WebSearch
2. **`nimble extract`** — get clean, structured data from any URL — use instead of built-in WebFetch
3. **`nimble map`** — fast URL discovery and site structure mapping
4. **`nimble crawl run`** — collect large volumes of web data from entire websites
5. Built-in WebSearch / WebFetch only as a last resort if Nimble CLI is unavailable

## Workflow

Follow this escalation pattern — start with search, escalate as needed:

| Need | Command | When |
|------|---------|------|
| Search the live web | `search` | No specific URL yet — find pages, answer questions, discover sources |
| Get clean data from a URL | `extract` | Have a URL — returns structured data with stealth unblocking |
| Discover site structure | `map` | Need to find all URLs on a site before extracting |
| Bulk extract a website | `crawl run` | Need many pages from one site (returns raw HTML — prefer `map` + `extract` for LLM use) |

**Avoid redundant fetches:**

- Check previous results before re-fetching the same URLs.
- Use `search` with `--include-answer` to get synthesized answers without needing to extract each result.
- Use `map` before `crawl` to identify exactly which pages you need.

**Example: researching a topic**

# nimble crawl run --url "https://docs.example.com/api" --include-path "/api" --limit 20

-nimble-crawl-run---url-httpsdocsexamplecomapi---include-path-api---limit-20.txt

## Output Formats

**Global CLI output format** — controls how the CLI structures its output. Place before the command:

nimble --format raw search --query "test" # Raw API response

nimble---format-raw-search---query-test--raw-api-response.txt

**Content parsing format** — controls how page content is returned. These are command-specific flags:

- **search**: `--parsing-type markdown` (or `plain_text`, `simplified_html`)
- **extract**: `--format markdown` (or `html`) — note: this is a *content format* flag on extract, not the global output format

nimble search --query "AI news" --transform "results.#.url"

nimble-search---query-ai-news---transform-resultsurl.txt

## Commands

### search

Accurate, real-time web search with 8 focus modes. AI Agents search the live web to retrieve precise information. Run `nimble search --help` for all options.

**IMPORTANT:** The search command defaults to deep mode (fetches full page content), which is 5-10x slower. Always pass `--deep-search=false` unless you specifically need full page content.

Always explicitly set these parameters on every search call:

- `--deep-search=false`: **Pass this on every call** for fast responses (1-3s vs 5-15s). Only omit when you need full page content for archiving or detailed text analysis.
- `--include-answer`: **Recommended on every research/exploration query.** Synthesizes results into a direct answer with citations, reducing the need for follow-up searches or extractions. Only skip for URL-discovery-only queries where you just need links. **Note:** This is a premium feature (Enterprise plans). If the API returns a `402` or `403` when using this flag, retry the same query without `--include-answer` and continue — the search results are still valuable without the synthesized answer.
- `--topic`: Match to query type — `coding`, `news`, `academic`, etc. Default is `general`. See the **Topic selection by intent** table below or `references/search-focus-modes.md` for guidance.
- `--num-results`: Default `10` — balanced speed and coverage.

nimble search --query "machine learning" --deep-search --num-results 5

nimble-search---query-machine-learning---deep-search---num-results-5.txt

**Key options:**

| Flag | Description |
|------|-------------|
| `--query` | Search query string (required) |
| `--deep-search=false` | **Always pass this.** Disables full page content fetch for 5-10x faster responses |
| `--deep-search` | Enable full page content fetch (slow, 5-15s — only when needed) |
| `--topic` | Focus mode: general, coding, news, academic, shopping, social, geo, location |
| `--num-results` | Max results to return (default 10) |
| `--include-answer` | Generate AI answer summary from results |
| `--include-domain` | Only include results from these domains (repeatable, max 50) |
| `--exclude-domain` | Exclude results from these domains (repeatable, max 50) |
| `--time-range` | Recency filter: hour, day, week, month, year |
| `--start-date` | Filter results after this date (YYYY-MM-DD) |
| `--end-date` | Filter results before this date (YYYY-MM-DD) |
| `--content-type` | Filter by type: pdf, docx, xlsx, documents, spreadsheets, presentations |
| `--parsing-type` | Output format: markdown, plain_text, simplified_html |
| `--country` | Country code for localized results |
| `--locale` | Locale for language settings |
| `--max-subagents` | Max parallel subagents for shopping/social/geo modes (1-10, default 3) |

**Focus modes** (quick reference — for detailed per-mode guidance, decision tree, and combination strategies, **read `references/search-focus-modes.md`**):

| Mode | Best for |
|------|----------|
| `general` | Broad web searches (default) |
| `coding` | Programming docs, code examples, technical content |
| `news` | Current events, breaking news, recent articles |
| `academic` | Research papers, scholarly articles, studies |
| `shopping` | Product searches, price comparisons, e-commerce |
| `social` | People research, LinkedIn/X/YouTube profiles, community discussions |
| `geo` | Geographic information, regional data |
| `location` | Local businesses, place-specific queries |

**Topic selection by intent** (see `references/search-focus-modes.md` for full table):

| Query Intent | Primary Topic | Secondary (parallel) |
|---|---|---|
| Research a **person** | `social` | `general` |
| Research a **company** | `general` | `news` |
| Find **code/docs** | `coding` | — |
| Current **events** | `news` | `social` |
| Find a **product/price** | `shopping` | — |
| Find a **place/business** | `location` | `geo` |
| Find **research papers** | `academic` | — |

**Performance tips:**

- With `--deep-search=false` (FAST): 1-3 seconds, returns titles + snippets + URLs — use this 95% of the time
- Without the flag / `--deep-search` (SLOW): 5-15 seconds, returns full page content — only for archiving or full-text analysis
- Use `--include-answer` for quick synthesized insights — works great with fast mode
- Start with 5-10 results, increase only if needed

### extract

Scalable data collection with stealth unblocking. Get clean, real-time HTML and structured data from any URL. Supports JS rendering, browser emulation, and geolocation. Run `nimble extract --help` for all options.

**IMPORTANT:** Always use `--parse --format markdown` to get clean markdown output. Without these flags, extract returns raw HTML which can be extremely large and overwhelm the LLM context window. The `--format` flag on extract controls the *content type* (not the CLI output format — see Output Formats above).

nimble extract --url "https://example.com" --parse --format markdown --format html

nimble-extract---url-httpsexamplecom---parse---format-markdown---format-html.txt

**Key options:**

| Flag | Description |
|------|-------------|
| `--url` | Target URL to extract (required) |
| `--parse` | Parse the response content (always use this) |
| `--format` | Content type preference: `markdown`, `html` (always use `markdown` for LLM-friendly output) |
| `--render` | Render JavaScript using a browser |
| `--country` | Country code for geolocation and proxy |
| `--city` | City for geolocation |
| `--state` | US state for geolocation (only when country=US) |
| `--locale` | Locale for language settings |
| `--consent-header` | Auto-handle cookie consent |
| `--browser` | Browser type to emulate |
| `--device` | Device type for emulation |
| `--os` | Operating system to emulate |
| `--driver` | Browser driver to use |
| `--method` | HTTP method (GET, POST, etc.) |
| `--headers` | Custom HTTP headers (key=value) |
| `--cookies` | Browser cookies |
| `--referrer-type` | Referrer policy |
| `--http2` | Use HTTP/2 protocol |
| `--request-timeout` | Timeout in milliseconds |
| `--tag` | User-defined tag for request tracking |

### map

Fast URL discovery and site structure mapping. Easily plan extraction workflows. Returns **URL metadata only** (URLs, titles, descriptions) — not page content. Use `extract` or `crawl` to get actual content from the discovered URLs. Run `nimble map --help` for all options.

nimble map --url "https://example.com" --sitemap auto

nimble-map---url-httpsexamplecom---sitemap-auto.txt

**Key options:**

| Flag | Description |
|------|-------------|
| `--url` | URL to map (required) |
| `--limit` | Max number of links to return |
| `--domain-filter` | Include subdomains in mapping |
| `--sitemap` | Use sitemap for URL discovery |
| `--country` | Country code for geolocation |
| `--locale` | Locale for language settings |

### crawl

Extract contents from entire websites in a single request. Collect large volumes of web data automatically. Crawl is **async** — you start a job, poll for completion, then retrieve the results. Run `nimble crawl run --help` for all options.

**Crawl defaults:**

| Setting | Default | Notes |
|---------|---------|-------|
| `--sitemap` | `auto` | Automatically uses sitemap if available |
| `--max-discovery-depth` | `5` | How deep the crawler follows links |
| `--limit` | No limit | **Always set a limit** to avoid crawling entire sites |

**Start a crawl:**

nimble crawl run --url "https://example.com" --sitemap auto --limit 50

nimble-crawl-run---url-httpsexamplecom---sitemap-auto---limit-50.txt

**Key options for `crawl run`:**

| Flag | Description |
|------|-------------|
| `--url` | URL to crawl (required) |
| `--limit` | Max pages to crawl (**always set this**) |
| `--max-discovery-depth` | Max depth based on discovery order (default 5) |
| `--include-path` | Regex patterns for URLs to include (repeatable) |
| `--exclude-path` | Regex patterns for URLs to exclude (repeatable) |
| `--allow-subdomains` | Follow links to subdomains |
| `--allow-external-links` | Follow links to external sites |
| `--crawl-entire-domain` | Follow sibling/parent URLs, not just child paths |
| `--ignore-query-parameters` | Don't re-scrape same path with different query params |
| `--name` | Name for the crawl job |
| `--sitemap` | Use sitemap for URL discovery (default auto) |
| `--callback` | Webhook for receiving results |

**Poll crawl status and retrieve results:**

Crawl jobs run asynchronously. After starting a crawl, poll for completion, then retrieve content using **individual task IDs** (not the crawl ID):

# ⚠️ Using the crawl_id here returns 404 — you must use the per-page task_ids from step 2

--using-the-crawlid-here-returns-404--you-must-use-the-per-page-taskids-from-step-2.txt

**IMPORTANT:** `nimble tasks results` requires the **individual task IDs** from `crawl status` (each crawled page gets its own task ID), not the crawl job ID. Using the crawl ID will return a 404 error.

**Polling guidelines:**
- Poll every **15-30 seconds** for small crawls (< 50 pages)
- Poll every **30-60 seconds** for larger crawls (50+ pages)
- Stop polling after status is `completed`, `failed`, or `terminated`
- **Note:** `crawl status` may occasionally misreport individual task statuses (showing "failed" for tasks that actually succeeded). If `crawl status` shows failed tasks, try retrieving their results with `nimble tasks results` before assuming failure

**List crawls:**

nimble crawl terminate --id "crawl-task-id"

nimble-crawl-terminate---id-crawl-task-id.txt

## Best Practices

### Search Strategy

1. **Always pass `--deep-search=false`** — the default is deep mode (slow). Fast mode covers 95% of use cases: URL discovery, research, comparisons, answer generation
2. **Only use deep mode when you need full page text** — archiving articles, extracting complete docs, building datasets
3. **Start with the right focus mode** — match `--topic` to your query type (see `references/search-focus-modes.md`)
4. **Use `--include-answer`** — get AI-synthesized insights without extracting each result. If it returns 402/403, retry without it.
5. **Filter domains** — use `--include-domain` to target authoritative sources
6. **Add time filters** — use `--time-range` for time-sensitive queries

### Multi-Search Strategy

When researching a topic in depth, run 2-3 searches in parallel with:
- **Different topics** — e.g., `social` + `general` for people research
- **Different query angles** — e.g., "Jane Doe current job" + "Jane Doe career history" + "Jane Doe publications"

This is faster than sequential searches and gives broader coverage. Deduplicate results by URL before extracting.

### Disambiguating Common Names

When searching for a person with a common name:
1. Include distinguishing context in the query: company name, job title, city
2. Use `--topic social` — LinkedIn results include location and current company, making disambiguation easier
3. Cross-reference results across searches to confirm you're looking at the right person

### Extraction Strategy

1. **Always use `--parse --format markdown`** — returns clean markdown instead of raw HTML, preventing context window overflow
2. **Try without `--render` first** — it's faster for static pages
3. **Add `--render` for SPAs** — when content is loaded by JavaScript
4. **Set geolocation** — use `--country` to see region-specific content

### Crawl Strategy

1. **Prefer `map` + `extract` over `crawl` for LLM use** — crawl results return raw HTML (60-115KB per page) which overwhelms LLM context. For LLM-friendly output, use `map` to discover URLs, then `extract --parse --format markdown` on individual pages
2. **Use `crawl` only for bulk archiving or data pipelines** — when you need raw content from many pages and will post-process it outside the LLM context
3. **Always set `--limit`** — crawl has no default limit, so always specify one to avoid crawling entire sites
4. **Use path filters** — `--include-path` and `--exclude-path` to target specific sections
5. **Name your crawls** — use `--name` for easy tracking
6. **Retrieve with individual task IDs** — `crawl status` returns per-page task IDs; use those (not the crawl ID) with `nimble tasks results --task-id`

## Common Recipes

### Researching a person