✓ Verified 💻 Development ✓ Enhanced Data

Curated Search

Domain-restricted full-text search over curated technical documentation.

Rating: 4.7 (404 reviews)
Downloads: 12,562 downloads
Version: 1.0.0

Overview

Domain-restricted full-text search over curated technical documentation.

Complete Documentation

View Source →

Curated Search Skill

Summary

Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.

External Endpoints

This skill does not call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests during index builds (one‑time setup), but those are user‑initiated (npm run crawl) and respect the configured domain whitelist.

Security & Privacy

Search is fully local – After the index is built, all queries run offline; no data leaves your machine.
Crawling is optional and whitelist‑scoped – The crawler only accesses domains you explicitly list in config.yaml. It respects robots.txt and configurable delays.
No telemetry – No usage data is transmitted externally.
Configuration is read from local config.yaml and the index file in data/.

Model Invocation Note

The curated-search.search tool is invoked only when the user explicitly calls it. It does not run autonomously. OpenClaw calls the tool handler (scripts/search.js) when the user asks to search the curated index.

Trust Statement

By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open‑source implementation before installing.

Tool: curated-search.search

Search the curated index.

Parameters

Name	Type	Required	Default	Description
query	string	yes	—	Search query terms
limit	number	no	5	Maximum results (capped by config.max_limit, typically 100)
domain	string	no	null	Filter to specific domain (e.g., docs.python.org)
min_score	number	no	0.0	Minimum relevance score (0.0–1.0); filters out low-quality matches
offset	number	no	0	Pagination offset (skip first N results)

Response

JSON array of result objects:

json

[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]

Fields:

title — Document title (cleaned)
url — Source URL (canonical)
snippet — Excerpt (~200 chars) from content
domain — Hostname of source
score — BM25 relevance score (higher is better; not normalized 0–1 but typically 0–1 range)
crawled_at — Unix timestamp when page was crawled

Example Agent Calls

text

search CuratedSearch for "python tutorial"
search CuratedSearch for "async await" limit=3 domain=developer.mozilla.org
search CuratedSearch for "linux man page" min_score=0.3

Errors

If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:

json

{
  "error": "index_not_found",
  "message": "Search index not found. The index has not been built yet.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}

Common error codes:

Code	Meaning	Suggested Fix
config_missing	Configuration file not found	Specify --config path or ensure config.yaml exists
config_invalid	YAML parsing failed	Check syntax in config.yaml
config_missing_index_path	index.path not set	Add index.path to config
index_not_found	Index file missing	Run npm run crawl to build index
index_corrupted	Index file incompatible or corrupted	Rebuild index with npm run crawl
index_init_failed	Unexpected index initialization error	Check permissions, reinstall dependencies
missing_query	No query provided	Provide --query argument
query_too_long	Query exceeds 1000 characters	Shorten the query
limit_exceeded	Limit > config.max_limit	Use a smaller limit
invalid_domain	Domain filter malformed	Use format like docs.python.org
conflicting_flags	Mutually exclusive flags used (e.g., --stats with --query)	Use flags correctly
stats_failed	Could not retrieve index stats	Ensure index is accessible
search_failed	Search execution threw an error	Check query and index integrity

Configuration

Edit config.yaml in the skill directory. Key sections:

domains — whitelist of allowed domains (required)
seeds — starting URLs for crawling
crawl — depth, delay, timeout, max_documents
content — min_content_length, max_content_length
index — path to index files
search — default_limit, max_limit, min_score

See README.md for full configuration docs.

Support

Full documentation: README.md
Technical specs: specs/
Build plan: PLAN.md
Contributor guide: CONTRIBUTING.md
Issues: Report on GitHub (or via OpenClaw maintainers)

Installation

Terminal bash


openclaw install curated-search

Copied!

💻Code Examples

]

.txt

**Fields:**
- `title` — Document title (cleaned)
- `url` — Source URL (canonical)
- `snippet` — Excerpt (~200 chars) from content
- `domain` — Hostname of source
- `score` — BM25 relevance score (higher is better; not normalized 0–1 but typically 0–1 range)
- `crawled_at` — Unix timestamp when page was crawled

### Example Agent Calls

search CuratedSearch for "linux man page" min_score=0.3

search-curatedsearch-for-linux-man-page-minscore03.txt

### Errors

If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:

example.json

[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]

example.txt

search CuratedSearch for "python tutorial"
search CuratedSearch for "async await" limit=3 domain=developer.mozilla.org
search CuratedSearch for "linux man page" min_score=0.3

example.json

{
  "error": "index_not_found",
  "message": "Search index not found. The index has not been built yet.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}