
Anydocs

Generic Documentation Indexing & Search.

Rating
4.1 (321 reviews)
Downloads
8,022 downloads
Version
1.0.0


anydocs - Generic Documentation Indexing & Search

A powerful, reusable skill for indexing and searching ANY documentation site.

What It Does

anydocs solves a real problem: accessing documentation from code or CLI. Instead of opening a browser every time, you can:

  • Index any documentation site (Discord, OpenClaw, internal docs, etc.)
  • Search instantly from the command line or Python API
  • Cache pages locally to avoid repeated network calls
  • Configure multiple profiles for different doc sites

When to Use It

Use anydocs when you need to:

  • Quickly look up API documentation without leaving the terminal
  • Build agents that need to reference docs
  • Extract specific information from documentation
  • Search across multiple documentation sites
  • Integrate docs into your workflow

Key Features

🔍 Multi-Method Search

  • Keyword search: Fast, term-based matching with BM25-style scoring
  • Hybrid search: Keyword + phrase proximity for better relevance
  • Regex search: Advanced pattern matching for power users

🌐 Works with Any Docs Site

  • Sitemap-based discovery (standard XML sitemap)
  • Fallback crawling from base URL
  • HTML content extraction with smart selector detection
  • Automatic rate limiting to be respectful

💾 Smart Caching

  • Pages cached locally with 7-day TTL (configurable)
  • Search indexes cached for instant second searches
  • Cache statistics and cleanup commands
  • Expired entries are refetched; --force or a cache clear triggers a manual refresh

⚙️ Profile-Based Configuration

  • Support multiple doc sites simultaneously
  • Per-profile search methods and cache TTLs
  • Configuration stored in ~/.anydocs/config.json
  • Examples for Discord, OpenClaw, and custom sites

🌐 JavaScript Rendering (Optional)

  • Uses Playwright to render client-side SPAs (Single Page Apps)
  • Automatically discovers links on JS-heavy sites like Discord docs
  • Gracefully falls back to standard HTTP if Playwright is unavailable
  • Configure per-discovery session or globally per profile

Installation

bash
cd /path/to/skills/anydocs
pip install -r requirements.txt
chmod +x anydocs.py

Optional: Browser-based rendering (for JavaScript-heavy sites)

For sites like Discord that use client-side rendering, install Playwright:

bash
pip install playwright==1.40.0
playwright install chromium  # Downloads Chromium only (a bare "playwright install" fetches every default browser)

If Playwright is unavailable, anydocs gracefully falls back to standard HTTP fetching.
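
The fallback described above can be sketched as a runtime check for the Playwright package. This is a minimal illustration, not the actual anydocs code: the function names playwright_available, fetch_static, fetch_rendered, and choose_fetcher are hypothetical, and the "networkidle" wait condition is an assumption.

```python
import importlib.util
import urllib.request


def playwright_available() -> bool:
    """Return True if the playwright package can be imported."""
    return importlib.util.find_spec("playwright") is not None


def fetch_static(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP fetch; sees only the HTML the server sends."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_rendered(url: str) -> str:
    """Render the page in headless Chromium so client-side content is present."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def choose_fetcher(use_browser: bool):
    """Pick the rendered fetcher only when it is requested and installed."""
    if use_browser and playwright_available():
        return fetch_rendered
    return fetch_static
```

Importing Playwright lazily inside fetch_rendered keeps the module importable on machines where the optional dependency is missing.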

Quick Start

1. Configure a Documentation Site

bash
python anydocs.py config vuejs \
  https://vuejs.org \
  https://vuejs.org/sitemap.xml

2. Build the Index

bash
python anydocs.py index vuejs

This discovers all pages via sitemap, scrapes content, and builds a searchable index.
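
The discovery step can be illustrated with a short sketch that pulls page URLs out of a standard XML sitemap. It uses only the Python standard library; the function name urls_from_sitemap is hypothetical, and the real scraper may handle sitemap index files and other cases that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://vuejs.org/guide/introduction</loc></url>
  <url><loc>https://vuejs.org/api/</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))
```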

3. Search

bash
python anydocs.py search "composition api" --profile vuejs
python anydocs.py search "reactivity" --profile vuejs --limit 5

4. Fetch a Specific Page

bash
python anydocs.py fetch "guide/introduction" --profile vuejs

CLI Commands

Configuration

bash
# Add or update a profile
anydocs config <profile> <base_url> <sitemap_url> [--search-method hybrid] [--ttl-days 7]

# List configured profiles
anydocs list-profiles

Indexing

bash
# Build index for a profile
anydocs index <profile>

# Force re-index (skip cache)
anydocs index <profile> --force

Search

bash
# Basic keyword search
anydocs search "query" --profile discord

# Limit results
anydocs search "query" --profile discord --limit 5

# Regex search
anydocs search "^API" --profile discord --regex

Fetch

bash
# Fetch a specific page (URL or path)
anydocs fetch "https://discord.com/developers/docs/resources/webhook"
anydocs fetch "resources/webhook" --profile discord
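
Accepting either a full URL or a path relative to the profile can be sketched as follows. The helper resolve_page_url is hypothetical and only shows one way to do the join; it is not taken from the anydocs source.

```python
from urllib.parse import urljoin, urlparse


def resolve_page_url(target: str, base_url: str) -> str:
    """Return target unchanged if it is absolute, else join it onto base_url."""
    if urlparse(target).scheme in ("http", "https"):
        return target
    # urljoin replaces the last path segment unless the base ends with "/",
    # so normalise the base first.
    return urljoin(base_url.rstrip("/") + "/", target.lstrip("/"))


base = "https://discord.com/developers/docs"
print(resolve_page_url("resources/webhook", base))
```

Without the trailing-slash normalisation, urljoin would drop the final "docs" segment of the base URL.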

Cache Management

bash
# Show cache statistics
anydocs cache status

# Clear all cache
anydocs cache clear

# Clear specific profile's cache
anydocs cache clear --profile discord

Python API

For use in agents and scripts:

python
from lib.config import ConfigManager
from lib.scraper import DiscoveryEngine
from lib.indexer import SearchIndex

# Load configuration
config_mgr = ConfigManager()
config = config_mgr.get_profile("discord")

# Scrape documentation
scraper = DiscoveryEngine(config["base_url"], config["sitemap_url"])
pages = scraper.fetch_all()

# Build search index
index = SearchIndex()
index.build(pages)

# Search
results = index.search("webhooks", limit=10)
for result in results:
    print(f"{result['title']} ({result['relevance_score']})")
    print(f"  {result['url']}")

Configuration File Format

Configuration is stored in ~/.anydocs/config.json:

json
{
  "discord": {
    "name": "discord",
    "base_url": "https://discord.com/developers/docs",
    "sitemap_url": "https://discord.com/developers/docs/sitemap.xml",
    "search_method": "hybrid",
    "cache_ttl_days": 7
  },
  "openclaw": {
    "name": "openclaw",
    "base_url": "https://docs.openclaw.ai",
    "sitemap_url": "https://docs.openclaw.ai/sitemap.xml",
    "search_method": "hybrid",
    "cache_ttl_days": 7
  }
}
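
Because the file is plain JSON, a script can read it directly. The sketch below assumes the five keys shown above are always present; the function load_profile is hypothetical and is not the ConfigManager API.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".anydocs" / "config.json"
REQUIRED_KEYS = {"name", "base_url", "sitemap_url", "search_method", "cache_ttl_days"}


def load_profile(name: str, path: Path = CONFIG_PATH) -> dict:
    """Read one profile from the config file and check the expected keys."""
    profiles = json.loads(path.read_text(encoding="utf-8"))
    if name not in profiles:
        raise KeyError(f"unknown profile: {name}")
    profile = profiles[name]
    missing = REQUIRED_KEYS - set(profile)
    if missing:
        raise ValueError(f"profile {name!r} is missing keys: {sorted(missing)}")
    return profile
```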

Search Methods

Keyword Search

  • Speed: Fast
  • Best for: Common terms, exact matches
  • How it works: Term matching with position weighting (title > tags > content)
  • Example: anydocs search "webhooks"
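
Position weighting can be sketched as a weighted count of query terms per field. The weights below are illustrative assumptions, as are the function names; the actual scoring in lib/indexer.py may differ.

```python
import re

# Illustrative weights only; the real values in lib/indexer.py may differ.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "content": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query: str, page: dict) -> float:
    """Sum weighted term counts across title, tags, and content."""
    terms = tokenize(query)
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = page.get(field, "")
        if isinstance(value, list):  # tags may be a list of strings
            value = " ".join(value)
        tokens = tokenize(value)
        score += weight * sum(tokens.count(term) for term in terms)
    return score


pages = [
    {"title": "Channels", "tags": [], "content": "A webhook can post to a channel."},
    {"title": "Webhook Resource", "tags": ["webhooks"], "content": "Create a webhook."},
]
ranked = sorted(pages, key=lambda p: keyword_score("webhook", p), reverse=True)
print([p["title"] for p in ranked])
```

A title hit outweighs a content hit, so the page named after the query term ranks first.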

Hybrid Search (Default)

  • Speed: Fast
  • Best for: Natural language queries
  • How it works: Keyword search + phrase proximity scoring
  • Example: anydocs search "how to set up webhooks"
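
One way to read "phrase proximity scoring" is a bonus for consecutive query terms that appear close together and in order. The sketch below is an assumption about the idea, not the anydocs implementation; the 1/gap bonus and the equal weighting of the two parts are both made up for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_bonus(query: str, text: str) -> float:
    """Reward consecutive query terms that appear close together, in order."""
    terms = tokenize(query)
    positions: dict[str, list[int]] = {}
    for i, tok in enumerate(tokenize(text)):
        positions.setdefault(tok, []).append(i)
    bonus = 0.0
    for first, second in zip(terms, terms[1:]):
        gaps = [
            b - a
            for a in positions.get(first, [])
            for b in positions.get(second, [])
            if b > a
        ]
        if gaps:
            bonus += 1.0 / min(gaps)  # adjacent terms contribute 1.0
    return bonus


def hybrid_score(query: str, text: str) -> float:
    """Keyword hits plus the proximity bonus (equal weighting is an assumption)."""
    tokens = tokenize(text)
    keyword_hits = sum(tokens.count(t) for t in set(tokenize(query)))
    return keyword_hits + proximity_bonus(query, text)


query = "set up webhooks"
exact = "How to set up webhooks for a channel"
scattered = "Webhooks are easy to set once your bot is up"
print(hybrid_score(query, exact) > hybrid_score(query, scattered))
```

Both texts contain all three terms, so a pure keyword count ties them; the proximity bonus is what separates the exact phrase from the scattered one.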

Regex Search

  • Speed: Medium
  • Best for: Complex patterns
  • How it works: Compiled regex pattern matching across all content
  • Example: anydocs search "^(GET|POST)" --regex
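
Compiled-pattern matching across page content can be sketched like this. The function regex_search and the page dictionary shape are hypothetical, and the use of re.MULTILINE (so that ^ matches at the start of any line) is an assumption about how a pattern such as "^(GET|POST)" is meant to behave.

```python
import re


def regex_search(pattern: str, pages: list[dict], limit: int = 10) -> list[dict]:
    """Return pages whose content matches the pattern on any line."""
    compiled = re.compile(pattern, re.MULTILINE)  # compile once, reuse per page
    hits = []
    for page in pages:
        match = compiled.search(page["content"])
        if match:
            hits.append({"url": page["url"], "match": match.group(0)})
            if len(hits) >= limit:
                break
    return hits


docs = [
    {"url": "resources/webhook", "content": "Create Webhook\nPOST /channels/{id}/webhooks"},
    {"url": "intro", "content": "Overview of the API"},
]
print(regex_search(r"^(GET|POST)", docs))
```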

Caching Behavior

  • Pages: Cached as JSON with 7-day TTL (configurable)
  • Indexes: Cached after indexing, invalidated on TTL expiry
  • Cache location: ~/.anydocs/cache/
  • Manual refresh: Use --force flag or clear cache

Performance Notes

  • First index build takes 2-10 minutes depending on site size
  • Subsequent searches are instant (cached indexes)
  • Rate limit: 0.5s delay between page requests, to be respectful of the target server
  • Typical search returns ~100 results in <100ms
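
A fixed delay between page requests can be sketched as a small throttle. The class name Throttle and its injectable clock and sleep parameters are hypothetical; they are here to show the idea and to keep it testable, not to mirror the scraper's code.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between successive calls."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling wait() before each request enforces the spacing. From the delay alone, 240 pages need at least 120 seconds (2 minutes) and 1,200 pages at least 600 seconds (10 minutes); download and parsing time come on top of that.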

Troubleshooting

"No index for 'profile'" error

Run anydocs index <profile> first to build the index.

Sitemap not found

Check the sitemap URL. If the sitemap is unavailable, anydocs falls back to crawling from base_url.
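
Crawling from a base URL amounts to collecting same-site links from each fetched page. The sketch below shows that single step with the standard library; LinkCollector and same_site_links are hypothetical names, and the real crawler's filtering rules are not documented here.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute, fragment-free targets of <a href> tags."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute, _fragment = urldefrag(urljoin(self.page_url, href))
            self.links.add(absolute)


def same_site_links(html: str, page_url: str, base_url: str) -> list[str]:
    """Return sorted links that share the host of base_url."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return sorted(u for u in collector.links if urlparse(u).netloc == host)


html = (
    '<a href="/guide/deploy#steps">Deploy</a>'
    '<a href="api/">API</a>'
    '<a href="https://example.com/">External</a>'
)
print(same_site_links(html, "https://docs.company.local/", "https://docs.company.local"))
```

A full crawl would repeat this for each newly discovered URL while tracking which pages have already been visited.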

Slow indexing

This is normal for large sites. Rate limiting prevents overwhelming servers.

Cache grows too large

Run anydocs cache clear, or re-run anydocs config for the profile with a smaller --ttl-days value.

Examples

Vue.js Framework Docs (SPA Example)

bash
anydocs config vuejs \
  https://vuejs.org \
  https://vuejs.org/sitemap.xml
anydocs index vuejs
anydocs search "composition api" --profile vuejs

Next.js API Docs

bash
anydocs config nextjs \
  https://nextjs.org \
  https://nextjs.org/sitemap.xml
anydocs index nextjs
anydocs search "app router" --profile nextjs

Internal Company Documentation

bash
anydocs config internal \
  https://docs.company.local \
  https://docs.company.local/sitemap.xml
anydocs index internal --force
anydocs search "deployment" --profile internal

Architecture

  • scraper.py: Discovers URLs via sitemap, fetches and parses HTML
  • indexer.py: Builds searchable indexes, implements multiple search strategies
  • config.py: Manages configuration profiles
  • cache.py: TTL-based file caching for pages and indexes
  • cli.py: Click-based command-line interface

Contributing

To add new documentation sites, run:

bash
anydocs config <profile> <base_url> <sitemap_url>

To extend search functionality, modify lib/indexer.py.

License

Part of the OpenClaw system.

Installation

bash
openclaw install anydocs

Tags

#web_and-frontend-development

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author pektech
Last Updated 3/10/2026