Semfind
Semantic search over local text files using embeddings.
- Rating: 4.7 (65 reviews)
- Downloads: 5,601
- Version: 1.0.0
semfind
Semantic grep for the terminal. Searches files by meaning using local embeddings (BAAI/bge-small-en-v1.5 + FAISS). No API keys needed.
When to reach for semfind
- grep or ripgrep returned no results or irrelevant results
- You don't know the exact wording of what you're looking for
- You want to search by concept/meaning rather than exact text
Install
pip install semfind
First run downloads a ~65MB model (~10-30s). Subsequent runs use the cached model.
Usage
# Basic search
semfind "deployment issue" logs.md
# Search multiple files, top 3 results
semfind "permission error" memory/*.md -k 3
# With context lines
semfind "database migration" notes.md -n 2
# Force re-index after file changes
semfind "query" file.md --reindex
# Minimum similarity threshold
semfind "auth bug" *.md -m 0.5
Options
| Flag | Description | Default |
|---|---|---|
| -k, --top-k | Number of results | 5 |
| -n, --context | Context lines before/after | 0 |
| -m, --max-distance | Minimum similarity score | none |
| --reindex | Force re-embed | false |
| --no-cache | Skip embedding cache | false |
Output format
Grep-like with similarity scores:
file.md:9: [2026-01-15] Fixed docker build with missing env vars (0.796)
file.md:3: [2026-01-17] Agent couldn't write to /var/log (0.689)
Higher scores (closer to 1.0) mean stronger semantic match.
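Because each result line ends with its score in parentheses, the output can be post-filtered with standard tools. The following is a minimal sketch, assuming the score is always the last parenthesized field on the line, as in the sample above. `filter_by_score` is a hypothetical helper name and not part of semfind.

```shell
# Hypothetical helper: keep only result lines whose trailing "(score)" is >= a threshold.
# Assumes every result line ends with the score in parentheses, as in the sample output.
filter_by_score() {
  awk -v min="$1" '
    {
      # Locate a trailing "(0.796)"-style score at the end of the line.
      if (match($0, /\([0-9]*\.?[0-9]+\)$/)) {
        score = substr($0, RSTART + 1, RLENGTH - 2)
        if (score + 0 >= min + 0) print
      }
    }'
}

# Demonstration on the two sample lines shown above; only the 0.796 line passes.
filter_by_score 0.7 <<'EOF'
file.md:9: [2026-01-15] Fixed docker build with missing env vars (0.796)
file.md:3: [2026-01-17] Agent couldn't write to /var/log (0.689)
EOF
```

If semfind writes its results to standard output in this form, the same helper could be used as `semfind "deployment issue" logs.md | filter_by_score 0.7`.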
Resource usage
- ~250MB RAM while running, freed immediately on exit
- ~65MB model cached in /tmp/fastembed_cache/
- ~2s first query (model load), ~14ms cached queries
- Embedding cache in ~/.cache/semfind/, auto-invalidates on file changes
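To see how much disk space the two cache locations listed above actually occupy, a small check like the following may help. It is only a sketch: `cache_usage` is a hypothetical helper name, and it assumes the default paths given in this section are the ones in use on your machine.

```shell
# Hypothetical helper: report the on-disk size of the two cache locations named above,
# noting any that have not been created yet.
cache_usage() {
  for dir in /tmp/fastembed_cache "$HOME/.cache/semfind"; do
    if [ -d "$dir" ]; then
      du -sh "$dir"
    else
      echo "$dir: not present"
    fi
  done
}

cache_usage
```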
Workflow pattern
# Step 1: Try grep first
grep "deployment" memory/*.md
# Step 2: If grep fails, use semfind
semfind "something went wrong with the deployment" memory/*.md -k 5
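The two steps above can be folded into one shell function. This is a sketch rather than anything shipped with semfind: `find_notes` is a hypothetical name, and the fallback relies on grep's standard behavior of exiting non-zero when nothing matches.

```shell
# Hypothetical wrapper: try an exact grep first, fall back to semfind when grep finds nothing.
# Usage: find_notes KEYWORD "natural-language query" FILE...
find_notes() {
  keyword="$1"
  query="$2"
  shift 2
  # grep exits non-zero when no line matches (or on error), which triggers the semfind fallback.
  grep -Hn -- "$keyword" "$@" || semfind "$query" "$@" -k 5
}
```

For example, `find_notes "deployment" "something went wrong with the deployment" memory/*.md` prints grep's matches when the keyword appears verbatim, and otherwise runs the semantic search with the top 5 results.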
Installation
openclaw install semfind