Semfind

Semantic search over local text files using embeddings.

Rating: 4.7 (65 reviews)
Downloads: 5,601
Version: 1.0.0

Complete Documentation

semfind

Semantic grep for the terminal. Searches files by meaning using local embeddings (BAAI/bge-small-en-v1.5 + FAISS). No API keys needed.

When to reach for semfind

- grep or ripgrep returned no results, or only irrelevant ones
- You don't know the exact wording of what you're looking for
- You want to search by concept or meaning rather than exact text

Do NOT use semfind when grep works: grep returns instantly and has none of semfind's model-loading overhead.

Install

bash

pip install semfind

The first run downloads a ~65MB model, which takes roughly 10-30s. Subsequent runs use the cached model.

Usage

bash

# Basic search
semfind "deployment issue" logs.md

# Search multiple files, top 3 results
semfind "permission error" memory/*.md -k 3

# With context lines
semfind "database migration" notes.md -n 2

# Force re-index after file changes
semfind "query" file.md --reindex

# Minimum similarity threshold
semfind "auth bug" *.md -m 0.5

Options

| Flag | Description | Default |
|------|-------------|---------|
| `-k, --top-k` | Number of results | 5 |
| `-n, --context` | Context lines before/after | 0 |
| `-m, --max-distance` | Minimum similarity score | none |
| `--reindex` | Force re-embed | false |
| `--no-cache` | Skip embedding cache | false |

Output format

Grep-like with similarity scores:

text

file.md:9: [2026-01-15] Fixed docker build with missing env vars  (0.796)
file.md:3: [2026-01-17] Agent couldn't write to /var/log          (0.689)

Higher scores (closer to 1.0) indicate a stronger semantic match.
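
Because every result line ends with its score in parentheses, the output can be post-processed with standard tools. The sketch below is a hypothetical helper (`semfind_filter` is not part of semfind) that keeps only lines at or above a chosen score. It assumes the `file:line: text  (score)` layout shown in the sample, and that semfind prints the same plain-text lines when its output is piped.

```bash
# Hypothetical helper, not part of semfind: read result lines on stdin and
# print only those whose trailing "(score)" is >= the given threshold.
# Assumes each result line ends with the score in parentheses.
semfind_filter() {
  local threshold="$1"
  awk -v t="$threshold" '{
    line = $0
    sub(/[ \t\r]+$/, "", line)               # drop trailing whitespace
    if (match(line, /\([0-9.]+\)$/)) {       # locate the trailing "(0.796)"
      score = substr(line, RSTART + 1, RLENGTH - 2) + 0
      if (score >= t + 0) print line
    }
  }'
}
```

A call such as `semfind "deployment issue" logs.md | semfind_filter 0.7` would then drop the weaker matches. Lines that do not end in a parenthesized number are skipped rather than passed through.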

Resource usage

- ~250MB RAM while running, freed immediately on exit
- ~65MB model cached in `/tmp/fastembed_cache/`
- ~2s first query (model load), ~14ms cached queries
- Embedding cache in `~/.cache/semfind/`, auto-invalidates on file changes
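
To see how much disk space the two cache locations listed above currently occupy, something like the following could be used. It is a minimal sketch: `semfind_cache_usage` is a hypothetical helper name, and the default paths are simply the ones quoted in this section.

```bash
# Hypothetical helper, not part of semfind: report the on-disk size of each
# cache directory, or note that it does not exist yet.
semfind_cache_usage() {
  # Default to the two locations listed above when no arguments are given.
  [ "$#" -gt 0 ] || set -- /tmp/fastembed_cache "$HOME/.cache/semfind"
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      du -sk "$dir"          # size in KiB, followed by the path
    else
      echo "missing: $dir"
    fi
  done
}
```

Running `semfind_cache_usage` with no arguments checks both default locations; passing one or more directories checks those instead.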

Workflow pattern

bash

# Step 1: Try grep first
grep "deployment" memory/*.md

# Step 2: If grep fails, use semfind
semfind "something went wrong with the deployment" memory/*.md -k 5
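
The two steps can also be folded into one shell function. This is a sketch under stated assumptions: `search_notes` is a hypothetical name, the `-i` and `-F` grep flags are a choice made here rather than part of the original workflow, and the semfind call uses only the `query files -k N` form shown in Usage.

```bash
# Hypothetical wrapper: try a literal grep first, and only fall back to
# semantic search when grep finds nothing.
search_notes() {
  local query="$1"
  shift
  # grep exits 0 only when at least one line matched.
  # -F treats the query as a fixed string, -i ignores case,
  # -H and -n print "file:line:" prefixes.
  if grep -n -H -i -F -- "$query" "$@"; then
    return 0
  fi
  # No literal match (or grep failed): fall back to semfind if it is installed.
  if command -v semfind >/dev/null 2>&1; then
    semfind "$query" "$@" -k 5
  else
    echo "semfind not found; install it with: pip install semfind" >&2
    return 127
  fi
}
```

For example, `search_notes "deployment" memory/*.md` prints grep's matches when the word appears literally, and hands the same query to semfind otherwise. Keeping grep first preserves the fast path and pays the model-load cost only when it is needed.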

Installation

bash


openclaw install semfind

