openclaw-semantic-search

Hybrid semantic + keyword search for OpenClaw memory files.

What it does

Searches .md files in an OpenClaw workspace using OpenAI embeddings combined with BM25 keyword scoring. Returns the most relevant chunks ranked by a hybrid score (0.7 vector similarity + 0.3 BM25).

Features

  • Markdown-aware chunking — splits on ## headers, respects paragraph boundaries, merges tiny sections
  • BM25 hybrid scoring — 70% cosine similarity from embeddings + 30% BM25 keyword relevance
  • Incremental caching — only re-embeds files that changed since last run
  • Configurable file discovery — indexes memory/, bank/, MEMORY.md, USER.md, IDENTITY.md

Requirements

  • Node.js 18+ (uses native fetch)
  • OpenAI API key

Installation

git clone https://github.com/auriwren/openclaw-semantic-search.git
cd openclaw-semantic-search
chmod +x semantic-search

Then either add the directory to your PATH or copy/symlink semantic-search into your OpenClaw tools/ directory.

Usage

# Basic search
semantic-search "what happened last week"

# Limit results
semantic-search "project deadlines" --limit 10

# JSON output (for programmatic use)
semantic-search "meeting notes" --json

Output

Human-readable mode shows each matching chunk with its source file, line number, and relevance score:

🔍 Search: "what happened last week"

━━━ memory/2026-02-07.md:15 (87%) ━━━
## Morning
Had a productive session working on the semantic search tool...

━━━ memory/2026-02-06.md:42 (74%) ━━━
## Evening
Wrapped up the week's tasks...

JSON mode (--json) returns structured data:

{
  "query": "what happened last week",
  "results": [
    {
      "source": "memory/2026-02-07.md",
      "lineStart": 15,
      "score": 0.872,
      "text": "## Morning\nHad a productive session..."
    }
  ]
}

Configuration

Environment Variable   Description                                Default
OPENAI_API_KEY         Your OpenAI API key. Required.             (none)
OPENCLAW_WORKSPACE     Path to the OpenClaw workspace to search   Current directory
OPENCLAW_CACHE         Directory for the embeddings cache file    ~/.openclaw/cache
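A minimal setup might look like the following (the key value and workspace path are placeholders):

```shell
# Supply an API key and point the tool at a workspace (placeholder values).
export OPENAI_API_KEY="sk-..."
export OPENCLAW_WORKSPACE="$HOME/openclaw"
semantic-search "project deadlines"
```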

How it works

Chunking strategy

Files are split into chunks using markdown structure:

  1. Split on #, ##, ### headers into sections
  2. Sections under 100 characters are merged with the next section
  3. Sections over 800 characters are split at paragraph boundaries (double newlines)
  4. Header text is preserved as context in each sub-chunk
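The four steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the tool's actual internals; the function name and thresholds mirror the description (100- and 800-character limits):

```javascript
const MIN_CHARS = 100; // merge sections shorter than this
const MAX_CHARS = 800; // split sections longer than this

function chunkMarkdown(text) {
  // 1. Split on #, ##, ### headers into sections.
  const sections = text.split(/^(?=#{1,3} )/m).filter(s => s.trim());

  // 2. Merge sections under MIN_CHARS into the next section.
  const merged = [];
  let pending = '';
  for (const section of sections) {
    pending += section;
    if (pending.length >= MIN_CHARS) {
      merged.push(pending);
      pending = '';
    }
  }
  if (pending) merged.push(pending);

  // 3. Split oversized sections at paragraph boundaries (double newlines),
  // 4. carrying the header into each sub-chunk as context.
  const chunks = [];
  for (const section of merged) {
    if (section.length <= MAX_CHARS) {
      chunks.push(section.trim());
      continue;
    }
    const [header, ...paras] = section.split(/\n\n+/);
    let current = header;
    for (const para of paras) {
      if ((current + '\n\n' + para).length > MAX_CHARS && current !== header) {
        chunks.push(current.trim());
        current = header; // restart sub-chunk with the header as context
      }
      current += '\n\n' + para;
    }
    chunks.push(current.trim());
  }
  return chunks;
}
```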

BM25 parameters

Standard BM25 with k1 = 1.2 and b = 0.75. Query and document text are tokenized into lowercase alphanumeric words. IDF uses the standard log formula. BM25 scores are normalized to 0-1 before combining.
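A minimal sketch of that scoring, using the stated parameters; this is illustrative, not the tool's exact implementation:

```javascript
const K1 = 1.2;
const B = 0.75;

// Lowercase alphanumeric tokenization, as described above.
const tokenize = text => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function bm25Scores(query, docs) {
  const docTokens = docs.map(tokenize);
  const avgLen = docTokens.reduce((s, t) => s + t.length, 0) / docs.length;
  const N = docs.length;
  const qTerms = tokenize(query);

  const scores = docTokens.map(tokens => {
    const tf = new Map();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    let score = 0;
    for (const term of qTerms) {
      const df = docTokens.filter(t => t.includes(term)).length;
      if (df === 0) continue;
      // Standard IDF: log((N - df + 0.5) / (df + 0.5) + 1)
      const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);
      const f = tf.get(term) ?? 0;
      score += idf * (f * (K1 + 1)) /
               (f + K1 * (1 - B + B * tokens.length / avgLen));
    }
    return score;
  });

  // Normalize to 0-1 before combining with the vector score.
  const max = Math.max(...scores);
  return max > 0 ? scores.map(s => s / max) : scores;
}
```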

Hybrid scoring

Each chunk gets two scores:

  • Cosine similarity (0-1) between the query embedding and chunk embedding using OpenAI text-embedding-3-small
  • BM25 score (normalized 0-1) for keyword relevance

Final score = 0.7 * cosine + 0.3 * BM25
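In code, the blend is a straightforward weighted sum (cosine similarity sketched here for completeness; the embeddings themselves come from the OpenAI API):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Final score = 0.7 * cosine + 0.3 * normalized BM25.
const hybridScore = (cosine, bm25) => 0.7 * cosine + 0.3 * bm25;
```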

Caching

Embeddings are cached in memory-embeddings.json in the cache directory. On each run, only files with changed modification times are re-embedded. Deleted files are automatically cleaned from the cache.

Integration with OpenClaw

This tool is designed to work as an OpenClaw custom tool. Place it in your workspace's tools/ directory and it will be available to your agent. The --json flag makes it easy to parse results programmatically.

License

MIT — see LICENSE.
