Cut Claude Code token consumption by up to 98% — proven by a real benchmark, not estimates.
These numbers are not estimates. They were produced by running the actual tools against real files on a real machine. Every number in this section can be reproduced by cloning the repo and running `node tests/benchmark.js`.
| Environment | Details |
|---|---|
| Date | 2026-03-31 |
| Platform | macOS 15 (Darwin 25.3) · Apple Silicon |
| Node.js | v24.9.0 |
| Test files | tests/fixtures/sample.log (85 lines) · tests/fixtures/AuthService.ts (120 lines) |
| Project | This repo (39 source files, ~3,000 lines) |
┌─────────────────────────────────────────────────────────────────────┐
│ Tool Before (tokens) After (tokens) Saved % │
├─────────────────────────────────────────────────────────────────────┤
│ compress_logs 1,508 597 911 60% │
│ compress_logs (5k sim) 50,000 597 49,403 99% │
│ smart_read 4,980 57 4,923 99% │
│ function_extractor 1,245 249 996 80% │
│ project_map 95,000 815 94,185 99% │
│ bulk_search 50,000 2,284 47,716 95% │
├─────────────────────────────────────────────────────────────────────┤
│ TOTAL 202,733 4,599 198,134 98% │
└─────────────────────────────────────────────────────────────────────┘
Token consumption — before vs after
Before ████████████████████████████████████████ 202,733 tokens (100%)
After █ 4,599 tokens ( 2%)
┌────────────────────────────────────────────────────────────────┐
│ │
│ 98% of tokens never reach Claude's context window. │
│ They were noise. We removed the noise. │
│ │
└────────────────────────────────────────────────────────────────┘
Pricing: Claude Opus 4 at $15 / 1M input tokens
┌──────────────────┬────────────────┬────────────────┬────────────────┐
│ Session scale │ Without │ With │ Saved │
├──────────────────┼────────────────┼────────────────┼────────────────┤
│ 1 session │ $3.041 │ $0.069 │ $2.972 │
│ 10 sessions/day │ $30.41 │ $0.69 │ $29.72 │
│ 100 sessions │ $304.10 │ $6.90 │ $297.20 │
│ 1,000 sessions │ $3,041.00 │ $69.00 │ $2,972.00 │
└──────────────────┴────────────────┴────────────────┴────────────────┘
A team of 10 developers doing 5 sessions/day (50 sessions) saves ~$149/day.
All tools run in under 35ms.
Most run in under 5ms.
No background processes. No startup delay.
compress_logs ██ 1ms
smart_read ████ 4ms
function_extractor ██ 1ms
project_map ██████████████████████████████████ 33ms ← walks disk
bulk_search ████ 5ms
compress_logs took an 85-line log with 42 repeated "Connection refused" errors and returned:
## Log Analysis: sample.log
Original: 85 lines | Returned: 7 entries
FATAL: Database connection pool exhausted after 46 retries
ERROR: Connection refused to postgres:5432 (×42)
ERROR: JWT verification failed: token expired
ERROR: Unhandled exception: Cannot read properties of null
WARN: High memory usage: 87%
WARN: Response time degradation: avg 450ms
WARN: Disk usage critical: 94%
function_extractor on AuthService.ts (120 lines, ~1,245 tokens) with name: "login" returned only the login() function — 249 tokens instead of 1,245. The other 11 methods, imports, and irrelevant code were not returned.
project_map on the entire repository returned a structured map of all 39 files across all directories in 815 tokens — instead of reading every file which would cost ~95,000 tokens.
BEFORE AFTER
────────────────────────────────── ──────────────────────────────────
Claude Code Claude Code
│ │
│ "read AuthService.ts" │ "read AuthService.ts"
│ │
▼ ▼
Filesystem ┌─────────────────────────┐
│ │ claude-context-optimizer│
│ 800 lines │ │
│ ~8,000 tokens ──────────► │ 1. Hash check (0ms) │
│ │ 2. Session lookup │
│ Read again? 8,000 more │ 3. AST chunk + score │
│ Read again? 8,000 more │ 4. Return 80 lines │
│ │ ~800 tokens │
▼ └────────────┬────────────┘
Context window fills up │
Claude forgets earlier work Context stays clean
You pay 3× for the same file You pay once, smartly
You open Claude Code. You say "hello". You've already spent tokens.
Every time Claude reads a file, it reads the entire file — regardless of how much of it is relevant to your question. Every time it sees a log file, it reads every line from the first to the last. Every time it re-opens a file it read 2 turns ago, it spends the same tokens again as if it had never seen it.
This is not a bug. It's how language models work. But it doesn't have to be your problem.
Let's take a real project. A mid-size TypeScript project:

- 50 source files × 300 lines average = 15,000 lines
- 5 log files × 2,000 lines each = 10,000 lines
- package-lock.json = 5,000 lines

If Claude reads everything once:

    ~30,000 lines × 40 chars/line ÷ 4  =  ~300,000 tokens

A 5-turn conversation where Claude re-reads files:

    300,000 × 3 (average re-reads)     =  ~900,000 tokens
    At $15/million tokens (Claude Opus) = $13.50 per conversation

With this optimizer, the same conversation:

- smart_read returns 200 relevant lines per file read
- recall_file confirms unchanged files without any read
- compress_logs returns 50 lines from a 2,000-line log

    Total tokens used  =  ~144,000 tokens
    Savings            =  84%
    Cost               =  $2.16 per conversation
This is not theoretical. This is the math.
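The arithmetic above can be checked in a few lines (a sketch that uses only the constants quoted in this section):

```typescript
// Back-of-envelope check of the numbers above.
const CHARS_PER_TOKEN = 4;
const PRICE_PER_MILLION = 15; // $ per 1M input tokens (Claude Opus)

const lines = 30_000;
const charsPerLine = 40;
const tokensPerRead = (lines * charsPerLine) / CHARS_PER_TOKEN;  // 300,000
const conversationTokens = tokensPerRead * 3;                    // ~900,000 with re-reads

const costWithout = (conversationTokens / 1_000_000) * PRICE_PER_MILLION; // $13.50
const costWith = (144_000 / 1_000_000) * PRICE_PER_MILLION;               // $2.16

console.log({ tokensPerRead, costWithout, costWith });
```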
Every "token optimization" tool we found does one of two things:
- Truncates blindly — cuts content after N characters. Loses critical information at the end of files.
- Summarizes with AI — uses another AI call to summarize. Costs tokens to save tokens. Paradoxical.
We needed something different.
After analyzing real Claude Code sessions, we identified three distinct sources of waste:
Root Cause 1: Reading entire files when only fragments are needed
When you ask "how does the authentication work?", Claude reads all of AuthService.ts, UserModel.ts, JWTUtil.ts, and middleware/auth.ts — every line of every file. In reality, 70–90% of those lines are irrelevant to the question.
Root Cause 2: Re-reading unchanged files
In a 20-turn conversation, Claude may read config.ts 8 times. If the file never changed, those 7 extra reads waste 100% of the tokens spent on the first read. There is no memory of "I already read this."
Root Cause 3: Zero-density data sources
Log files, lock files, generated files — these are read with the same attention as hand-written source code. A 5,000-line log file might contain 3 relevant errors. The other 4,997 lines are pure token waste.
We built three distinct engines to address each root cause separately:
╔══════════════════════════════════════════════════════════════════╗
║ claude-context-optimizer v1.0.0 ║
╠══════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────────────────────────────────────────────────┐ ║
║ │ ENGINE 1 — FileCache fixes: Root Cause 2 │ ║
║ │ │ ║
║ │ file.ts ──► stat (mtime+size) ──► hash match? │ ║
║ │ │ │ ║
║ │ yes ──┤── no │ ║
║ │ │ │ │ ║
║ │ "unchanged" full read │ ║
║ │ 0 tokens + cache │ ║
║ │ Storage: SQLite WAL (non-blocking, ACID, indexed) │ ║
║ └─────────────────────────────────────────────────────────┘ ║
║ ║
║ ┌─────────────────────────────────────────────────────────┐ ║
║ │ ENGINE 2 — SemanticIndex fixes: Root Cause 1 │ ║
║ │ │ ║
║ │ .ts/.tsx/.js ──► AST chunker ──► functions/classes │ ║
║ │ .py ──► indent parser ──► defs/classes │ ║
║ │ .go/.rs/.java ──► regex parser ──► signatures │ ║
║ │ .md/.yaml/txt ──► sliding window ──► paragraphs │ ║
║ │ │ │ ║
║ │ RelevanceScorer │ ║
║ │ (keyword freq + identifier bonus) │ ║
║ │ │ │ ║
║ │ top N chunks ≤ token budget │ ║
║ └─────────────────────────────────────────────────────────┘ ║
║ ║
║ ┌─────────────────────────────────────────────────────────┐ ║
║ │ ENGINE 3 — SessionMemory fixes: Root Cause 2+3 │ ║
║ │ │ ║
║ │ Turn 1: read auth.ts → session: { auth.ts: hash1 } │ ║
║ │ Turn 2: read utils.ts → session: { auth.ts, utils } │ ║
║ │ Turn 5: "auth.ts again?" → hash unchanged → 0 tokens │ ║
║ │ │ ║
║ │ Also powers: context_budget, session_snapshot │ ║
║ └─────────────────────────────────────────────────────────┘ ║
║ ║
╚══════════════════════════════════════════════════════════════════╝
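The ENGINE 1 fast path can be sketched like this. It is illustrative only: `statFingerprint` and `needsReread` are made-up names, not the real HashUtil/CacheManager API, and the real cache lives in SQLite rather than a Map. The point is that "unchanged" is decided from one stat syscall, with no file read:

```typescript
import { statSync, writeFileSync, appendFileSync } from "node:fs";
import { createHash } from "node:crypto";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Cheap fingerprint from mtime + size: one stat syscall, zero file reads.
function statFingerprint(filePath: string): string {
  const s = statSync(filePath);
  return createHash("sha256").update(`${s.mtimeMs}:${s.size}`).digest("hex");
}

const cache = new Map<string, string>(); // path → fingerprint (SQLite in the real engine)

function needsReread(filePath: string): boolean {
  const current = statFingerprint(filePath);
  const unchanged = cache.get(filePath) === current;
  cache.set(filePath, current);
  return !unchanged; // false → "unchanged", 0 tokens spent
}

// Demo with a throwaway file:
const demo = join(tmpdir(), "cco-demo.txt");
writeFileSync(demo, "hello");
const first = needsReread(demo);  // never seen → read it
const second = needsReread(demo); // unchanged → skip
appendFileSync(demo, "!");        // size changes → fingerprint changes
const third = needsReread(demo);
console.log(first, second, third);
```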
You ask Claude: "how does login work?"
│
▼
Claude calls smart_read({ file: "AuthService.ts", query: "login" })
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SmartReadTool │
│ │
│ Step 1: SessionMemory.wasReadInSession("AuthService.ts") │
│ └─► found! hash = abc123 │
│ │
│ Step 2: HashUtil.fromFileStat("AuthService.ts") │
│ └─► current hash = abc123 ← matches! │
│ │
│ Step 3: SemanticIndex.query("AuthService.ts", "login") │
│ └─► ASTChunker finds: login(), validateToken() │
│ └─► RelevanceScorer ranks: login (score 14) │
│ validateToken (score 3) │
│ │
│ Step 4: fitInBudget(chunks, 2000 tokens) │
│ └─► returns login() function only = 180 tokens │
└──────────────────────────┬──────────────────────────────────┘
│
▼
Claude receives: 180 tokens
Instead of: 8,000 tokens
Saved: 97.75%
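Steps 3 and 4 of the pipeline can be sketched as follows. This is a simplified model: `score` and `fitInBudget` here are illustrative stand-ins for the real RelevanceScorer, but they show the same idea (keyword frequency plus an identifier bonus, then greedy packing into a token budget):

```typescript
interface Chunk { name: string; text: string }

function score(chunk: Chunk, query: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  let s = 0;
  for (const w of words) {
    // keyword frequency in the chunk body
    s += chunk.text.toLowerCase().split(w).length - 1;
    // identifier bonus: query word appears in the chunk's name
    if (chunk.name.toLowerCase().includes(w)) s += 10;
  }
  return s;
}

const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function fitInBudget(chunks: Chunk[], query: string, budget: number): Chunk[] {
  const ranked = chunks
    .map(c => ({ c, s: score(c, query) }))
    .filter(x => x.s > 0)              // zero-score chunks never reach Claude
    .sort((a, b) => b.s - a.s);
  const out: Chunk[] = [];
  let used = 0;
  for (const { c } of ranked) {
    const t = estimateTokens(c.text);
    if (used + t > budget) break;
    out.push(c);
    used += t;
  }
  return out;
}

const chunks: Chunk[] = [
  { name: "login", text: "async login(email, password) { /* checks password */ }" },
  { name: "logout", text: "logout() { this.session = null }" },
];
const picked = fitInBudget(chunks, "login password", 2000);
console.log(picked.map(c => c.name));
```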
The project follows a strict separation of concerns. No folder contains two different concepts.
┌──────────────────────────────────────────────────────────────┐
│ Claude Code (MCP client) │
└──────────────────────────┬───────────────────────────────────┘
│ stdio / MCP protocol
┌──────────────────────────▼───────────────────────────────────┐
│ src/server/index.ts │
│ (route → tool, format output, error boundary) │
└──┬───┬───┬───┬───┬───┬───┬───┬───┬──────────────────────────┘
│ │ │ │ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ TOOLS LAYER
compress smart file proj ctx bulk recall dep fn snap
_logs _read _diff _map budg srch _file graph ext shot
│ │ │ │
│ ├────────────────────────►│ │ ENGINES LAYER
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌────────────────────────┐
│ SemanticIndex │ │ SessionMemory │
│ ┌────────────┐ │ │ ┌──────────────────┐ │
│ │ ASTChunker │ │ │ │ SQLite (WAL) │ │
│ │ SlideWin │ │ │ │ session_files │ │
│ └────────────┘ │ │ └──────────────────┘ │
│ ┌────────────┐ │ └────────────────────────┘
│ │TS/Py/Gen │ │
│ │ Parsers │ │ ┌────────────────────────┐
│ └────────────┘ │ │ CacheManager │
└──────────────────┘ │ ┌──────────────────┐ │
│ │ FileCache │ │
UTILS LAYER │ │ SQLite (WAL) │ │
┌────────────────────────┐ │ └──────────────────┘ │
│ HashUtil │ Token │ └────────────────────────┘
│ Platform │ Estimator│
└────────────────────────┘
claude-context-optimizer/
│
├── src/
│ ├── server/ ← MCP server entry point (one file)
│ │ └── index.ts
│ │
│ ├── config/ ← All constants in one place
│ │ └── constants.ts
│ │
│ ├── models/ ← Pure TypeScript interfaces (no logic)
│ │ ├── FileRecord.ts ← File cache record shapes
│ │ ├── SessionRecord.ts ← Session and snapshot shapes
│ │ └── ToolResult.ts ← All tool return types
│ │
│ ├── utils/
│ │ ├── hash/
│ │ │ └── HashUtil.ts ← SHA-256 file hashing + fast stat path
│ │ ├── platform/
│ │ │ └── PlatformUtil.ts ← Mac/Windows/Linux path resolution
│ │ └── token/
│ │ └── TokenEstimator.ts ← Fast token counting without tiktoken
│ │
│ ├── engines/
│ │ ├── cache/
│ │ │ ├── FileCache.ts ← SQLite read/write layer (raw DB ops)
│ │ │ └── CacheManager.ts ← High-level: read + cache + invalidate
│ │ ├── session/
│ │ │ ├── SessionMemory.ts ← Track files read per session
│ │ │ └── SnapshotManager.ts ← Save/restore session snapshots
│ │ └── semantic/
│ │ ├── parsers/
│ │ │ ├── LogParser.ts ← Any log format → structured entries
│ │ │ ├── TypeScriptParser.ts ← TS/JS AST-style extraction
│ │ │ ├── PythonParser.ts ← Python indent-aware extraction
│ │ │ └── GenericParser.ts ← Go/Rust/Java/C#/Ruby fallback
│ │ ├── chunkers/
│ │ │ ├── ASTChunker.ts ← Chunk code by semantic units
│ │ │ └── SlidingWindowChunker.ts ← Chunk text by sliding window
│ │ └── SemanticIndex.ts ← Query interface: file + query → chunks
│ │
│ └── tools/
│ ├── compress-logs/
│ │ ├── PatternMatcher.ts ← Log pattern detection + normalization
│ │ └── CompressLogsTool.ts ← Tool implementation
│ ├── smart-read/
│ │ ├── RelevanceScorer.ts ← Score chunks against query
│ │ └── SmartReadTool.ts ← Tool implementation
│ ├── file-diff/
│ │ └── FileDiffTool.ts
│ ├── project-map/
│ │ ├── FileTreeBuilder.ts ← Walk + describe project structure
│ │ └── ProjectMapTool.ts ← Tool implementation
│ ├── context-budget/
│ │ ├── TokenCounter.ts ← Analyze + recommend
│ │ └── ContextBudgetTool.ts ← Tool implementation
│ ├── bulk-search/
│ │ └── BulkSearchTool.ts
│ ├── recall-file/
│ │ └── RecallFileTool.ts
│ ├── dependency-graph/
│ │ └── DependencyGraphTool.ts
│ ├── function-extractor/
│ │ └── FunctionExtractorTool.ts
│ └── session-snapshot/
│ └── SessionSnapshotTool.ts
│
├── install.sh ← One-command installer (Mac/Linux/Windows WSL)
├── package.json
├── tsconfig.json
└── README.md
Design rules we followed:
- Every folder has exactly one responsibility
- No file contains logic that belongs to a different layer
- Models are pure interfaces — zero business logic
- Engines are reusable — tools compose engines, not vice versa
- Tools are thin wrappers — they format output and call engines
```bash
curl -fsSL https://raw.githubusercontent.com/AzozzALFiras/claude-context-optimizer/main/install.sh | bash
```

This will:
- Check Node.js version (requires v18+)
- Check Claude Code is installed
- Register the MCP server globally
- Show you all available tools
Or register manually:

```bash
claude mcp add context-optimizer npx claude-context-optimizer --scope global
```

Verify the installation:

```bash
claude mcp list
# Should show: context-optimizer
```

To update, re-run the one-command installer. It removes the old registration first.

To uninstall:

```bash
claude mcp remove context-optimizer --scope global
```

| Platform | Tested | Cache location |
|---|---|---|
| macOS | ✅ | ~/.claude/context-optimizer/ |
| Linux | ✅ | ~/.claude/context-optimizer/ |
| Windows (WSL) | ✅ | ~/.claude/context-optimizer/ |
| Windows (native) | ✅ | %APPDATA%\context-optimizer\ |
Requirements:
- Node.js v18 or higher
- Claude Code CLI
- Git (optional, required for `file_diff_only`)
---

### 1. `compress_logs`

**The problem it solves:** A 5,000-line log file contains maybe 10 actionable errors. Reading it fully wastes 95% of tokens on timestamps, debug messages, and repeated noise.

**How it works:**
- Reads the log file line by line
- Detects FATAL / ERROR / WARN / Exception patterns (all log formats: plain, JSON, logfmt)
- Extracts N lines of context around each match (stack traces, request IDs)
- Deduplicates: "Connection refused" appearing 200 times becomes one entry with `(×200)`
- Returns structured output with severity grouping
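The deduplication step above can be sketched like this (illustrative only; the real logic lives in `src/tools/compress-logs/PatternMatcher.ts` and normalizes more formats). Volatile parts of each line (timestamps, hex ids, counters) are masked so repeated errors collapse into one entry with a count:

```typescript
function normalize(line: string): string {
  return line
    .replace(/\b\d{4}-\d{2}-\d{2}[T ][\d:.]+Z?\b/g, "<ts>") // timestamps
    .replace(/\b0x[0-9a-f]+\b/gi, "<hex>")                   // pointers, ids
    .replace(/\b\d+\b/g, "<n>");                             // ports, retry counts
}

function dedupe(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const key = normalize(line);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const entries = dedupe([
  "ERROR: Connection refused to postgres:5432 (attempt 1)",
  "ERROR: Connection refused to postgres:5432 (attempt 2)",
  "WARN: Disk usage critical: 94%",
]);
console.log(entries); // the two ERROR lines collapse into one entry with count 2
```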
Example:
Input: 5,000 lines, ~50,000 tokens
Output: 40 lines, ~400 tokens
Saved: 98%
```typescript
// Claude calls:
compress_logs({ file_path: "/var/log/app.log", context_lines: 3 })
```

Returns:
## Log Analysis: /var/log/app.log
Original: 5,000 lines | Returned: 23 entries
### Errors
Line 1247 (×47): Connection refused to postgres:5432
> Retrying in 5s...
> Attempt 47 of 50
Line 3891: JWT verification failed: token expired
> User: user_abc123
> Endpoint: POST /api/orders

---

### 2. `smart_read`

**The problem it solves:** You need to understand how authentication works. Claude reads all 800 lines of AuthService.ts when only the login() and validateToken() functions (80 lines) are relevant.

**How it works:**
- Checks session memory — was this file read before this session?
- Checks file hash — has it changed since last read?
- If unchanged and in session: zero disk reads, returns summary
- If new/changed: reads file, runs AST chunker (TS/JS/Python) or sliding window (other files)
- Scores every chunk against your query using keyword frequency + identifier matching
- Returns only chunks that score above zero, ordered by relevance, capped at token budget
Language support:
- TypeScript/JavaScript: extracts functions, classes, interfaces by AST
- Python: extracts defs and classes respecting indent structure
- Go, Rust, Java, C#, Ruby, PHP: regex-based signature extraction
- YAML, JSON, Markdown, any text: sliding window with relevance scoring
**Example:**

```typescript
smart_read({ file_path: "/app/src/auth/AuthService.ts", query: "JWT token validation" })
```

Returns only:
## /app/src/auth/AuthService.ts (from cache — unchanged)
600 lines | typescript
### Lines 145–187 — `validateToken`
```typescript
async validateToken(token: string): Promise<User | null> {
// ... only this function
}
```
---
### 3. `file_diff_only`
**The problem it solves:** You changed 5 lines in a 400-line file. Claude reads all 400 lines to understand the change.
**How it works:**
Runs `git diff` and returns only the changed lines with configurable context. Works against HEAD, any commit, any branch, or staged changes.
**Example:**
```typescript
file_diff_only({ file_path: "/app/src/server.ts", base: "main" })
```

Returns:

```diff
## Diff: server.ts vs main
@@ -45,6 +45,8 @@
 app.use(cors())
+app.use(helmet())
+app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }))
 app.use(express.json())
```

Tokens: ~150 instead of ~4,000 for the full file.
---
### 4. `project_map`
**The problem it solves:** You open a new codebase. Claude reads 20 files to understand the structure. You could have understood the entire project in 300 tokens.
**How it works:**
Walks the directory tree (ignoring `node_modules`, `dist`, `.git`, etc.), collects every source file, identifies languages, estimates token costs, groups by directory, and returns a single compressed map.
**Example output:**
47 files | 12,450 lines | ~31k tokens total
- typescript: 32 files
- markdown: 8 files
- yaml: 4 files
- json: 3 files
/src/auth/
- AuthService.ts — service (~1.2k tokens)
- JWTUtil.ts — utilities (~400 tokens)
- middleware.ts — service (~300 tokens)
/src/api/
- router.ts — routes (~500 tokens)
- handlers.ts — controller (~800 tokens)
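The walk itself is cheap because it never reads file contents: token cost is estimated from file size alone. A sketch (illustrative; the real `FileTreeBuilder.ts` also detects languages and guesses file roles, and the names here are made up):

```typescript
import { mkdirSync, writeFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

const IGNORE = new Set(["node_modules", "dist", ".git"]);

function mapProject(dir: string): { files: number; tokens: number } {
  let files = 0, tokens = 0;
  for (const entry of readdirSync(dir)) {
    if (IGNORE.has(entry)) continue;
    const full = join(dir, entry);
    const s = statSync(full);
    if (s.isDirectory()) {
      const sub = mapProject(full);
      files += sub.files; tokens += sub.tokens;
    } else {
      files += 1;
      tokens += Math.ceil(s.size / 4); // chars/4 estimate from the stat, no read
    }
  }
  return { files, tokens };
}

// Demo on a throwaway fixture:
const root = join(tmpdir(), "cco-map-demo");
mkdirSync(join(root, "src"), { recursive: true });
mkdirSync(join(root, "node_modules"), { recursive: true });
writeFileSync(join(root, "src", "a.ts"), "x".repeat(400));
writeFileSync(join(root, "node_modules", "big.js"), "y".repeat(40_000));
const result = mapProject(root); // node_modules is ignored entirely
console.log(result);
```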
---
### 5. `context_budget`
**The problem it solves:** You don't know how close you are to the context limit until Claude stops working or starts forgetting things. By then it's too late.
**How it works:**
Analyzes items in your context (or auto-pulls from session history), estimates tokens for each, categorizes them by whether they should be kept or removed, and gives specific recommendations with projected savings.
**Budget categories:**
- `keep` — core files actively being worked on
- `consider-removing` — large files read early in the session, now stale
- `remove` — log files, lock files, generated code
---
### 6. `bulk_search`
**The problem it solves:** You need to find where `validateUser` is called across the codebase. Claude reads 30 files to find 8 matches.
**How it works:**
Recursively searches all files (respecting ignore patterns), runs regex against each line, returns only matching lines with 2 lines of context per match. Never returns full file content.
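The per-file search loop can be sketched like this (illustrative, not the actual BulkSearchTool code): match a regex per line and emit only the matching line plus a couple of lines of context, so full file content never reaches the context window:

```typescript
function searchLines(lines: string[], pattern: RegExp, context = 2) {
  const hits: { line: number; text: string; after: string[] }[] = [];
  lines.forEach((text, i) => {
    if (pattern.test(text)) {
      // snippet only: the match plus `context` following lines
      hits.push({ line: i + 1, text, after: lines.slice(i + 1, i + 1 + context) });
    }
  });
  return hits;
}

const file = [
  "import { validateUser } from './auth';",
  "const user = await validateUser(token);",
  "if (!user) throw new Error('unauthorized');",
];
const hits = searchLines(file, /validateUser/);
console.log(hits.map(h => h.line)); // lines 1 and 2 match
```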
**Example:**
```typescript
bulk_search({ pattern: "validateUser", file_extensions: [".ts"] })
```

Returns:

```
## Search: `validateUser` in /app
8 matches in 5 files

### src/api/handlers.ts
L45: const user = await validateUser(req.headers.authorization)
> if (!user) return res.status(401).json({ error: 'Unauthorized' })

### src/auth/AuthService.ts
L112: async validateUser(token: string): Promise<User>
```
---

### 7. `recall_file`

**The problem it solves:** You ask Claude to "look at AuthService.ts again". It reads the whole file. The file hasn't changed in 30 minutes.

**How it works:** Checks session memory for the file path. If found, computes the current stat hash (fast — no file read) and compares it with the cached hash. If unchanged, returns the cached summary and confirms no re-read is needed.

Zero tokens for unchanged files. This is the highest-leverage tool in the set.
---

### 8. `dependency_graph`

**The problem it solves:** Before modifying a shared utility, you need to know what depends on it. Understanding this normally requires reading many files.

**How it works:** Parses import statements from all code files, builds a directed graph of imports → imported-by relationships, and returns either a file-level view or a project-level view showing the most imported modules.
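The graph build can be sketched in a few lines (illustrative; helper names are made up, and the real tool handles more import syntaxes): pull import specifiers with a regex, then invert the edges to get "imported by":

```typescript
function importsOf(source: string): string[] {
  // naive matcher for `import ... from "specifier"` (sketch only)
  const re = /import\s+[^'"]*['"]([^'"]+)['"]/g;
  const out: string[] = [];
  for (const m of source.matchAll(re)) out.push(m[1]);
  return out;
}

function importedBy(files: Record<string, string>): Map<string, string[]> {
  const rev = new Map<string, string[]>();
  for (const [file, src] of Object.entries(files)) {
    for (const dep of importsOf(src)) {
      rev.set(dep, [...(rev.get(dep) ?? []), file]); // invert: dep → dependents
    }
  }
  return rev;
}

const graph = importedBy({
  "a.ts": `import { x } from "./utils";`,
  "b.ts": `import { y } from "./utils";`,
});
console.log(graph.get("./utils")); // both a.ts and b.ts depend on ./utils
```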
---

### 9. `function_extractor`

**The problem it solves:** You need to see one specific function from a 600-line file. You only need 30 lines.

**How it works:** Uses the AST chunker to locate a function or class by exact name. Falls back to relevance scoring if the exact name isn't found. Returns only the matched function with its file path and line number.

**Example:**

```typescript
function_extractor({ file_path: "/app/src/auth/AuthService.ts", name: "login" })
```

Returns:
## `login` — /app/src/auth/AuthService.ts:67
```typescript
async login(email: string, password: string): Promise<AuthResult> {
const user = await this.userRepo.findByEmail(email);
if (!user) throw new AuthError('User not found');
const valid = await bcrypt.compare(password, user.passwordHash);
if (!valid) throw new AuthError('Invalid credentials');
return { token: this.jwt.sign({ userId: user.id }), user };
}
```

Tokens: ~200 instead of ~6,000 for the full file.
---
### 10. `session_snapshot`
**The problem it solves:** Long tasks get interrupted. You come back to Claude, it's lost context of what was being worked on, and re-reading everything costs tokens.
**How it works:**
Saves a snapshot of the current session — which files were read, their hashes, and a summary of the current state. On restore, returns this snapshot so Claude can resume without re-reading files that haven't changed.
---
## The Technology Choices
### Why SQLite (not a JSON file)?
SQLite with WAL mode gives us:
- **Non-blocking reads** — multiple reads never wait for each other
- **ACID transactions** — cache is never corrupted, even on crash
- **Indexed lookups** — `O(log n)` file lookup vs `O(n)` JSON scan
- **Cross-process safety** — multiple Claude windows share one cache
A JSON file would require full parse + full write on every cache access. For 500 cached files, that's 500 file scans per session.
### Why `better-sqlite3` (not `sql.js`)?
`better-sqlite3` is **synchronous**. This matters because:
- MCP tools are called in Node.js event loop
- Async SQLite creates unnecessary complexity
- Synchronous DB is faster for single-process, single-user use cases
- No deadlock risk, no callback hell, no promise chains
### Why no `tiktoken`?
`tiktoken` is accurate but:
- Requires native compilation (breaks on some systems)
- Adds 10+ MB to the package
- Takes 200ms to load on first use
Our `chars / 4` estimator is:
- Within 10% accuracy for English/code content (sufficient for budgeting)
- Instant — zero overhead
- Zero dependencies
- Works identically on all platforms
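The whole estimator is essentially one expression (a sketch of the idea behind `src/utils/token/TokenEstimator.ts`; the actual implementation may differ in details):

```typescript
// ~4 characters per token is a reasonable average for English text and code.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

console.log(estimateTokens("const x = 42;")); // 13 chars → 4 tokens
```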
### Why regex-based AST parsing instead of a real AST parser?
A real TypeScript AST parser (`@typescript-eslint/parser`, `ts-morph`) would be more accurate. But:
- Adds 50–200 MB of dependencies
- Takes 500ms–2s to parse large files
- Breaks on files with syntax errors
- Requires separate parsers per language
Our regex/indent-based approach:
- 0ms parse time (single-pass line scan)
- Works on 12 languages with one pattern table
- Handles syntax errors gracefully (returns what it found)
- Adds zero dependencies
For the use case (extracting function boundaries for token optimization), this accuracy is sufficient.
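A minimal sketch of the single-pass boundary scan (illustrative only; the real pattern table covers 12 languages and more declaration forms): find a header line matching the name, then track brace depth until the block closes:

```typescript
function extractFunction(source: string, name: string): string | null {
  const lines = source.split("\n");
  // matches `function name` or `name(` on a line (sketch, not the real table)
  const header = new RegExp(`(function\\s+${name}\\b|\\b${name}\\s*\\()`);
  let start = -1, depth = 0;
  for (let i = 0; i < lines.length; i++) {
    if (start === -1) {
      if (header.test(lines[i])) start = i;
      else continue; // lines before the header don't affect depth
    }
    depth += (lines[i].match(/{/g) ?? []).length;
    depth -= (lines[i].match(/}/g) ?? []).length;
    if (depth === 0 && i > start) {
      return lines.slice(start, i + 1).join("\n"); // function body only
    }
  }
  return null; // not found or never closed (e.g. syntax error)
}

const src = `
class AuthService {
  login(email: string) {
    return email;
  }
  logout() {}
}`;
console.log(extractFunction(src, "login"));
```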
---
## How Much Does It Save?
| Scenario | Without | With | Saving |
|----------|---------|------|--------|
| Reading a 500-line file for one function | ~5,000 tokens | ~200 tokens | **96%** |
| Reading a 5,000-line log | ~50,000 tokens | ~500 tokens | **99%** |
| Re-reading an unchanged file | ~5,000 tokens | 0 tokens | **100%** |
| Understanding a new project (20 files) | ~80,000 tokens | ~500 tokens | **99%** |
| Finding a pattern across 30 files | ~300,000 tokens | ~2,000 tokens | **99%** |
| **Typical 20-turn work session** | **~500,000 tokens** | **~80,000 tokens** | **84%** |
### Visual: token consumption per turn
Tokens/turn (typical session — 20 turns)
Without optimizer:
  Turn 1   ████████████████████████████████ 32,000
  Turn 2   ████████████████████████████████ 31,000
  Turn 3   ████████████████████████████████ 33,000  ← re-reads same files
  Turn 5   ████████████████████████████████ 35,000
  Turn 10  ███████████████████████████████████████ 42,000
  Turn 15  ████████████████████████████████████████████ 48,000  ← context filling
  Turn 20  ██████████ 9,000  ← Claude starts forgetting, quality drops
With optimizer:
  Turn 1   ████████ 8,000  ← first read + cache
  Turn 2   ███ 3,000       ← recall_file: unchanged, 0 tokens
  Turn 3   ████ 4,000
  Turn 5   ███ 2,500       ← smart_read: only relevant chunk
  Turn 10  ███ 3,000
  Turn 15  ███ 3,500
  Turn 20  ████ 4,000      ← context stays clean, quality stays high
Total: Without = ~520,000 With = ~82,000 Saved = 84%
### The cache hit rate over time
Cache hits (%) as session progresses
100% ┤                                        ············
 90% ┤                                   ·····
 80% ┤                              ·····
 70% ┤                         ·····
 60% ┤                    ·····
 50% ┤               ·····
 40% ┤          ·····
 30% ┤     ·····
 20% ┤·
  0% ┼────────────────────────────────────────────────
     Turn 1      Turn 5      Turn 10     Turn 15     Turn 20
Every turn, more files are cached. By Turn 10, ~80% of file requests cost 0 tokens.
---
## Decision Tree: Which Tool to Use
You need to work with a file or codebase...
                  │
                  ▼
┌─────────────────────────────────────┐
│ Have I read this file this session? │
└───────────────────┬─────────────────┘
            │               │
           yes              no
            │               │
            ▼               ▼
┌──────────────┐   ┌────────────────────────────────────┐
│ recall_file  │   │ What do I need from the file?      │
│              │   └────────────┬───────────────────────┘
│ unchanged?   │         │                │
│ → 0 tokens   │     specific        understand
│ changed?     │     function/class  how it works
│ → smart_read │         │                │
└──────────────┘         ▼                ▼
              function_extractor      smart_read
              (name: "login")         (query: "...")
You need to understand the whole project...
        │
        ▼
┌──────────────────────────────────────┐
│ project_map                          │
│ Get the full structure in ~300 tok   │
└──────────────────────────────────────┘
        │
        ▼
(then drill down with)
dependency_graph → function_extractor → smart_read
You need to find something across the codebase...
        │
        ▼
┌──────────────────────────────────────┐
│ bulk_search                          │
│ pattern: "validateUser"              │
│ returns snippets, never full files   │
└──────────────────────────────────────┘
You have a huge log file...
        │
        ▼
┌──────────────────────────────────────┐
│ compress_logs                        │
│ 5,000 lines → 40 relevant entries    │
│ deduplicates repeated errors         │
└──────────────────────────────────────┘
You want to see what changed in a file...
        │
        ▼
┌──────────────────────────────────────┐
│ file_diff_only                       │
│ git diff vs HEAD or any branch       │
│ returns only changed lines           │
└──────────────────────────────────────┘
## Resource Usage
This server is designed to consume **almost no CPU or memory**:
| Resource | Usage | Why |
|----------|-------|-----|
| Memory | ~15 MB | SQLite + Node.js baseline |
| CPU (idle) | 0% | No polling, no watchers |
| CPU (per call) | <5ms | Hash check = stat syscall |
| Disk (cache) | ~1 KB per file | Summary + hash only |
| Startup time | ~50ms | SQLite WAL is instant |
**What we deliberately avoided:**
- `fs.watch` / `chokidar` — continuous file watching is expensive and unnecessary
- In-memory file content cache — wastes RAM, SQLite is faster for on-demand access
- Background workers — no threads, no IPC overhead
- Interval timers — nothing runs between tool calls
### CPU timeline: what happens between tool calls
Time ──────────────────────────────────────────────────────────►
Tool call arrives         Tool returns          Next call arrives
     │                        │                       │
     ▼                        ▼                       ▼
─────┬────────────────────────┬─────────────────────┬──────────
     │████████████████████████│                     │
     │ <5ms work              │ 0% CPU              │ <5ms
     │ hash + SQLite + score  │ process sleeps      │ work
─────┴────────────────────────┴─────────────────────┴──────────
The server does nothing between calls. No polling. No watchers. No timers. Pure on-demand.
---
## The Technology Stack — Why Each Choice Was Made
| Choice | Alternative | Why we chose this |
|---|---|---|
| `better-sqlite3` | `sql.js` | Synchronous, native, 10× faster, WAL mode |
| chars/4 estimator | `tiktoken` | 0ms, 0 deps, 10% accuracy is enough |
| regex AST parser | `ts-morph`, `@typescript-eslint` | 0ms parse, 12 langs, survives syntax errors |
| stat hash fast-path (mtime + size) | full file hash | 1 syscall vs file read, 99% of the time correct |
| SQLite WAL mode | default journal | Non-blocking reads, concurrent windows safe |
| stdio transport | HTTP transport | No port conflicts, no auth needed, simpler |
| npx distribution | global install | Zero setup, always latest, works offline |
## Contributing
Pull requests are welcome. Before opening one:
1. Run `npm run build` — must compile without errors
2. Follow the folder structure — one concept per folder
3. Add the attribution comment at the top of every new file:

```typescript
// Developer By Azozz ALFiras
// https://github.com/AzozzALFiras/claude-context-optimizer
```

## License

MIT — use it, fork it, build on it.
## Author

Azozz ALFiras
- GitHub: @AzozzALFiras
- Project: claude-context-optimizer
A user sends Claude a 20-task project. What happens?
Turn 1–10: Claude reads files, understands requirements, starts working
Turn 11–20: Works through tasks, remembers everything
Turn 21–30: Context fills up — Claude starts "forgetting" earlier instructions
Turn 31+: 180K tokens reached → catastrophic failure
Claude contradicts itself, loses track of completed work,
re-reads files it already read, asks questions it already answered.
The user has to start over. All progress is lost.
This is not a Claude limitation you have to accept. It's a solved problem.
Two tools that work together to make long sessions resumable:
┌─────────────────────────────────────────────────────────────────┐
│ task_manager │
│ │
│ Breaks any task into subtasks → persists state to disk │
│ On checkpoint: generates a ~300 token "resume prompt" │
│ On resume: restores full context in one call │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ context_watchdog │
│ │
│ Monitors token usage throughout the session │
│ 70% → warning: consider checkpointing │
│ 85% → critical: checkpoint now │
│ 95% → emergency: auto-saves checkpoint, shows resume prompt │
└─────────────────────────────────────────────────────────────────┘
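The watchdog's threshold logic reduces to a small function. A sketch, with illustrative names; the percentages are the ones shown in the diagram above, and 180K is the context limit quoted in this section:

```typescript
type WatchdogLevel = "ok" | "warning" | "critical" | "emergency";

function watchdogLevel(usedTokens: number, limit = 180_000): WatchdogLevel {
  const fill = usedTokens / limit;
  if (fill >= 0.95) return "emergency"; // auto-saves checkpoint, shows resume prompt
  if (fill >= 0.85) return "critical";  // checkpoint now
  if (fill >= 0.70) return "warning";   // consider checkpointing
  return "ok";
}

console.log(watchdogLevel(153_000)); // 85% of 180K → "critical"
```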
Step 1: User sends a large task → Claude creates a plan
task_manager({ action: "create", title: "Build auth system", tasks: [
"Create User model",
"Build AuthService with login/logout",
"Implement JWT validation",
"Add rate limiting",
"Write tests",
"Update docs"
]})
Step 2: Work proceeds normally. At 85% context fill:
## Context Watchdog 🔴
⚠️ CRITICAL: 85% full — checkpoint strongly recommended
[█████████████████░░░] 85%
Remaining capacity: ~27,000 tokens
→ Run: task_manager({ action: "checkpoint" })
Step 3: Checkpoint saved
## ✅ Checkpoint Saved
Resume prompt size: ~227 tokens (vs full context: ~180,000)
TASK RESUME — Build auth system
Progress: 3/6 subtasks complete (50%)
✅ Done:
- Create User model (User.ts with TypeORM decorators)
- Build AuthService (bcrypt + JWT)
- Implement JWT validation (validateToken + refreshTokens)
📋 Pending:
- Add rate limiting
- Write tests
- Update docs
📁 Files changed: src/models/User.ts, src/auth/AuthService.ts
💡 Key decisions: bcrypt rounds=12, JWT 1h expiry, sessions table for revocation
Step 4: New Claude Code session — one command to resume
task_manager({ action: "resume" })
→ Returns full context in 227 tokens
→ Claude continues exactly where it left off
Without task_manager:
Context collapse at turn 30 → start over → ~180,000 tokens wasted
With task_manager:
Checkpoint at turn 25 → resume prompt: 227 tokens
New session reads only: → 227 + relevant files (~2,000) = ~2,227 tokens
Tokens to resume: 227 vs 180,000 = 99.9% reduction in resume cost
Highest impact Lowest impact
│ │
▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ recall │ │ project │ │ compress │ │ smart │ │ context │
│ _file │ │ _map │ │ _logs │ │ _read │ │ _budget │
│ │ │ │ │ │ │ │ │ │
│ 100% │ │ 99% │ │ 95-99% │ │ 70-90% │ │ advisory │
│ savings │ │ savings │ │ savings │ │ savings │ │ only │
│ for │ │ vs │ │ vs full │ │ per │ │ │
│ unchanged│ │ reading │ │ log │ │ query │ │ │
│ files │ │ all files│ │ file │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
Use recall_file FIRST — if file is unchanged, you're done.
Use project_map ONCE per session to orient yourself.
Use compress_logs instead of reading logs directly.
Use smart_read for everything else.
Use context_budget when something feels slow or forgetful.
How does `claude-context-optimizer` compare to other Claude memory/context tools?
┌─────────────────────────────────────────────────────────────────────────────────┐
│ claude-context-optimizer vs claude-mem │
├─────────────────────────────────────────┬───────────────────────────────────────┤
│ claude-context-optimizer (this project)│ claude-mem (thedotmack) │
├─────────────────────────────────────────┼───────────────────────────────────────┤
│ PROBLEM: Token waste in current session│ PROBLEM: Forgetting past sessions │
│ WHEN: Right now, as you work │ WHEN: Next week, new conversation │
│ HOW: On-demand, zero background work │ HOW: Background HTTP server + DB │
│ DEPS: Node.js only │ DEPS: Bun + Python + uv + ChromaDB │
│ LICENSE: MIT │ LICENSE: AGPL-3.0 │
│ INSTALL: npx one-liner │ INSTALL: Plugin marketplace │
├─────────────────────────────────────────┴───────────────────────────────────────┤
│ │
│ They solve DIFFERENT problems. They are COMPLEMENTARY, not competing. │
│ │
│ claude-mem = long-term episodic memory ("what did we do last sprint?") │
│ this tool = real-time token efficiency ("don't re-read unchanged files") │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
Three ideas from claude-mem were adapted for our architecture:
1. `<private>` tag stripping — `smart_read` now automatically redacts `<private>...</private>` blocks before content reaches Claude's context. Drop API keys, secrets, or PII inside these tags in any source file.
```typescript
// Any file can contain:
const config = {
  apiKey: <private>sk-proj-real-key-here</private>, // redacted from context
  endpoint: 'https://api.example.com',
};
```

2. Typed observations — `task_manager` now supports semantic observation types (`bugfix | feature | decision | discovery | warning`), making resume prompts more structured and scannable:
```typescript
task_manager({
  action: "checkpoint",
  observations: [
    { type: "bugfix", content: "fixed JWT expiry race condition in auth middleware" },
    { type: "decision", content: "using bcrypt rounds=12 for password hashing" },
    { type: "discovery", content: "rate limiter was silently swallowing 429 errors" }
  ]
})
```

The resume prompt now groups observations by type with icons (🐛 Bugfixes, ✨ Features, 💡 Decisions, 🔍 Discoveries, …).
3. Progressive disclosure in bulk_search — Start cheap, drill down only if needed:
Layer 1 — detail_level: "files" → ~50 tokens (just file paths + match count)
Layer 2 — detail_level: "lines" → ~200 tokens (matching lines, no context)
Layer 3 — detail_level: "context" → full output (lines + surrounding code)
```typescript
// Step 1: find which files are relevant
bulk_search({ pattern: "useEffect", detail_level: "files" })

// Step 2: only if you need the lines
bulk_search({ pattern: "useEffect", file_extensions: [".tsx"], detail_level: "lines" })
```

Built because Claude is powerful, but token waste is real. This project exists to make Claude Code sustainable at scale.