fix: harden KnowledgeGraph and MCP server for multi-process access #948
Open
felipecpaiva wants to merge 1 commit into MemPalace:develop from
When multiple mcp-proxy SSE connections share the same mempalace data, concurrent processes compete for SQLite and ChromaDB resources.

KnowledgeGraph changes:
- Increase busy_timeout from 10s to 60s
- Add _sqlite_retry decorator with exponential backoff for lock/busy errors
- Use BEGIN IMMEDIATE for writes (detect contention at transaction start)
- Add WAL autocheckpoint and journal_size_limit pragmas
- Register atexit handler for clean WAL shutdown

MCP server changes:
- Rate-limit chroma.sqlite3 mtime checks to 5s intervals to prevent PersistentClient recreation (and HNSW index reload) on every write
- Add _refresh_db_mtime() after ChromaDB writes to prevent self-triggered reconnects
- Bypass mtime cooldown for safety-critical DB disappearance detection
- Fix tool_reconnect to fully clear client and mtime state

Tests:
- Retry decorator: lock retry, exhaustion, non-lock errors, busy variant
- Connection pragmas: wal_autocheckpoint, journal_size_limit, WAL mode
- Multi-process concurrent writes: 4 processes x 20 triples, zero failures
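For reviewers who want a concrete picture of the KnowledgeGraph changes listed above, here is a minimal sketch of the pattern, not the code from this PR. The names `sqlite_retry`, `open_connection`, `add_triple`, and the `triples` table are hypothetical, and the delay values are assumptions; only the 5 retries, the 60s timeout, the pragmas, and the use of `BEGIN IMMEDIATE` come from the description.

```python
import functools
import random
import sqlite3
import time


def sqlite_retry(max_retries=5, base_delay=0.05, max_delay=2.0):
    """Retry a callable when SQLite reports a locked or busy database.

    base_delay and max_delay are illustrative values, not taken from the PR.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    message = str(exc).lower()
                    retryable = "locked" in message or "busy" in message
                    # Non-lock errors and the final attempt propagate unchanged.
                    if not retryable or attempt == max_retries:
                        raise
                    # Exponential backoff, capped, with random jitter.
                    delay = min(max_delay, base_delay * 2**attempt)
                    time.sleep(delay * random.uniform(0.5, 1.0))

        return wrapper

    return decorator


def open_connection(path):
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are opened explicitly with BEGIN IMMEDIATE below.
    conn = sqlite3.connect(path, timeout=60, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=60000")  # 60s, in milliseconds
    conn.execute("PRAGMA wal_autocheckpoint=1000")  # checkpoint every 1000 pages
    conn.execute("PRAGMA journal_size_limit=67108864")  # 64 MB
    return conn


@sqlite_retry()
def add_triple(conn, subject, predicate, obj):
    # BEGIN IMMEDIATE takes the write lock up front, so contention shows up
    # here (and is retried) rather than partway through the transaction.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)",
            (subject, predicate, obj),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Matching on the error message is a pragmatic choice in this sketch: `sqlite3.OperationalError` covers many unrelated failures, so only messages containing "locked" or "busy" are retried and everything else is raised immediately.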
Summary
When mempalace runs behind `mcp-proxy` (SSE mode) with multiple concurrent clients, two classes of concurrency bugs surface:

- SQLite "database is locked" in KnowledgeGraph: `busy_timeout` was only 10s with no application-level retry. Concurrent writers from separate mcp-proxy processes exhaust the timeout.
- ChromaDB client cache thrashing: every write changes the `chroma.sqlite3` mtime. Other processes detect this and recreate `PersistentClient`, reloading the full HNSW index from disk on every operation.

KnowledgeGraph fixes
- `busy_timeout` 10s → 60s
- `_sqlite_retry` decorator: exponential backoff with jitter (5 retries, only for "locked"/"busy" errors)
- `BEGIN IMMEDIATE` for write transactions (detect contention at start, not mid-transaction)
- `PRAGMA wal_autocheckpoint=1000` + `journal_size_limit=64MB` (manage WAL growth)
- `atexit.register(self.close)` for clean WAL checkpointing on shutdown

MCP server fixes
- Rate-limit `chroma.sqlite3` stat/mtime checks to 5-second intervals
- `_refresh_db_mtime()` after writes prevents self-inflicted client recreation
- `tool_reconnect` now fully clears client + mtime state

Tests
- `TestSQLiteRetryDecorator`: 5 cases covering retry success, exhaustion, non-lock errors, busy variant
- `TestConnectionPragmas`: 3 cases covering autocheckpoint, journal_size_limit, WAL mode
- `TestMultiProcessLocking`: 4 processes × 20 triples to same DB file, zero failures

Test plan
- `python -m pytest tests/ -v --ignore=tests/benchmarks`: 958 passed
- `ruff check` + `ruff format --check`: clean
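
The MCP server side of this change (rate-limited mtime checks, a refresh after the process's own writes, and a cooldown bypass when the database file disappears) can be illustrated with a small standalone sketch. The `MtimeWatcher` class and its method names are hypothetical and are not the server's actual code; only the 5-second interval and the three behaviours come from the description.

```python
import os
import time


class MtimeWatcher:
    """Report external changes to a file, stat-ing it at most once per interval."""

    def __init__(self, path, min_interval=5.0, clock=time.monotonic):
        self.path = path
        self.min_interval = min_interval
        self._clock = clock  # injectable so the cooldown can be tested
        self._last_check = None
        self._last_mtime = None

    def _stat_mtime(self):
        try:
            return os.stat(self.path).st_mtime
        except FileNotFoundError:
            return None

    def refresh(self):
        """Record the current mtime, e.g. right after this process writes."""
        self._last_mtime = self._stat_mtime()
        self._last_check = self._clock()

    def changed(self):
        """Return True if the cached client should be rebuilt."""
        # A missing file is always reported, regardless of the cooldown.
        if not os.path.exists(self.path):
            return True
        now = self._clock()
        in_cooldown = (
            self._last_check is not None
            and now - self._last_check < self.min_interval
        )
        if in_cooldown:
            return False
        self._last_check = now
        current = self._stat_mtime()
        if current != self._last_mtime:
            self._last_mtime = current
            return True
        return False

    def reset(self):
        """Forget all cached state, as a full reconnect would."""
        self._last_check = None
        self._last_mtime = None
```

In this sketch a caller would invoke `changed()` before each operation and rebuild its client only on `True`, then call `refresh()` after its own writes so that they do not look like external modifications on the next check.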