fix: remove phantom state.db path and fix explain.py skip detection#455
Open
tbroadley wants to merge 4 commits intosjawhar:mainfrom
Open
fix: remove phantom state.db path and fix explain.py skip detection#455tbroadley wants to merge 4 commits intosjawhar:mainfrom
tbroadley wants to merge 4 commits intosjawhar:mainfrom
Conversation
StateDB has always used LMDB (state.lmdb/ directory), but callers passed a "state.db" path that was never created. This caused a bug in explain.py where `if state_db_path.exists()` always returned False, skipping the O(1) generation-based skip detection and falling through to expensive per-file rehashing — making `pivot status`, `pivot repro -n`, and `pivot repro -n -e` extremely slow. Changes: - StateDB.__init__ now takes state_dir directly instead of deriving it from a phantom db_path.parent - Remove get_state_db_path() — callers use get_state_dir() directly - Remove existence guard in explain.py so generation skip always runs - Remove state.db from .gitignore template - Update docs to reference state.lmdb/ instead of state.db Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ac24dcc to
c4d43ff
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a major performance regression where pivot status / pivot repro -n / pivot repro -n -e always missed the O(1) generation-based skip path due to checking for a non-existent state.db file, and cleans up the codebase to consistently treat state as an LMDB directory (state.lmdb/) under a state directory.
Changes:
- Refactor
StateDBto accept astate_dirdirectly and removeget_state_db_path()in favor ofget_state_dir(). - Update
explain.pyto run generation-based skip detection viaStateDB(instead of being gated on the phantomstate.dbpath). - Update tests, CLI init gitignore template, and docs to reference
state.lmdb/and the state directory API.
Reviewed changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/pivot/tests/test_skip_detection_integration.py | Update StateDB construction to pass state_dir directly. |
| packages/pivot/tests/test_run_history.py | Update StateDB construction to use the state directory. |
| packages/pivot/tests/test_run_cache_lock_update.py | Update test state path to .pivot/ directory instead of .pivot/state.db. |
| packages/pivot/tests/test_directory_out_integration.py | Clear run cache by removing state.lmdb/ directory rather than phantom files. |
| packages/pivot/tests/storage/test_state_db_timeout.py | Update timeout tests to pass tmp_path as the state directory. |
| packages/pivot/tests/storage/test_state.py | Update all state storage tests to use a state directory and state.lmdb/ pathing. |
| packages/pivot/tests/storage/test_cache.py | Update cache tests to open StateDB with a directory. |
| packages/pivot/tests/remote/test_transfer.py | Update remote transfer integration tests to construct StateDB(state_dir). |
| packages/pivot/tests/pipeline/test_pipeline.py | Update wording in docstring to remove state.db reference. |
| packages/pivot/tests/integration/test_unified_execution.py | Remove get_state_db_path monkeypatch usage. |
| packages/pivot/tests/integration/test_lazy_resolution.py | Update wording in docstring to remove state.db reference. |
| packages/pivot/tests/fingerprint/test_no_fingerprint.py | Update StateDB construction to pass state_dir. |
| packages/pivot/tests/fingerprint/test_fingerprint.py | Switch monkeypatching from get_state_db_path to get_state_dir and update StateDB usage. |
| packages/pivot/tests/execution/test_executor_worker.py | Update worker/executor tests to construct StateDB from state directory. |
| packages/pivot/tests/execution/test_executor.py | Update executor integration tests to use .pivot/ as state dir. |
| packages/pivot/tests/execution/test_execution_modes.py | Update tests that apply deferred writes to use the state directory. |
| packages/pivot/tests/engine/test_run_history.py | Update state path usage; currently includes redundant get_state_dir monkeypatching. |
| packages/pivot/tests/engine/test_engine.py | Update engine tests to open StateDB with state_dir. |
| packages/pivot/tests/conftest.py | Remove get_state_db_path monkeypatch from shared fixture. |
| packages/pivot/tests/cli/test_fingerprint_cli.py | Update CLI fingerprint tests to use .pivot/ as state dir. |
| packages/pivot/tests/cli/test_cli_init.py | Remove state.db expectation from .pivot/.gitignore template test. |
| packages/pivot/tests/cli/test_cli_history.py | Update history tests to open StateDB from .pivot/ directory. |
| packages/pivot/src/pivot/storage/state.py | Change StateDB.__init__ to accept state_dir and store LMDB under state_dir/state.lmdb. |
| packages/pivot/src/pivot/status.py | Open StateDB using config.get_state_dir() instead of removed get_state_db_path(). |
| packages/pivot/src/pivot/pipeline/pipeline.py | Update documentation strings to refer to state.lmdb / “state database”. |
| packages/pivot/src/pivot/fingerprint.py | Replace get_state_db_path() usage with get_state_dir() for StateDB access. |
| packages/pivot/src/pivot/explain.py | Fix generation-skip detection path by opening StateDB(state_dir, readonly=True) for checks. |
| packages/pivot/src/pivot/executor/worker.py | Use state_dir from stage_info directly for StateDB open calls. |
| packages/pivot/src/pivot/executor/core.py | Replace get_state_db_path() usage with get_state_dir() for verification hashing cache. |
| packages/pivot/src/pivot/executor/commit.py | Construct per-stage StateDB from stage state directory. |
| packages/pivot/src/pivot/executor/AGENTS.md | Update worker path derivation docs to reflect passing state_dir directly. |
| packages/pivot/src/pivot/engine/engine.py | Construct cached read-only StateDBs keyed by stage_state_dir rather than state.db path. |
| packages/pivot/src/pivot/config/io.py | Remove get_state_db_path() accessor. |
| packages/pivot/src/pivot/config/init.py | Stop exporting removed get_state_db_path. |
| packages/pivot/src/pivot/cli/remote.py | Open StateDB using config.get_state_dir(). |
| packages/pivot/src/pivot/cli/init.py | Remove state.db from .pivot/.gitignore template; keep state.lmdb/. |
| packages/pivot/src/pivot/cli/history.py | Open StateDB from config.get_state_dir() (remove state_db_path local var). |
| packages/pivot/src/pivot/cli/fingerprint.py | Open StateDB from io.get_state_dir() in pivot fingerprint reset. |
| docs/getting-started/quickstart.md | Update recommended project .gitignore entry to .pivot/state.lmdb/. |
| docs/concepts/pipelines.md | Update .pivot/ layout docs to show state.lmdb/ directory. |
| docs/concepts/caching.md | Update caching docs to refer to state.lmdb/. |
| docs/architecture/overview.md | Update architecture overview to refer to state.lmdb/ directory. |
Contributor
Author
|
@sjawhar Would you mind taking a look at this? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
explain.pygated the O(1) generation skip check onstate.dbexisting — a file Pivot never creates. This madepivot status,pivot repro -n, andpivot repro -n -eextremely slow because every dependency was re-hashed from scratch.StateDB.__init__now accepts the state directory directly instead of a phantomstate.dbpath. Removedget_state_db_path()— callers useget_state_dir()directly.state.dbfrom the.gitignoretemplate since it was never a real file.state.lmdb/instead ofstate.db.Root cause
StateDBhas always used LMDB (state.lmdb/directory), but callers constructed astate.dbpath that was passed to the constructor, which then diddb_path.parent / "state.lmdb"to find the actual storage. No code ever created astate.dbfile.In
explain.py, anif state_db_path.exists():guard (added in #245) checked for this phantom file before attempting the fast generation-based skip detection. Since the file never existed, the fast path was always skipped, falling through to expensive per-file rehashing with no hash cache.Test plan
ruff checkcleanbasedpyrightclean (no new errors — 19 pre-existing errors, 444 pre-existing warnings unchanged)🤖 Generated with Claude Code