Skip to content

fix: remove phantom state.db path and fix explain.py skip detection#455

Open
tbroadley wants to merge 4 commits intosjawhar:mainfrom
tbroadley:fix/remove-state-db-sentinel
Open

fix: remove phantom state.db path and fix explain.py skip detection#455
tbroadley wants to merge 4 commits intosjawhar:mainfrom
tbroadley:fix/remove-state-db-sentinel

Conversation

@tbroadley
Copy link
Contributor

@tbroadley tbroadley commented Mar 17, 2026

Summary

  • Bug fix: explain.py gated the O(1) generation skip check on state.db existing — a file Pivot never creates. This made pivot status, pivot repro -n, and pivot repro -n -e extremely slow because every dependency was re-hashed from scratch.
  • Cleanup: StateDB.__init__ now accepts the state directory directly instead of a phantom state.db path. Removed get_state_db_path() — callers use get_state_dir() directly.
  • Removed state.db from the .gitignore template since it was never a real file.
  • Updated docs to reference state.lmdb/ instead of state.db.

Root cause

StateDB has always used LMDB (state.lmdb/ directory), but callers constructed a state.db path that was passed to the constructor, which then did db_path.parent / "state.lmdb" to find the actual storage. No code ever created a state.db file.

In explain.py, an if state_db_path.exists(): guard (added in #245) checked for this phantom file before attempting the fast generation-based skip detection. Since the file never existed, the fast path was always skipped, falling through to expensive per-file rehashing with no hash cache.

Test plan

  • All 806 affected tests pass (4 skipped)
  • ruff check clean
  • basedpyright clean (no new errors — 19 pre-existing errors, 444 pre-existing warnings unchanged)

🤖 Generated with Claude Code

StateDB has always used LMDB (state.lmdb/ directory), but callers
passed a "state.db" path that was never created. This caused a bug
in explain.py where `if state_db_path.exists()` always returned False,
skipping the O(1) generation-based skip detection and falling through
to expensive per-file rehashing — making `pivot status`, `pivot repro -n`,
and `pivot repro -n -e` extremely slow.

Changes:
- StateDB.__init__ now takes state_dir directly instead of deriving it
  from a phantom db_path.parent
- Remove get_state_db_path() — callers use get_state_dir() directly
- Remove existence guard in explain.py so generation skip always runs
- Remove state.db from .gitignore template
- Update docs to reference state.lmdb/ instead of state.db

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tbroadley tbroadley force-pushed the fix/remove-state-db-sentinel branch from ac24dcc to c4d43ff Compare March 17, 2026 19:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tbroadley tbroadley marked this pull request as ready for review March 17, 2026 20:01
Copilot AI review requested due to automatic review settings March 17, 2026 20:01
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a major performance regression where pivot status / pivot repro -n / pivot repro -n -e always missed the O(1) generation-based skip path due to checking for a non-existent state.db file, and cleans up the codebase to consistently treat state as an LMDB directory (state.lmdb/) under a state directory.

Changes:

  • Refactor StateDB to accept a state_dir directly and remove get_state_db_path() in favor of get_state_dir().
  • Update explain.py to run generation-based skip detection via StateDB (instead of being gated on the phantom state.db path).
  • Update tests, CLI init gitignore template, and docs to reference state.lmdb/ and the state directory API.

Reviewed changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/pivot/tests/test_skip_detection_integration.py Update StateDB construction to pass state_dir directly.
packages/pivot/tests/test_run_history.py Update StateDB construction to use the state directory.
packages/pivot/tests/test_run_cache_lock_update.py Update test state path to .pivot/ directory instead of .pivot/state.db.
packages/pivot/tests/test_directory_out_integration.py Clear run cache by removing state.lmdb/ directory rather than phantom files.
packages/pivot/tests/storage/test_state_db_timeout.py Update timeout tests to pass tmp_path as the state directory.
packages/pivot/tests/storage/test_state.py Update all state storage tests to use a state directory and state.lmdb/ pathing.
packages/pivot/tests/storage/test_cache.py Update cache tests to open StateDB with a directory.
packages/pivot/tests/remote/test_transfer.py Update remote transfer integration tests to construct StateDB(state_dir).
packages/pivot/tests/pipeline/test_pipeline.py Update wording in docstring to remove state.db reference.
packages/pivot/tests/integration/test_unified_execution.py Remove get_state_db_path monkeypatch usage.
packages/pivot/tests/integration/test_lazy_resolution.py Update wording in docstring to remove state.db reference.
packages/pivot/tests/fingerprint/test_no_fingerprint.py Update StateDB construction to pass state_dir.
packages/pivot/tests/fingerprint/test_fingerprint.py Switch monkeypatching from get_state_db_path to get_state_dir and update StateDB usage.
packages/pivot/tests/execution/test_executor_worker.py Update worker/executor tests to construct StateDB from state directory.
packages/pivot/tests/execution/test_executor.py Update executor integration tests to use .pivot/ as state dir.
packages/pivot/tests/execution/test_execution_modes.py Update tests that apply deferred writes to use the state directory.
packages/pivot/tests/engine/test_run_history.py Update state path usage; currently includes redundant get_state_dir monkeypatching.
packages/pivot/tests/engine/test_engine.py Update engine tests to open StateDB with state_dir.
packages/pivot/tests/conftest.py Remove get_state_db_path monkeypatch from shared fixture.
packages/pivot/tests/cli/test_fingerprint_cli.py Update CLI fingerprint tests to use .pivot/ as state dir.
packages/pivot/tests/cli/test_cli_init.py Remove state.db expectation from .pivot/.gitignore template test.
packages/pivot/tests/cli/test_cli_history.py Update history tests to open StateDB from .pivot/ directory.
packages/pivot/src/pivot/storage/state.py Change StateDB.__init__ to accept state_dir and store LMDB under state_dir/state.lmdb.
packages/pivot/src/pivot/status.py Open StateDB using config.get_state_dir() instead of removed get_state_db_path().
packages/pivot/src/pivot/pipeline/pipeline.py Update documentation strings to refer to state.lmdb / “state database”.
packages/pivot/src/pivot/fingerprint.py Replace get_state_db_path() usage with get_state_dir() for StateDB access.
packages/pivot/src/pivot/explain.py Fix generation-skip detection path by opening StateDB(state_dir, readonly=True) for checks.
packages/pivot/src/pivot/executor/worker.py Use state_dir from stage_info directly for StateDB open calls.
packages/pivot/src/pivot/executor/core.py Replace get_state_db_path() usage with get_state_dir() for verification hashing cache.
packages/pivot/src/pivot/executor/commit.py Construct per-stage StateDB from stage state directory.
packages/pivot/src/pivot/executor/AGENTS.md Update worker path derivation docs to reflect passing state_dir directly.
packages/pivot/src/pivot/engine/engine.py Construct cached read-only StateDBs keyed by stage_state_dir rather than state.db path.
packages/pivot/src/pivot/config/io.py Remove get_state_db_path() accessor.
packages/pivot/src/pivot/config/init.py Stop exporting removed get_state_db_path.
packages/pivot/src/pivot/cli/remote.py Open StateDB using config.get_state_dir().
packages/pivot/src/pivot/cli/init.py Remove state.db from .pivot/.gitignore template; keep state.lmdb/.
packages/pivot/src/pivot/cli/history.py Open StateDB from config.get_state_dir() (remove state_db_path local var).
packages/pivot/src/pivot/cli/fingerprint.py Open StateDB from io.get_state_dir() in pivot fingerprint reset.
docs/getting-started/quickstart.md Update recommended project .gitignore entry to .pivot/state.lmdb/.
docs/concepts/pipelines.md Update .pivot/ layout docs to show state.lmdb/ directory.
docs/concepts/caching.md Update caching docs to refer to state.lmdb/.
docs/architecture/overview.md Update architecture overview to refer to state.lmdb/ directory.

@tbroadley
Copy link
Contributor Author

@sjawhar Would you mind taking a look at this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants