-
Notifications
You must be signed in to change notification settings - Fork 2
Description
TL;DR
The path-free engine migration changed artifact identity from file paths to
ArtifactIdentity(producer, key)but left CLI commands broken — they treat identity keys as file paths. This plan fixes all broken commands, makes the CLI accept identity strings instead of file paths for stage outputs, and exposes WorkspaceStore as the sole identity→path resolver.
Core Objective: Make all CLI commands work correctly with identity-based artifact references. Users type train or train:model (identity strings) for stage outputs, and file paths only for .pvt-tracked files.
Must Have:
- All CLI commands that reference stage outputs work with identity strings
- WorkspaceStore exposed to CLI as lazy singleton for identity→path resolution
--allmode works: merged Pipeline("all") → WorkspaceStore(pipeline_name="all") → paths underdata/all/- Display format: both identity and resolved path (e.g.,
train:model → data/pipeline/train/model.csv) - Shell completion suggests identity strings
- Zero basedpyright errors, all tests pass
Must NOT Have:
- No backwards-compatible path input for stage outputs (breaking change, pre-alpha)
- No new abstraction layers beyond WorkspaceStore exposure
- No changes to compose API, store.py internals, or lock file format
What's Broken
| File | Function | Bug |
|---|---|---|
cli/checkout.py:32-62 |
_get_stage_output_info() |
canonicalize_artifact_path(identity_key(...)) treats identity as path |
cli/verify.py:82-96 |
_get_stage_lock_hashes() |
Same broken pattern |
remote/sync.py:111-168 |
_get_file_hash_from_stages() |
Compares identity_key(out_path) to absolute path — never matches |
show/data.py:41-56 |
_data_rel_path() |
Returns identity_key as "relative path" (no file extension) |
show/common.py:72-87 |
extract_output_hashes_from_lock() |
Falls back to key/display when path field missing |
cli/targets.py:151-187 |
resolve_output_paths() |
Conflates identity keys with paths |
--all Mode Design
When --all is passed, discover_pipeline(all_pipelines=True) merges all discovered pipelines into Pipeline("all", root=project_root) via include(). Stages with name collisions are prefixed (e.g., eval/train). The merged pipeline is stored in Click context. get_workspace_store() lazily builds WorkspaceStore(pipeline_name="all", ...), so output paths resolve to data/all/{stage}.csv. This is consistent with how Engine already works — it uses pipeline.name for its own WorkspaceStore (engine.py:1357).
Execution Structure
| Wave | Tasks | Parallelizable? |
|---|---|---|
| Wave 1 | Task 1 (Store + target parsing) | Foundation — must go first |
| Wave 2 | Tasks 2-5 (checkout, verify/sync, show/diff, get/completion) | All parallel |
| Wave 3 | Task 6 (cleanup + E2E verification) | Sequential |
All tasks follow TDD: write failing tests first → verify they fail correctly → write minimal code → verify they pass.
Task 1: WorkspaceStore CLI integration + identity target parsing
RED — Write failing tests (packages/pivot/tests/cli/test_identity_resolution.py):
test_parse_identity_target_single:"train"→ArtifactIdentity("train", None)test_parse_identity_target_with_key:"train:model"→ArtifactIdentity("train", "model")test_resolve_cli_target_stage: registered stage name →IdentityTargettest_resolve_cli_target_specific_key:"train:model"→ specific output reftest_resolve_cli_target_pvt: file path with.pvt→PvtTargettest_resolve_cli_target_unknown_raises: unknown target → clear errortest_get_workspace_store_with_pipeline: returnsWorkspaceStoretest_get_workspace_store_no_pipeline: returnsNonetest_get_workspace_store_all_mode: merged pipeline →pipeline_name="all"test_workspace_store_resolve_display_path: identity→path correct for data/metric/plot/directory
GREEN — Implementation:
cli/helpers.py: Addget_workspace_store() -> WorkspaceStore | None(lazy, cached, usespipeline.name,input_bindings={})storage/store.py: Addresolve_display_path(ref) -> Path(delegates to_resolve_path)cli/targets.py: Addparse_identity_target(target) -> ArtifactIdentityandresolve_cli_target(target, registry, pvt_lookup) -> IdentityTarget | PvtTarget
References: cli/helpers.py:36-67, cli/targets.py:26-118, storage/store.py:126-252, types.py:261-334
Task 2: Fix pivot checkout command
RED — Write failing tests (packages/pivot/tests/cli/test_checkout_identity.py):
test_checkout_identity_target_restores_to_store_path:"train"→ file atdata/{pipeline}/train.csvtest_checkout_all_no_targets_restores_all_outputs: all outputs via Store pathstest_checkout_pvt_target_still_works:.pvtpaths still worktest_checkout_unknown_identity_raises: helpful errortest_checkout_directory_output:DIRECTORYtag restores correctlytest_checkout_all_mode_uses_merged_pipeline_paths:--all→data/all/train.csvtest_get_stage_output_info_uses_store_not_canonicalize: core bug regression test
GREEN — Implementation:
- Rewrite
_get_stage_output_info(): Store-resolved paths instead ofcanonicalize_artifact_path(identity_key(...)) - Update
_validate_and_build_files(): identity-based target matching - Update
_dedupe_targets(): identity dedup - Update help text and display messages
References: cli/checkout.py:32-427, storage/store.py:183-204
Task 3: Fix pivot verify + remote sync (push/pull/fetch)
RED — Write failing tests:
test_verify_with_identity_keyed_locks: verify passes with identity keystest_verify_all_mode: works across merged pipelinestest_get_file_hash_from_stages_identity_match: core bug in sync.py:131test_get_target_hashes_identity_target:"train:model"→ correct hashestest_normalize_cli_targets_identity_passthrough: identity targets pass throughtest_normalize_cli_targets_pvt_path_normalized:.pvtpaths still normalized
GREEN — Implementation:
- Fix
_get_stage_lock_hashes()in verify.py: identity-based filtering - Fix
_get_file_hash_from_stages()in sync.py: identity key matching - Fix
_normalize_cli_targets()in remote.py: identity-first passthrough - Fix
get_target_hashes()in sync.py: identity parse before path resolution
References: cli/verify.py:63-101, remote/sync.py:111-250, cli/remote.py:44-74
Task 4: Fix show/data module + pivot diff command
RED — Write failing tests:
test_data_rel_path_returns_store_resolved_path:"data/eval/train.csv"not"train"test_format_detection_from_artifact_ref: usesArtifactRef.format, not file extensiontest_extract_output_hashes_uses_identity_key: identity key as canonical lookuptest_diff_with_identity_target:"pivot diff train"compares correct file
GREEN — Implementation:
- Fix
_data_rel_path(): Store-resolved path with correct extension - Fix format detection: use
ArtifactRef.formatdirectly - Fix
extract_output_hashes_from_lock(): canonical identity key lookup - Fix
diffcommand: identity-based target matching
References: show/data.py:41-56, 558-613, show/common.py:72-87, cli/data.py:35-166
Task 5: Fix pivot get/restore + shell completion + display helpers
RED — Write failing tests:
test_resolve_targets_stage_identity: Store-derived destination pathstest_resolve_targets_pvt_path: existing path-based logic workstest_entry_display_uses_identity_and_store_path: Store-resolved displaytest_complete_identities_includes_stage_keys: suggeststrain:modelfor multi-outputtest_format_identity_display_with_path:"train:model → data/eval/train/model.csv"test_resolve_targets_to_stages_identity_string:"train:model"→ producer "train"test_resolve_output_paths_uses_store_paths: actual paths, not identity keys
GREEN — Implementation:
- Fix
resolve_targets()in restore.py - Fix
_entry_display() - Update shell completion for
stage:keysuggestions - Add
format_identity_display()helper - Fix
resolve_targets_to_stages()andresolve_output_paths()
References: storage/restore.py:133-213, cli/completion.py:72-89, cli/targets.py:87-277
Task 6: Cleanup + full verification
- Remove all
canonicalize_artifact_path(identity_key(...))calls - Remove
canonicalize_artifact_pathif no legitimate uses remain - Quality checks:
ruff format,ruff check,basedpyright - Full test suite:
pytest packages/pivot/tests packages/pivot-tui/tests -n auto - E2E:
pivot repro --all(twice),pivot verify,pivot checkout --only-missing - Verify TUI still works
Success Criteria
uv run ruff format . && uv run ruff check . # Expected: clean
uv run basedpyright # Expected: 0 errors
uv run pytest packages/pivot/tests packages/pivot-tui/tests -n auto # Expected: all pass
pivot repro --all && pivot repro --all # Expected: 2nd run cached
pivot verify # Expected: all pass
pivot checkout train --only-missing # Expected: restores correctlyFull plan with detailed test code and QA scenarios: .sisyphus/plans/cli-identity-resolution.md