Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
c170693
chore(porch): 1118 init pir
amrmelsayed Jun 29, 2026
c05a2e0
[PIR #1118] Plan draft
amrmelsayed Jun 29, 2026
ff78364
chore(porch): 1118 plan-approval gate-requested
amrmelsayed Jun 29, 2026
694a1cb
[PIR #1118] Plan revised: cut prune/forget, reshape builders, migrate…
amrmelsayed Jun 29, 2026
fa743dc
[PIR #1118] Plan: lock session_id exclusion boundary (defer to #1112)
amrmelsayed Jun 29, 2026
46d962d
[PIR #1118] Plan: strict one-off active-file migration + marker idemp…
amrmelsayed Jun 30, 2026
27a7250
[PIR #1118] Plan: reusable consolidate engine + afx db consolidate ma…
amrmelsayed Jun 30, 2026
1f2e8fa
[PIR #1118] Plan: lock gate decisions (cut cleanup cmds, accept resha…
amrmelsayed Jun 30, 2026
90de6e7
chore(porch): 1118 plan-approval gate-approved
amrmelsayed Jun 30, 2026
02c558f
chore(porch): 1118 implement phase-transition
amrmelsayed Jun 30, 2026
539dc93
[PIR #1118] Move state.db tables into global.db; reshape builders wor…
amrmelsayed Jun 30, 2026
de35709
[PIR #1118] Move state.db tables into global.db; reshape builders wor…
amrmelsayed Jun 30, 2026
d44fc70
[PIR #1118] Consolidation engine + boot one-off + afx db consolidate …
amrmelsayed Jun 30, 2026
3a2ee98
[PIR #1118] Update tests for single-DB model (state, lookup-builder, …
amrmelsayed Jun 30, 2026
d32c039
[PIR #1118] Consolidate engine tests + fix workspace derivation (last…
amrmelsayed Jun 30, 2026
d2ca86a
[PIR #1118] Update thread: implementation complete, suite green
amrmelsayed Jun 30, 2026
6209fd5
chore(porch): 1118 dev-approval gate-requested
amrmelsayed Jun 30, 2026
fc6f371
[PIR #1118] Clean up stale state.db references in state.ts comments
amrmelsayed Jul 1, 2026
97d6200
[PIR #1118] Plan: add migration testing recipe (dry-run/isolated appl…
amrmelsayed Jul 1, 2026
f8823e5
[PIR #1118] Replace introduced ternaries with if/else (no-ternary pre…
amrmelsayed Jul 1, 2026
65ed478
[PIR #1118] Dedupe workspace-path canonicalization into one leaf module
amrmelsayed Jul 1, 2026
dcf95fc
chore(porch): 1118 dev-approval gate-approved
amrmelsayed Jul 1, 2026
5d90f22
chore(porch): 1118 review phase-transition
amrmelsayed Jul 1, 2026
8c8875e
[PIR #1118] Review + retrospective; update arch (hot+cold) and lesson…
amrmelsayed Jul 1, 2026
b57aece
chore(porch): 1118 record PR #1127
amrmelsayed Jul 1, 2026
808733d
chore(porch): 1118 review build-complete
amrmelsayed Jul 1, 2026
d982857
[PIR #1118] Fix codex review findings: scope clearRuntime; idempotent…
amrmelsayed Jul 1, 2026
dbe527f
[PIR #1118] Review: record 3-way consult outcome + codex fixes
amrmelsayed Jul 1, 2026
6352451
chore(porch): 1118 pr gate-requested
amrmelsayed Jul 1, 2026
bee7498
[PIR #1118] Update thread: consult done, codex fixes, pr gate pending
amrmelsayed Jul 1, 2026
7f6ce33
[PIR #1118] Fix two more consolidation-fallout bugs (audit + codex it…
amrmelsayed Jul 1, 2026
74d994d
[PIR #1118] Review + iter2 rebuttal: record send.ts audit-find + dry-…
amrmelsayed Jul 1, 2026
6893a43
[PIR #1118] Update thread: cross-workspace audit + iter2 findings fixed
amrmelsayed Jul 1, 2026
14604d2
[PIR #1118] Actually remove migrate.test.ts (deletion was unstaged, b…
amrmelsayed Jul 1, 2026
2ff9784
[PIR #1118] Track iter1 rebuttal (consistency with iter2)
amrmelsayed Jul 1, 2026
c81b8f0
[PIR #1118] iter3 fixes: greedy .builders regex in send.ts + runBootC…
amrmelsayed Jul 1, 2026
f437f61
[PIR #1118] Review + iter3 rebuttal: record final consult outcome
amrmelsayed Jul 1, 2026
64219ae
[PIR #1118] Update thread: CI fix + iter3 final consult
amrmelsayed Jul 1, 2026
e2c9604
[PIR #1118] Test: consolidation tolerates a v11-but-not-v12 state.db …
amrmelsayed Jul 1, 2026
2dba9db
[PIR #1118] Review: document schema-version tolerance (v11-but-not-v1…
amrmelsayed Jul 1, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
336 changes: 336 additions & 0 deletions codev/plans/1118-consolidate-state-db-tables-in.md

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Rebuttal — PIR #1118 review iteration 1

**claude**: APPROVE — no action needed.

**codex**: REQUEST_CHANGES — two findings, **both accepted and fixed** (commit `d9828577`).
No disagreement; codex was correct on both.

---

## Finding 1 — `clearRuntime()` wiped every workspace's builders (real regression)

> `state.ts:431-440` — `clearRuntime()` now deletes **all** `builders` rows from the shared
> `global.db`, not just the current workspace's. After #1118, stopping one workspace would wipe
> builder state for other workspaces too.

**Accepted — this was a real bug I missed.** Pre-#1118, `getDb()` returned the per-workspace
`state.db`, so an unscoped `DELETE FROM builders` only affected that workspace's file. After the
consolidation `getDb()` returns the shared `global.db`, so `afx workspace stop` (which calls
`clearRuntime()`) deleted **every** workspace's builders.

**Fix:**
- `clearRuntime(workspacePath: string)` now scopes the delete: `DELETE FROM builders WHERE
workspace_path = ?`. Signature changed to require the workspace.
- `commands/stop.ts` passes `workspacePath` (in scope from `config.workspaceRoot`) at both call
sites.
- `utils`/`annotations` (global, UUID-keyed, no `workspace_path`, and vestigial — no producers)
are intentionally **no longer** deleted by `clearRuntime`, to avoid the same cross-workspace
wipe. The full-nuke `clearState()` (no production callers) still clears them.

**Regression test** (`state.test.ts`): builders in workspaces A and B coexist in the shared DB;
`clearRuntime('/workspace/aaa')` removes only A's builder, B's survives.

## Finding 2 — `afx db consolidate` repeat-run not idempotent

> `db.ts:176-180` / `consolidate.ts:394-397` — re-running on the original path hard-fails once the
> source has been renamed, and re-running on an already archived file would rename it again
> instead of being a friendly no-op.

**Accepted.** The underlying engine (`applyMigration`) already no-ops on a missing source
(`migrated:false`, covered by a test), but the **CLI command** `dbConsolidate` called
`fatal(...)` on a missing path and had no guard against re-processing a `*.pre-merge-*` archive —
so the documented "friendly no-op on re-run" was true for the engine but not the command.

**Fix** (`commands/db.ts`):
- Missing source → friendly `logger.info("Nothing to consolidate … (already migrated?)")` and
`return` (exit 0), instead of `fatal`.
- Source path matching `*.pre-merge-*` → skip with a message (don't re-migrate + double-rename).
- Both guards return **before** opening `global.db`.

**Regression tests** (`db.test.ts`): `dbConsolidate` on a missing path and on an already-archived
`*.pre-merge-*` file each complete without throwing / hard-failing.

---

**Verification after fixes**: `pnpm build` ✓, full agent-farm suite ✓ (2017 passed, +3 new tests),
typecheck clean.

**Note**: PIR's verify is a single advisory pass (`max_iterations: 1`) — these fixes are not
re-consulted; they're surfaced to the human at the `pr` gate (and recorded in the PR body /
review file). Both findings are non-security correctness/UX fixes with regression coverage.
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Rebuttal — PIR #1118 review iteration 2 (+ manual cross-workspace audit)

Iteration 2 was an ad-hoc 3-way re-run (architect requested) after the iter-1 fixes, plus a
systematic audit of "what else assumed the per-file boundary?" (prompted by: are there other
bugs like the `clearRuntime` one now that we share one DB). Two more issues found; **both
fixed** (commit `7f6ce330`). No disagreement.

- **claude iter2**: APPROVE.
- **gemini iter2**: APPROVE.
- **codex iter2**: REQUEST_CHANGES — dry-run side-effect (finding 2 below).

## Finding 1 (audit-found, HIGH) — `send.ts` opened the retired `state.db`

`detectCurrentBuilderId()` (the `afx send` #1094 anti-spoofing path) opened
`<workspace>/.agent-farm/state.db` directly to resolve a builder's canonical id. After #1118
migrates + renames that file, `existsSync` fails and `afx send` from a worktree throws
`BuilderIdResolutionError`. This is the same direct-open pattern as `lookupBuilderSpawningArchitect`
and `overview.ts` (both fixed in the main implementation) — this third instance was missed, and
**all three consult models missed it too**; the manual audit caught it.

**Fix**: read `global.db` (`getGlobalDbPath()`, read-only) scoped by `workspace_path =
normalizeWorkspacePath(workspacePath)`, matching by worktree. The #1094 fail-loud contract is
preserved (throws, never a bare-name fallback). Test rewritten for the global.db model, including
a **same-id-different-workspace** scoping case (a builder with the same worktree tail in another
workspace must not match).

## Finding 2 (codex iter2, MEDIUM) — `afx db consolidate` dry-run not side-effect-free

`dbConsolidate` called `getGlobalDb()` before the `--apply` check; `getGlobalDb()` eagerly
initializes/migrates `global.db` (creates the file, runs migration v14). So a *preview* could
create/migrate the target DB — violating the plan's "dry-run … no writes."

**Fix**: dry-run opens `global.db` **read-only** (or an in-memory DB if it doesn't exist yet —
empty tables make every source row read as "new", the correct preview). Only `--apply` opens the
real read-write `getGlobalDb()` connection. New test asserts dry-run neither calls `getGlobalDb()`
nor creates the target file.

## Audit result (scope of the "shared-DB" bug class)

Confirmed the complete set of access paths that relied on per-file isolation:
- Unscoped state.ts reads/writes → all scoped (builder fns + `clearRuntime`).
- Direct `state.db` opens → three total: `lookupBuilderSpawningArchitect` ✓, `overview.ts` ✓,
and `send.ts` (this fix). Grep confirms no others remain.
- `clearState` → no production callers.
- `utils`/`annotations` unscoped in `loadState` → vestigial (no producers); noted as a
low-severity future cleanup, not fixed here.

**Verification**: `pnpm build` ✓, full agent-farm suite ✓ (2018 passed, +new tests), typecheck clean.
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# Rebuttal — PIR #1118 review iteration 3 (final)

Final 3-way re-run against the settled code. **claude APPROVE**, **gemini COMMENT**,
**codex REQUEST_CHANGES** (two items). Both addressed (commit `c81b8f0d`).

## Finding 1 (codex) — `send.ts` `.builders` regexes used lazy first-match

`detectWorkspaceRoot()` and `detectCurrentBuilderId()` parsed the worktree with a lazy
`/^(.+?)\/\.builders\/…/`, which matches the **first** `/.builders/`. For a *nested* worktree
`<repo>/.builders/a/.builders/b` this resolves the outer builder — the same class the PR fixed
elsewhere with `lastIndexOf`.

**Reachability (per architect discussion)**: nesting only occurs via `afx spawn` from *inside* a
builder worktree, an explicitly-forbidden anti-pattern ("breaks everything", per CLAUDE.md). In
normal use (single `/.builders/`), lazy and greedy are identical — no user-facing impact. Fixed
anyway for consistency (the PR's whole theme is last-match `.builders` parsing) — it's a
one-character change and leaves the tree uniform.

**Fix**: greedy `.+` in both regexes; regression test asserting a nested worktree resolves the
inner builder. Also refreshed the stale `detectCurrentBuilderId` docstring (claude's minor
observation: it still referenced `state.db`/singleton `getDb()`; the code correctly reads
`global.db`).

## Finding 2 (codex) — missing `runBootConsolidation` coverage

`consolidate.test.ts` covered `applyMigration`/`isConsolidationDone` but not the actual boot
path or its strict-marker semantics.

**Fix**: added a `runBootConsolidation` suite (exercises the real `activeStateDbPath()` via a
temp workspace cwd): first-boot migrate + marker + source rename; marker-set → no-op (new active
`state.db` untouched); strict → marks done even when the active `state.db` is absent, then
no-ops on subsequent boots.

## gemini (COMMENT)

Advisory only, no blocking issues raised; noted the plan-fulfilment and the composite-PK
security contract as correctly handled.

**Verification**: `pnpm build` ✓, full agent-farm suite ✓ (2022 passed, +4 new tests), typecheck
clean, CI green.
29 changes: 29 additions & 0 deletions codev/projects/1118-consolidate-state-db-tables-in/status.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
id: '1118'
title: consolidate-state-db-tables-in
protocol: pir
phase: review
plan_phases: []
current_plan_phase: null
gates:
plan-approval:
status: approved
requested_at: '2026-06-29T23:26:42.680Z'
approved_at: '2026-06-30T06:35:22.421Z'
dev-approval:
status: approved
requested_at: '2026-06-30T07:12:02.675Z'
approved_at: '2026-07-01T06:50:38.564Z'
pr:
status: pending
requested_at: '2026-07-01T07:29:33.516Z'
iteration: 1
build_complete: true
history: []
started_at: '2026-06-29T23:21:05.933Z'
updated_at: '2026-07-01T07:29:33.518Z'
pr_history:
- phase: review
pr_number: 1127
branch: builder/pir-1118
created_at: '2026-07-01T06:55:19.034Z'
pr_ready_for_human: true
2 changes: 1 addition & 1 deletion codev/resources/arch-critical.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ and keeps the map in sync with arch.md's top-level sections. See codev/resources
- Two trees: codev/ = our instance, codev-skeleton/ = the template shipped to adopters. Mirror every framework change in BOTH.
- CLAUDE.md and AGENTS.md MUST stay byte-identical (same content, two tool ecosystems).
- Porch is a pure planner: it emits task JSON, Claude Code executes. Never hand-edit status.yaml.
- State lives in .agent-farm/state.db + ~/.agent-farm/global.db (single source of truth); one Tower on port 4100. Never modify state by hand.
- State lives in a single user-global ~/.agent-farm/global.db (Issue #1118 retired the per-workspace state.db; architect/builders keyed by workspace_path); one Tower on port 4100. Never modify state by hand.
- Worktrees in .builders/ are Agent-Farm-managed — never delete manually (use afx cleanup); run afx from the main workspace root only.
- Forge concept commands abstract the VCS provider — add a dedicated concept; don't bolt env flags onto a shared one.
- Two human gates (spec-approval, plan-approval) plus the pr gate; only humans transition conceived→specified and committed→integrated.
Expand Down
24 changes: 17 additions & 7 deletions codev/resources/arch.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ For debugging common issues, start here:
| **"Builder spawn fails"** | `packages/codev/src/agent-farm/commands/spawn.ts` → `upsertBuilder()` | Worktree creation, shellper session, role injection |
| **"Gate not notifying architect"** | `commands/porch/notify.ts` → `notifyArchitect()` | porch sends `afx send architect` directly at gate transitions (Spec 0108) |
| **"Consult hangs/fails"** | `packages/codev/src/commands/consult/index.ts` | CLI availability (gemini/codex/claude), role file loading |
| **"State inconsistency"** | `packages/codev/src/agent-farm/state.ts` | SQLite at `.agent-farm/state.db` |
| **"State inconsistency"** | `packages/codev/src/agent-farm/state.ts` | SQLite in the user-global `~/.agent-farm/global.db` (Issue #1118); rows scoped by `workspace_path` |
| **"Port conflicts"** | `packages/codev/src/agent-farm/db/schema.ts` | Global registry at `~/.agent-farm/global.db` |
| **"Init/adopt not working"** | `packages/codev/src/commands/{init,adopt}.ts` | Skeleton copy, template processing |

Expand Down Expand Up @@ -90,7 +90,7 @@ tail -f ~/.agent-farm/tower.log

**These MUST remain true - violating them will break the system:**

1. **State Consistency**: `.agent-farm/state.db` is the single source of truth for builder/util state. Never modify it manually.
1. **State Consistency**: the user-global `~/.agent-farm/global.db` is the single source of truth for architect/builder/util state (Issue #1118 retired the per-workspace `.agent-farm/state.db`; rows are disambiguated by `workspace_path`). Never modify it manually.

2. **Single Tower Port**: All projects are served through Tower on port 4100. Per-project port blocks were removed in Spec 0098. Terminal sessions and workspace metadata are tracked in `~/.agent-farm/global.db`.

Expand Down Expand Up @@ -394,11 +394,21 @@ const baseEnv = {

### State Management

Agent Farm uses SQLite for ACID-compliant state persistence with two databases:

#### Local State (`.agent-farm/state.db`)

Stores the current session's state with tables for `architect`, `builders`, `utils`, and `annotations`. See `packages/codev/src/agent-farm/db/schema.ts` for the full schema.
Agent Farm uses SQLite for ACID-compliant state persistence. Issue #1118 consolidated
everything into **one user-global database** (`~/.agent-farm/global.db`); the per-workspace
`.agent-farm/state.db` file was retired (its cwd-dependent location caused architect state to
"disappear" across Tower restarts). `getDb()` and `getGlobalDb()` both return the single global
connection.

#### Dashboard State tables (in `global.db`)

`architect`, `builders`, `utils`, and `annotations` (formerly in `state.db`) live in `global.db`.
`architect` and `builders` are keyed by a composite `(workspace_path, id)` so the same name/id
can exist in multiple workspaces (a builder id is `<protocol>-<issueNumber>`, unique per repo but
reused across repos); `utils`/`annotations` are UUID-keyed. `state.ts` reads/writes them via
`getDb()`; the one-time on-disk migration of legacy `state.db` files lives in
`db/consolidate.ts` (marker-gated boot one-off + the manual `afx db consolidate <path>`). See
`packages/codev/src/agent-farm/db/schema.ts` (`GLOBAL_SCHEMA`, migration v14) for the full schema.

#### State Operations (from `state.ts`)

Expand Down
1 change: 1 addition & 0 deletions codev/resources/lessons-learned.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ Generalizable wisdom extracted from review documents, ordered by impact. Updated

## Architecture

- [From #1118] When a subsystem's *scope* outgrows its storage *location*, relocate the storage to match — don't patch the scope with a column and leave the location lying. `state.db` was born per-workspace; when Tower became a system-wide singleton its scope became user-global, but the file stayed at a cwd-dependent `<workspace>/.agent-farm/state.db`. Bugfix #826 patched the *symptom* (a `workspace_path` column so cross-workspace rows could coexist in one file) while leaving the *location* wrong, so which file Tower opened still depended on its start-cwd — architect state "disappeared" after any restart from a different directory. The real fix was to move the tables into the already-user-global `global.db` and make `getDb()` return it. Corollary caught mid-implementation: moving a table into a *shared* DB forces you to re-examine its primary key — `builders` was keyed by `id` alone (safe when each workspace had its own file), but ids are `<protocol>-<issueNumber>`, unique per repo yet **reused across repos**, so the shared table needs a composite `(workspace_path, id)` PK (mirroring what #826 did for `architect`) or same-id builders from two repos collide — and that collision is security-relevant (it feeds the `afx send` spoofing check). The issue's "tables move as-is" framing was true for 3 of 4 tables but not the one whose key assumed per-file isolation. Migration technique that generalized: an **upsert-if-newer** engine (`ON CONFLICT DO UPDATE … WHERE excluded.started_at > existing`) is the same code for the empty-target one-off and the non-empty satellite import, and reading legacy source files **defensively** (`PRAGMA table_info` + synthesizing the new key column) handles heterogeneous historical schema versions without running the old migration ladder.
- [From #832] A recovery mechanism that keys off a process holding a resource *open* (a file descriptor, a lock) is only as sound as that assumption — verify it empirically *before* building on it. #832's first design disambiguated sibling architects (which share one cwd) by correlating each one's process subtree to the session jsonl it "holds open" via `lsof`. But Claude appends to its session jsonl and **closes** it between writes: `lsof -p <pid>` against a live architect showed **zero** jsonl fds, so the correlation never matched — every apparent success was the *unrelated* sole-architect newest-by-mtime fallback. One captured `lsof` (raw data, not reasoning) falsified the whole subsystem in minutes; it had passed unit tests because the tests only exercised the fallback. Corollary: don't delete a *working* self-recovery path to rebuild it as a bridge that can't cover the hard case. The branch had removed #830's sole-architect jsonl resume for `main` and replaced it with a backfill that — because pre-existing siblings are fundamentally unrecoverable (no `--session-id` arg, no open fd, only a jsonl filename with no reliable pid bridge) — could only ever rescue `main`, which the original fallback already did losslessly. Restoring the existing fallback (gated to the unambiguous sole-architect case) and deleting the entire bridge was strictly simpler and lost nothing.
- [From #929] When abstracting per-CLI (or per-provider) behavior behind a provider interface, audit **every** call site that builds a CLI invocation — including resume/restart paths, not just the obvious fresh-launch path. Spec 591's harness abstraction routed fresh-launch role injection through the provider but left the `--resume` seam reading Claude's session store unconditionally; a codex/gemini architect (or resumed builder) with any stale Claude `.jsonl` then built `<cmd> --resume <claude-uuid>` and shellper restart-looped to death. The bug only fires on the resume branch (fresh launches were already correct), which is exactly why "builders already prove the path" didn't cover it — resume is a *different* call site. Fix: an optional capability (`buildResume`) that returns the resume invocation in both Node-argv and shell-escaped-fragment forms (mirroring `buildRoleInjection`/`buildScriptRoleInjection`), so no CLI-specific string survives at the call site and only the harness that has a cwd-keyed session store opts in (others return null → fresh launch).
- [From 1052] Shared *shape* is not shared *substance* — resist extracting on resemblance. The VSCode replay buffer-and-flush and the web client's `flushInitialBuffer` do the same conceptual thing (hold replay → paint once at the settled size), but their triggers and bodies diverge: web uses a fixed 500ms delay entangled with `FitAddon`/`ScrollController`; VSCode uses a `setDimensions` debounce + PTY-resize + paced `writeChunked`. The only common part is a ~10-line "accumulate then emit once" skeleton; the valuable logic (trigger policy, flush body) differs. Extracting would couple two independently-evolving strategies behind one leaky name — a net negative at two call sites. Contrast the primitives that *are* genuinely identical and correctly centralized in core: `reconnect-policy` and `escape-buffer` (the dashboard's `escapeBuffer.ts` is a 5-line re-export of core's, not a copy). Corollary, painfully relearned here: a claim of "duplication" from an *import path* (`../lib/escapeBuffer.js`) is not evidence — read the file before filing a dedup; it was a re-export shim all along.
Expand Down
Loading
Loading