afx workspace remove-architect: guard against orphaning in-flight builders + offer reassignment to main (revisit Spec 786 OQ-A) #1117

Description

@amrmelsayed

Problem

afx workspace remove-architect <name> succeeds unconditionally when the target architect has in-flight builders attached to it (spawnedByArchitect == <name>). The architect's state.db.architect row is deleted; the builders' state.db.builders rows are left untouched — so each orphan builder's spawnedByArchitect column still points at the removed name even though no live architect by that name is registered.

This is the Spec 786 OQ-A resolution, documented in the existing handler:

"Removing a sibling with in-flight builders is permitted (per OQ-A) — the builders' subsequent afx send architect calls fall back to main via tower-messages.ts:336."

That decision was defensible at the time the spec landed because routing was the only consequence. Since then, three changes have made the stale-pointer surface visibly worse:

  1. Agents view shipped (PR #1106 / Issue #1104): the architect-axis grouping mode renders orphan builders under a header named after the removed architect. No live architect by that name; the header lies about the workspace roster.
  2. Click-to-open on architect headers (PR #1109 / Issue #1108): clicking that stale-owner header calls codev.openArchitectTerminal('<dead-name>'), which fails gracefully but is a dead affordance.
  3. Per-architect session UUIDs (PR #1116 / Issue #832): adding an architect with the same name as a previously-removed one now creates a fresh session with its own UUID. The orphan builders silently re-attach to that new architect, which has zero context for any of those builders' history. Identity collision with no warning at the moment of re-add.

Verified against source

  • state.removeArchitect (packages/codev/src/agent-farm/state.ts): single DELETE FROM architect WHERE workspace_path = ? AND id = ?. The builders table is untouched.
  • removeArchitect Tower handler (packages/codev/src/agent-farm/servers/tower-instances.ts): no builder-count check; refuses only main (workspace-defining) and unknown names.
  • workspace-remove-architect.ts CLI: refuses empty name + main; otherwise dispatches.
  • Routing fallback (packages/codev/src/agent-farm/servers/tower-messages.ts:336): builder-sender + architect target → look up spawnedByArchitect, if registered route there, else fall back to main. Routing-only; never back-writes the DB row.
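
The interaction between the row delete and the routing fallback can be sketched in a few lines. This is a minimal in-memory model, not the code in tower-messages.ts: the `RoutedBuilder` type, the `resolveArchitectTarget` function, and the `Set` standing in for the architect table are all hypothetical names introduced for illustration.

```typescript
// Hypothetical stand-in for a state.db.builders row; only the fields the
// issue mentions are modelled.
interface RoutedBuilder {
  id: string;
  spawnedByArchitect: string | null;
}

// Resolve where a builder's `afx send architect` message should go.
// Routing-only: the builder row is read, never rewritten.
function resolveArchitectTarget(
  builder: RoutedBuilder,
  registeredArchitects: ReadonlySet<string>,
): string {
  const owner = builder.spawnedByArchitect;
  if (owner !== null && registeredArchitects.has(owner)) return owner;
  return "main"; // owner missing or no longer registered
}

const liveRoster = new Set(["main", "vscode"]);
const routedBuilder: RoutedBuilder = {
  id: "builder-pir-1234",
  spawnedByArchitect: "vscode",
};

const targetBeforeRemoval = resolveArchitectTarget(routedBuilder, liveRoster);
liveRoster.delete("vscode"); // models removeArchitect: only the architect row goes
const targetAfterRemoval = resolveArchitectTarget(routedBuilder, liveRoster);
```

In this model the message is delivered to main after the removal, yet `routedBuilder.spawnedByArchitect` still reads `vscode`, which is the stale pointer the rest of this issue is about.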

Consequences of the keep-stale-pointer design

| Surface | Symptom |
| --- | --- |
| state.db consistency | spawnedByArchitect = '<removed-name>' rows linger; querying by spawnedByArchitect returns orphans pointing at no live architect |
| Agents view (architect-axis) | Stale-owner group header for a removed architect |
| Click-to-open on stale header | codev.openArchitectTerminal('<dead-name>') fails gracefully but the affordance is dead |
| afx send architect:<name> from non-builder | NOT_FOUND until the name is re-added |
| Re-adding the same name | Silent identity collision: orphan builders attribute to a fresh architect with zero context for their history; the new architect's brief never mentions them |
| Conversation history | The removed architect's codev/state/<name>.md working memory is gone, so orphan builders can't recover what their owning architect knew about their plan / gates / scope |
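
To make the first two rows concrete, here is a rough sketch of how stale owners could be detected from the two tables. It assumes builder rows and the live roster have already been loaded into memory; `OwnedBuilder` and `groupStaleOwners` are hypothetical names, not existing helpers in state.ts.

```typescript
// Hypothetical projection of a builders row.
interface OwnedBuilder {
  id: string;
  spawnedByArchitect: string | null;
}

// Group builder ids by owner name, keeping only owners that are set but
// absent from the live roster. Each key is a name the architect-axis
// grouping would render as a stale header.
function groupStaleOwners(
  builders: readonly OwnedBuilder[],
  liveArchitects: ReadonlySet<string>,
): Map<string, string[]> {
  const stale = new Map<string, string[]>();
  for (const b of builders) {
    const owner = b.spawnedByArchitect;
    if (owner === null || liveArchitects.has(owner)) continue;
    const ids = stale.get(owner) ?? [];
    ids.push(b.id);
    stale.set(owner, ids);
  }
  return stale;
}

const staleGroups = groupStaleOwners(
  [
    { id: "builder-pir-1234", spawnedByArchitect: "vscode" },
    { id: "builder-air-1235", spawnedByArchitect: "vscode" },
    { id: "builder-spir-1300", spawnedByArchitect: "main" },
    { id: "builder-legacy", spawnedByArchitect: null },
  ],
  new Set(["main"]),
);
```

With that sample data the only stale owner is `vscode`, holding the two builders used in the example message later in this issue.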

Proposed fix

Refuse the removal by default when any builder has spawnedByArchitect == <name>, and offer two opt-in escape hatches at the CLI:

  • --reassign-to <target> (default suggestion: main) — reassigns each orphan builder's spawnedByArchitect to <target> (must be an existing architect; refuses if <target> is not registered), then completes the removal.
  • --force — proceeds with the removal and leaves the orphan builders pointing at the dead name (today's behaviour, preserved as an opt-out for cases where the user explicitly wants the old semantics).
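
The flag handling above reduces to a small decision function. The sketch below is one possible shape, not the proposed implementation: `RemovalPlan`, `RemovalFlags`, and `planRemoval` are hypothetical names, and two behaviours are assumptions the issue does not settle (--reassign-to wins when both flags are passed, and the flags are ignored when there are no orphans).

```typescript
type RemovalPlan =
  | { kind: "remove" }
  | { kind: "reassign-then-remove"; target: string }
  | { kind: "force-remove"; orphanCount: number }
  | { kind: "refuse"; reason: string };

interface RemovalFlags {
  reassignTo?: string;
  force?: boolean;
}

function planRemoval(
  name: string,
  orphanCount: number,
  registered: ReadonlySet<string>,
  flags: RemovalFlags = {},
): RemovalPlan {
  // Existing refusals: main is workspace-defining, unknown names are rejected.
  if (name === "main") return { kind: "refuse", reason: "main cannot be removed" };
  if (!registered.has(name)) return { kind: "refuse", reason: `unknown architect '${name}'` };

  // No orphans: today's path, unchanged (assumption: flags are ignored here).
  if (orphanCount === 0) return { kind: "remove" };

  const target = flags.reassignTo;
  if (target !== undefined) {
    // The target must be a registered architect other than the one being removed.
    if (target === name || !registered.has(target)) {
      return { kind: "refuse", reason: `invalid reassignment target '${target}'` };
    }
    return { kind: "reassign-then-remove", target };
  }
  if (flags.force) return { kind: "force-remove", orphanCount };
  return { kind: "refuse", reason: `${orphanCount} in-flight builder(s) attached` };
}

const architects = new Set(["main", "vscode"]);
const planDefault = planRemoval("vscode", 2, architects);
const planReassign = planRemoval("vscode", 2, architects, { reassignTo: "main" });
const planForce = planRemoval("vscode", 2, architects, { force: true });
const planBadTarget = planRemoval("vscode", 2, architects, { reassignTo: "ghost" });
const planNoOrphans = planRemoval("vscode", 0, architects);
```

Returning a plan object rather than performing the removal keeps the three flow paths (default-refuse, --reassign-to, --force) testable without a database.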

Default (no flag) refuses with an actionable message listing the orphan builders + the two recovery commands:

Refusing to remove architect 'vscode': 2 in-flight builder(s) own this architect.
  - builder-pir-1234 (#1234, implement phase)
  - builder-air-1235 (#1235, review phase)

Choose one:
  afx workspace remove-architect vscode --reassign-to main
  afx workspace remove-architect vscode --force   (leaves builders with a stale owner)
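
A message of that shape could be assembled along the following lines. This is a sketch only: `OrphanSummary` and `formatRefusal` are hypothetical names, and the issue number and phase fields are assumed to be available on each builder row.

```typescript
// Hypothetical summary of one orphan builder, as shown in the message.
interface OrphanSummary {
  id: string;
  issue: number;
  phase: string;
}

// Build the refusal text: a header with the count, one line per orphan,
// then the two recovery commands.
function formatRefusal(name: string, orphans: readonly OrphanSummary[]): string {
  const cmd = `afx workspace remove-architect ${name}`;
  return [
    `Refusing to remove architect '${name}': ${orphans.length} in-flight builder(s) own this architect.`,
    ...orphans.map((o) => `  - ${o.id} (#${o.issue}, ${o.phase} phase)`),
    "",
    "Choose one:",
    `  ${cmd} --reassign-to main`,
    `  ${cmd} --force   (leaves builders with a stale owner)`,
  ].join("\n");
}

const refusalText = formatRefusal("vscode", [
  { id: "builder-pir-1234", issue: 1234, phase: "implement" },
  { id: "builder-air-1235", issue: 1235, phase: "review" },
]);
```

Called with the two sample builders, `refusalText` reproduces the message above line for line.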

Why this shape

  • Doesn't trap the user (the Spec 786 OQ-A concern): --force preserves the old behaviour as an explicit opt-out.
  • Default is safe: orphan builders are surfaced before they can become invisible problems.
  • Cleans the stale pointer in the common case: reassigning to main (or another named architect) closes the routing degradation, the Agents-view stale-header rendering, and the identity-collision-on-re-add path in one DB write.
  • Adds visibility: even users who choose --force see the count + IDs of what they're orphaning.

Acceptance criteria

  • afx workspace remove-architect <name> with no flags refuses when ≥1 builder has spawnedByArchitect == <name>. Error names each orphan builder (id + phase) and surfaces both --reassign-to and --force as remediation paths.
  • --reassign-to <target> writes spawnedByArchitect = <target> for every orphan builder before removing the architect. Refuses if <target> is not a currently-registered architect, or equals <name>.
  • --force proceeds with today's behaviour: removes the architect, leaves builders with stale spawnedByArchitect. Logs a WARN naming the orphan count.
  • A new state.reassignBuildersToArchitect(workspacePath, fromName, toName) helper performs the bulk update atomically (single transaction) so a partial failure doesn't leave half the builders reassigned.
  • Tower-side handler accepts a reassignTo?: string option in its removeArchitect call and dispatches the new state helper before the architect-row delete.
  • The CLI flow validates --reassign-to target's existence client-side before calling Tower, so the user gets a clear "no such target architect" message instead of a server-side error.
  • Unit tests cover: refuses on orphans without flag; reassign-to-main happy path; reassign target validation; force-with-orphans logs WARN; no-orphans path unchanged (no extra prompts, no log spam).
  • Behavioural test asserts the Agents view's stale-owner header is gone after a --reassign-to main removal (the orphan builders should now group under MAIN).
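
The all-or-nothing requirement on the new helper can be illustrated with an in-memory model. This sketch is not the state.ts implementation: `InMemoryState` and `StoredBuilder` are hypothetical, the architect roster is modelled for a single workspace, and the copy-then-swap step merely stands in for the single database transaction the criteria call for.

```typescript
interface StoredBuilder {
  id: string;
  workspacePath: string;
  spawnedByArchitect: string | null;
}

class InMemoryState {
  private builders: StoredBuilder[];
  private architects: Set<string>;

  constructor(builders: StoredBuilder[], architects: string[]) {
    this.builders = builders.map((b) => ({ ...b }));
    this.architects = new Set(architects);
  }

  // Validate first, build the updated table on a copy, then swap it in.
  // A thrown error leaves every row exactly as it was.
  reassignBuildersToArchitect(workspacePath: string, fromName: string, toName: string): number {
    if (toName === fromName) throw new Error("target equals the architect being removed");
    if (!this.architects.has(toName)) throw new Error(`no such target architect '${toName}'`);
    let changed = 0;
    const next = this.builders.map((b) => {
      if (b.workspacePath !== workspacePath || b.spawnedByArchitect !== fromName) return b;
      changed += 1;
      return { ...b, spawnedByArchitect: toName };
    });
    this.builders = next; // "commit"
    return changed;
  }

  removeArchitect(name: string): void {
    this.architects.delete(name);
  }

  ownerOf(builderId: string): string | null | undefined {
    return this.builders.find((b) => b.id === builderId)?.spawnedByArchitect;
  }
}

const demoState = new InMemoryState(
  [
    { id: "builder-pir-1234", workspacePath: "/ws", spawnedByArchitect: "vscode" },
    { id: "builder-air-1235", workspacePath: "/ws", spawnedByArchitect: "vscode" },
    { id: "builder-other", workspacePath: "/elsewhere", spawnedByArchitect: "vscode" },
  ],
  ["main", "vscode"],
);

let rejectedUnknownTarget = false;
try {
  demoState.reassignBuildersToArchitect("/ws", "vscode", "ghost");
} catch {
  rejectedUnknownTarget = true; // nothing was written
}
const ownerAfterFailure = demoState.ownerOf("builder-pir-1234");

// Reassign before deleting the architect row, as the Tower-side criterion asks.
const reassignedCount = demoState.reassignBuildersToArchitect("/ws", "vscode", "main");
demoState.removeArchitect("vscode");
```

The ordering matters: reassigning before the delete means a failed reassignment aborts the removal with the architect still registered, so no builder ends up pointing at a missing owner.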

Out of scope

  • Conversation-history transfer from the removed architect to main (the removed architect's codev/state/<name>.md notes are lost; revisiting that would be a separate Spec 1090-adjacent issue).
  • The "what happens when afx tower start doesn't auto-launch a workspace's persisted roster" gap discussed separately; orthogonal.
  • An interactive Quick Pick in VS Code for "pick a target architect to reassign to" — could be a follow-up, but the CLI fix is the load-bearing path.
  • Changing state.db schema. spawnedByArchitect is already nullable / mutable text.

Protocol

PIR. This reverses a Spec 786 design decision (OQ-A) and changes user-visible CLI behaviour for a command they've been using; the design-shift wants a plan-gate sanity check + a dev-gate run-through of all three flow paths (default-refuse, --reassign-to main, --force) before PR.

    Labels

    area/tower (Area: Tower server / agent farm CLI)
