v0.42.54.0 feat(facts): durable extraction — recovery phase + intra-page resume + enforced deadline#2502
Open
danwiggins wants to merge 2 commits into
…+ enforced deadline

- realtime_absorb_recovery cycle phase (default on, bounded): recovers pages whose real-time fact extraction was dropped on process exit, by re-running the pipeline inline (idempotent via existing dedup) and tombstoning the ingest_log failure record. No new table.
- cursor-only-on-confirmed-write: a swallowed insert/extract failure no longer advances the resume cursor past unwritten facts.
- intra-page resume: per-page checkpoint carries a row_num watermark, advances per segment on confirmed commit; scoped delete-orphans preserves the committed prefix so a large page makes monotonic forward progress.
- enforced per-source wall-clock deadline + budgets aligned to the autopilot job window.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
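The intra-page resume and cursor-only-on-confirmed-write behavior described in the commit message can be illustrated with a minimal sketch. This is not the code in extract-conversation-facts.ts: the types, the `resumePage` function, and the `commit` callback are hypothetical names chosen for illustration; only the row_num watermark idea and segmentLimit come from the PR text.

```typescript
// Hypothetical shapes, assumed for illustration only.
interface Segment { rowNum: number; text: string }
interface Checkpoint { watermark: number } // first row_num not yet committed

// Returns true only when the write for this segment is confirmed.
type CommitFn = (seg: Segment) => boolean;

function resumePage(
  segments: Segment[], // assumed sorted by rowNum ascending
  checkpoint: Checkpoint,
  segmentLimit: number, // per-run budget, in segments
  commit: CommitFn,
): { checkpoint: Checkpoint; fullyProcessed: boolean } {
  let watermark = checkpoint.watermark;
  // Only the uncommitted tail (rowNum >= watermark) is touched;
  // the committed prefix is left alone.
  const pending = segments.filter((s) => s.rowNum >= watermark);
  let processed = 0;

  for (const seg of pending) {
    if (processed >= segmentLimit) break;
    let ok = false;
    try {
      ok = commit(seg);
    } catch {
      ok = false; // a swallowed failure counts as "not written"
    }
    if (!ok) break; // never advance the cursor past unwritten facts
    watermark = seg.rowNum + 1; // advance per segment, on confirmed commit
    processed++;
  }

  // Comparing against the pending count means a page with exactly
  // segmentLimit segments is reported as complete.
  const fullyProcessed = processed === pending.length;
  return { checkpoint: { watermark }, fullyProcessed };
}
```

Calling `resumePage` repeatedly with the returned checkpoint moves the watermark forward on every run that commits at least one segment, which is the monotonic progress property the commit message describes.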
Problem
Fact extraction could silently drop on a real-world install. Real-time extraction (runFactsBackstop queue mode) runs fire-and-forget on the in-memory FactsQueue; when the writing process exits before that settles, background-work shutdown aborts the in-flight chat call. The page persists with zero facts and nothing retries it. The durable catch-up that should rescue it (conversation_facts_backfill) only handles chat-shaped pages, is opt-in/off, and, even when on, wiped the whole page at the start of every run, so a page too large for one cycle's budget could never finish.

Observed on a live brain: meeting/Slack pages landing with an aborted facts:absorb row in ingest_log and never recovering; the autopilot cycle reporting facts_consolidated: 0.

Fix
Three changes:
- realtime_absorb_recovery cycle phase (default-on, bounded). Treats unresolved facts:absorb failure rows in ingest_log as a durable backlog (no new table), re-runs the pipeline inline (which carries the existing 0.95 cosine dedup, so it's idempotent), and appends a facts:absorb-recovered tombstone on success. Bounded by page cap (25), cost cap ($0.25), and a wall-clock deadline (240s); kill switch cycle.realtime_absorb_recovery.enabled=false. Recovers narrative pages the chat-shaped backfill can't parse.
- Cursor-only-on-confirmed-write + intra-page resume in extract-conversation-facts.ts. The per-page checkpoint now carries a row_num watermark and advances per segment on confirmed commit; deleteOrphanFactsForPage is scoped to the uncommitted tail (row_num >= watermark). A page that can't finish in one cycle's budget makes monotonic forward progress instead of wipe→re-extract→re-exhaust forever. A swallowed insert/extract failure no longer advances the cursor past unwritten facts. Legacy checkpoints (no watermark) force one safe full re-extract on upgrade.
- Enforced per-source wall-clock deadline + budget alignment. The per-source cap was read from config but never passed to the worker, and the brain-wide cap only checks between sources, so a single-source brain had no wall-clock ceiling and a long drain could blow the autopilot job timeout. Now passed + enforced; defaults lowered (4 min/source, 6 min total) to sit under the ~600s job window. Also fixed a fullyProcessed off-by-one that skipped the terminal row (livelock) when a page had exactly segmentLimit segments.

Review + verification
segmentLimit livelock); both fixed with regression tests.
segmentLimit-exact completion, and 6 recovery-phase tests (recover + tombstone, idempotency, per-page-failure leaves un-tombstoned, page-gone, kill switch, default-on). Phase-count fixtures updated.

Engine-parity: no new engine methods; the backlog query is portable executeRaw.
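
The bounded recovery phase and the behaviors its tests cover (recover + tombstone, a per-page failure left un-tombstoned) can be sketched as follows. This is a hedged illustration, not the PR's implementation: `runRecovery`, `reextract`, `tombstone`, and the result shape are hypothetical names, and the backlog is passed in as a plain array rather than queried from ingest_log. The default limits (25 pages, $0.25, 240s) are the values stated in the PR text.

```typescript
// Hypothetical shapes, assumed for illustration only.
interface BacklogRow { pageId: string }
interface Limits { maxPages: number; maxCostUsd: number; deadlineMs: number }
interface RecoveryResult {
  recovered: string[];
  failed: string[];
  stoppedBy: "done" | "pages" | "cost" | "deadline";
}

// Bounds taken from the PR description.
const DEFAULT_LIMITS: Limits = { maxPages: 25, maxCostUsd: 0.25, deadlineMs: 240_000 };

function runRecovery(
  backlog: BacklogRow[],
  reextract: (pageId: string) => number, // returns cost in USD, throws on failure
  tombstone: (pageId: string) => void, // marks the failure row as resolved
  limits: Limits = DEFAULT_LIMITS,
  now: () => number = Date.now,
): RecoveryResult {
  const start = now();
  const result: RecoveryResult = { recovered: [], failed: [], stoppedBy: "done" };
  let attempted = 0;
  let spentUsd = 0;

  for (const row of backlog) {
    // Check every bound before starting another page.
    if (attempted >= limits.maxPages) { result.stoppedBy = "pages"; break; }
    if (spentUsd >= limits.maxCostUsd) { result.stoppedBy = "cost"; break; }
    if (now() - start >= limits.deadlineMs) { result.stoppedBy = "deadline"; break; }

    attempted++;
    try {
      spentUsd += reextract(row.pageId);
      tombstone(row.pageId); // only after a successful re-run
      result.recovered.push(row.pageId);
    } catch {
      // No tombstone: the row stays in the backlog for a later cycle.
      result.failed.push(row.pageId);
    }
  }
  return result;
}
```

Because a tombstone is written only after a successful re-run, a page that fails stays visible as unresolved and is picked up again on a later cycle, while the three bounds keep any single cycle from running past its budget.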