
[WIP / debug-only] feat(ai): structured logs to diagnose Bedrock stalls#699

Draft
jorgeraad wants to merge 18 commits into canary from feat/instrumentation-tier1

Conversation


@jorgeraad jorgeraad commented May 1, 2026

⚠️ Not for merge. This is a temporary diagnostic patch to confirm what's
happening during the multi-hour Bedrock stream stalls on jraad-deploy. Once
we have signal from the next stall and identify the root cause, this branch
will either be reverted or replaced with a properly scoped instrumentation
PR (Tier 2/3 from the design notes). Do not merge as-is.

Adds always-on structured logging around Bedrock streamText calls so we can finally see what's happening during the multi-hour recon stalls on jraad-deploy.

Two pieces:

  • A wrapFetchWithBedrockLogs helper in instrumentation.ts that emits per-fetch lifecycle events — start, headers, first byte, done, body errors, signal abort — each tagged with a callId and ageMs since fetch start. It now also owns the streaming-fetch-timeout composition so its signal_abort log fires when the 15-min backstop trips (today the caller never sees that signal directly).

  • Retry-decision logs in ai.ts that bypass the silent: true flag — which OffensiveSecurityAgent hardcodes for CLI quietness, also hiding the same warnings we need server-side. Covers rate-limit retries, the new stream-idle-resume path, context-length compaction / summarization, the streamText onError handler, and unrecoverable stream errors. Each carries a logicalCallId so a multi-retry chain joins to a single logical call in CloudWatch Insights.

CLI behavior is unchanged — the existing console.warn calls stay gated by silent. The new events go to a separate stdout channel: [apex.instrumentation] {json}.
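The lifecycle-event shape described above can be sketched roughly as follows. The event names, the `callId`/`ageMs` fields, and the `[apex.instrumentation]` prefix are taken from this PR's description; the `emit` helper and the wrapper internals are illustrative, and the sketch only shows `start`, `headers`, and `signal_abort` (the real wrapper also tracks first byte, completion, and body errors from the response stream):

```typescript
// Sketch of the per-fetch lifecycle logging. Event names and the
// [apex.instrumentation] prefix follow the PR; internals are illustrative.
type LifecycleEvent =
  | "start" | "headers" | "first_byte" | "done"
  | "body_error" | "body_cancel" | "signal_abort";

function emit(
  event: LifecycleEvent,
  callId: string,
  startedAt: number,
  extra: Record<string, unknown> = {},
): void {
  // ageMs lets you locate a stall precisely relative to fetch start.
  const payload = { event, callId, ageMs: Date.now() - startedAt, ...extra };
  console.log(`[apex.instrumentation] ${JSON.stringify(payload)}`);
}

function wrapFetchWithBedrockLogs(baseFetch: typeof fetch): typeof fetch {
  let counter = 0;
  return async (input, init) => {
    const callId = `fetch-${++counter}`;
    const startedAt = Date.now();
    emit("start", callId, startedAt);
    // Fires when the composed streaming timeout (or any caller abort) trips.
    init?.signal?.addEventListener("abort", () =>
      emit("signal_abort", callId, startedAt),
    );
    const res = await baseFetch(input, init);
    emit("headers", callId, startedAt, { status: res.status });
    return res;
  };
}
```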

Rebased onto canary to pick up the new withIdleTimeout work; my retry-log block now also fires on the new apex.retry.stream_idle_resume path. Bundles the unmerged feat/surface-integration (#664) commits as well — the PR diff will include those until #664 lands.

Smoke-tested locally against a hung Bun server and confirmed all six lifecycle events fire, including signal_abort on the composed timeout.

Plan

  1. Bump the apex submodule in console to this branch on jraad-deploy only.
  2. Reproduce the stall (or wait for the next one).
  3. Pull [apex.instrumentation] lines from CloudWatch and confirm whether retry chains fire / whether the timeout signal aborts / whether bytes stop flowing.
  4. Use the answer to scope the real fix; revert this PR.

@jorgeraad jorgeraad changed the title feat(ai): structured logs to diagnose Bedrock stream stalls [WIP / debug-only] feat(ai): structured logs to diagnose Bedrock stalls May 1, 2026
Test and others added 18 commits May 1, 2026 17:47
Adds the npm dep that subsequent integration tasks will import from.
Workspace-hoisted at the console worktree root; apex's own bun.lock
is unchanged because the parent console workspace owns the lockfile.
…classifier

Defines ConsolidatedEndpoint (one record per (file, path) with method[]) and
classifyEndpoint() implementing the page-vs-API rules from the design's
section 1.4. Next.js page/route convention drives PAGE classification; v1
deliberately leaves Rails/Django/FastAPI/Spring view-rendering routes as
their HTTP method (per design — fallback path covers them).

7/7 unit tests cover App Router pages, Pages Router, route handlers, Server
Actions, WebSocket, Express, and multi-method consolidation.
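The consolidation record described above can be sketched as below. The field names follow the commit message (one record per `(file, path)` with a `method[]` array); the classification regex is a simplified stand-in for the design's section 1.4 rules, not the real implementation:

```typescript
// Illustrative shape of the per-(file, path) consolidation record.
type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "PAGE";

interface ConsolidatedEndpoint {
  file: string;
  path: string;
  method: HttpMethod[]; // methods merged for the same (file, path)
}

// Next.js page/route conventions drive PAGE classification; anything
// else keeps its HTTP methods (the design's fallback path).
function classifyEndpoint(e: ConsolidatedEndpoint): HttpMethod[] {
  const isNextPage = /(^|\/)(app\/.*page|pages\/[^_].*)\.tsx?$/.test(e.file);
  return isNextPage ? ["PAGE"] : e.method;
}
```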
Single workflow entry point per app. Calls surface.map() with
includeInternal:false, applies fallback gate (no frameworks OR zero
endpoints), consolidates per-(file,path) and runs the page-vs-API
classifier. Returns a discriminated union: { mode: 'fallback', reason }
or { mode: 'surface', endpoints, frameworks }.

10 unit tests cover consolidation (multi-method same path, distinct files
same path), the two fallback conditions, and end-to-end classification on
a synthetic MapResult.
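The discriminated-union return and the fallback gate described above might look like this. The union variants are quoted from the commit message; the gate logic and the `decide` helper name are illustrative:

```typescript
// Hypothetical sketch of the fallback gate: no frameworks OR zero endpoints.
interface Endpoint { file: string; path: string; method: string[] }

type MapDecision =
  | { mode: "fallback"; reason: string }
  | { mode: "surface"; endpoints: Endpoint[]; frameworks: string[] };

function decide(frameworks: string[], endpoints: Endpoint[]): MapDecision {
  if (frameworks.length === 0) return { mode: "fallback", reason: "no frameworks detected" };
  if (endpoints.length === 0) return { mode: "fallback", reason: "zero endpoints" };
  // reason is only typed on the fallback branch of the union.
  return { mode: "surface", endpoints, frameworks };
}
```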
… path

Replaces per-app pages+apiEndpoints CodeAgent pair on the surface-driven
path with a single per-app enrichment CodeAgent that receives the full
deterministic endpoint list and emits one document_asset per endpoint.

Per design Phase 1.2: agent told NOT to grep for new routes (surface
already enumerated them); pre-fills authRequired from auth signals;
preserves existing description/pentestObjectives on unchanged endpoints
when the session has prior state.

Adds export keywords to WHITEBOX_CODE_AGENT_SYSTEM_PROMPT, AppInfoSchema,
and DiscoverySummarySchema in whiteboxAttackSurface.ts so the new module
can reuse them — no logic changes to the workflow itself.

11 unit tests cover the objective-builder prompt content (numbered
endpoint list, 'Do NOT re-discover routes' invariant, document_asset
asks, page vs api method rendering).
Replaces the per-app pages+apiEndpoints CodeAgent pair with a per-app
dispatch: surface-driven via mapAppWithSurface + runEnrichmentAgent
when surface supports the framework, falling back to the existing
two-agent flow when the fallback gate fires.

Cloud-resource apps still route to the existing cloudResourceEndpoints
agent — surface is HTTP-route-focused (per design Non-Goals).

Phases 1 (apps discovery), 1.5 (app.json), 3 (assets read), 4 (risk
scoring), 5 (assembly) byte-for-byte unchanged.
runIncrementalWhiteboxAttackSurfaceWorkflow untouched (Phase 3 of the
design is a follow-up).

Subagent events preserved: enrich-${app.name} for the surface path,
pages-${app.name} / apiEndpoints-${app.name} for fallback,
cloudResourceEndpoints-${app.name} for cloud apps.

Per-app log line indicates which path was taken.
- Update bun.lock to include @pensar/surface@0.1.1 (workspace install in
  the development context hoisted to the parent lockfile, leaving apex's
  own lockfile out of sync — fix the standalone install).
- Prettier --write across the seven new/modified files. No logic changes.
Without this, Phase 1's appsAgent has document_endpoint in its default
tool registry, and the shared system prompt heavily instructs every
agent to call document_endpoint per route. Phase 1 over-reaches: after
documenting apps via document_app, it keeps going and tries to call
document_endpoint for every route — long before Phase 2 (where surface
runs) gets a chance.

Adds excludeTools: ['document_endpoint'] symmetric to Phase 2's existing
excludeTools: ['document_app'] in spawnDiscoveryAgent + the enrichment
agent. Each phase now has the right tool surface for its job: apps-only
for Phase 1, endpoints-only for Phase 2 (surface or fallback).
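The symmetric per-phase exclusions can be sketched as a simple filter over the default registry. The tool names come from this PR; the registry contents and the `toolsForPhase` helper are illustrative:

```typescript
// Hypothetical default registry; only the two document_* names are from the PR.
const DEFAULT_TOOLS = ["document_app", "document_endpoint", "read_file", "grep"];

function toolsForPhase(excludeTools: string[]): string[] {
  return DEFAULT_TOOLS.filter((t) => !excludeTools.includes(t));
}

// Phase 1 (apps discovery) loses document_endpoint;
// Phase 2 (endpoint enumeration) loses document_app.
const phase1Tools = toolsForPhase(["document_endpoint"]);
const phase2Tools = toolsForPhase(["document_app"]);
```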
…e 1 objective

The previous fix removed document_endpoint from Phase 1's tool registry,
but the agent worked around it by calling document_app on individual
endpoints (e.g. `document_app GET /api/products`) since the shared system
prompt's 'document every route' directive is so strong.

Add an explicit exclusion bullet to the IMPORTANT block ('Individual API
routes, web pages, or HTTP endpoints — endpoint enumeration is handled
by a separate phase') and reinforce in the closing line. The agent now
has matching tool-level + objective-level constraints; no improvising.
The previous fix removed document_endpoint from Phase 1's tool registry
and added an objective-level exclusion, but the agent kept improvising
(calling document_app on individual routes) because the shared
WHITEBOX_CODE_AGENT_SYSTEM_PROMPT still contains a heavy ## document_endpoint
section + Working Approach references. Even with the tool unavailable,
the prompt was strong enough that the agent treated routes as documentable
and reached for the closest-shaped tool.

Add WHITEBOX_APPS_DISCOVERY_SYSTEM_PROMPT — a stripped variant of the
shared prompt with all document_endpoint mentions removed. The prompt
now tells Phase 1 explicitly that endpoint enumeration is a separate
phase's job. Phase 2 agents (pages/apiEndpoints/cloudResource/enrichment)
keep using the original WHITEBOX_CODE_AGENT_SYSTEM_PROMPT — they
legitimately need that guidance.

After this: Phase 1 has matching tool-level + objective-level + system-
prompt-level constraints, all consistent. No improvising surface left.
…_endpoint shape

Two bugs the user surfaced from a real Coffee Shop run:

1. The enrichment agent was using WHITEBOX_CODE_AGENT_SYSTEM_PROMPT,
   which heavily instructs every agent to 'orient first, list files,
   grep, search-then-read, be thorough — discover N routes.' The
   enrichment agent was reading that as discovery framing and ignored
   its 'list above is complete' objective, exploring the repo from
   scratch and re-finding routes surface had already enumerated.

2. The objective told the agent to call `document_asset` with a
   nested `details` block. That tool name was renamed to
   `document_endpoint` (with a flat schema) in canary, and the
   agent was hunting for a tool that doesn't exist — falling back
   to default discovery behavior in confusion.

Add WHITEBOX_ENRICHMENT_SYSTEM_PROMPT — purpose-built for enrichment.
It tells the agent: 'you have a deterministic list, read just the
handler at file:line, document_endpoint with these flat fields, do not
list_files or grep for new routes.' Drops the discovery Working Approach
in favor of enrichment-only guidance.

Update buildEnrichmentObjective to:
- Use the actual document_endpoint tool name + flat schema
  (routePath/method/file/line/handler/authRequired/endpointType/riskLevel),
  matching the schema in offSecAgent/tools/documentEndpoint.ts.
- Pre-derive endpointType per entry (web-endpoint for PAGE, otherwise
  api-endpoint).
- Drop the pentestObjectives ask — document_endpoint generates them
  automatically via threatModelGenerator on the tool side.
- End with the explicit count: 'must equal the number above: ${N}'.

Phase 2 fallback agents (pages/apiEndpoints/cloudResource) keep using
WHITEBOX_CODE_AGENT_SYSTEM_PROMPT — they are doing discovery and need
that framing.
Refactors the surface-driven path so each endpoint gets its own
enrichment subagent, surfacing in the subagent view as a distinct row
(e.g. "Coffee Shop: POST /api/admin/diagnostics") rather than a
single "enrich-Coffee Shop" agent making N document_endpoint calls.

Why per-endpoint:
- Matches the original issue #662 design verbatim.
- Each agent has tiny scope: read one handler, document one endpoint,
  return. Zero cross-endpoint coupling.
- The cross-endpoint reasoning advantage of per-app enrichment is moot
  now that pentestObjectives are auto-generated by document_endpoint
  (via generateThreatModelForEndpoint) — the agent only writes
  description + riskLevel + auth refinement, none of which need
  app-wide context.
- Token caps can't truncate: each agent's output is bounded by one
  endpoint.
- Subagent view becomes self-documenting (one row per endpoint with
  method + path in the display name).

Changes:
- enrichmentAgent.ts:
  - runEnrichmentAgent now takes EndpointEnrichmentInput (single endpoint)
    and produces subagentId 'enrich-<app-slug>-<route-slug>' + display
    name '<app.name>: <method> <path>'.
  - New runAppEnrichment wrapper fans out N agents via
    runWithBoundedConcurrency at ENRICHMENT_CONCURRENCY=5.
  - Hard-excludes list_files/grep/document_app on the per-endpoint agent
    so the model can't fall back to discovery behavior.
  - buildEnrichmentObjective rewritten for single-endpoint scope with
    explicit Workflow section ('1. read_file the handler  2. document_endpoint
    once  3. response').

- whiteboxAttackSurface.ts: Phase 2 surface-driven branch swaps
  runEnrichmentAgent for runAppEnrichment. No other workflow changes.

- enrichmentAgent.test.ts: rewrites the prompt-builder test for
  per-endpoint shape + adds cases for PAGE single-method serialization,
  multi-method JSON-array serialization, and 'no auth signals' prefill.
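The fan-out described above (N per-endpoint agents, at most `ENRICHMENT_CONCURRENCY = 5` in flight) can be sketched with a worker-pool helper. `runWithBoundedConcurrency`'s real signature may differ; this is a minimal illustration of the pattern:

```typescript
const ENRICHMENT_CONCURRENCY = 5;

// Run task() over items with at most `limit` promises in flight,
// preserving result order by index.
async function runWithBoundedConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: no await between the check and the increment
      results[i] = await task(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```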
Phase 1's apps-discovery agent typically returns 'app/' (or a similar
subdirectory) for single-app repos where the routes live there.
Pointing surface at that subdir misses the parent's package.json
(or requirements.txt, go.mod, etc.) — surface's framework detection
returns 'frameworks: []', the gate triggers fallback, and the user
sees the legacy two-agent discovery path even though surface would
have worked from the repo root.

Reproducible against pensarai/coffee-shop:
  map('/coffee-shop')        → frameworks=['nextjs'], endpoints=8
  map('/coffee-shop/app')    → frameworks=[],         endpoints=0  ← bug

Add findDependencyRoot(appPath, repoRoot): walks up from appPath
toward repoRoot looking for the nearest directory containing a
recognized dep manifest. Used by mapAppWithSurface to pick the
right scan root before invoking surface.

For single-app repos (Coffee Shop with location='app/'), the walk
finds the repo's package.json and scans from there. For monorepos
where each package has its own package.json, the walk stops at the
package directory immediately — no over-broadening. Bounded by
repoRoot so we never escape the project.

Workflow now passes codebasePath as the second arg to bound the walk.

7 new unit tests cover: walk-up to parent, deep-nested walk-up,
monorepo package boundary, root with own dep file, no-walk-past-root,
and graceful fallback when no dep file exists. End-to-end smoke against
the real coffee-shop layout: was returning fallback, now returns
surface mode with 5 consolidated endpoints from 8 raw rows.
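The walk-up described above might look like the sketch below. The function name and the repoRoot bound are from the commit message; the manifest list and the fs calls are assumptions about the implementation:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed set of recognized dependency manifests.
const MANIFESTS = ["package.json", "requirements.txt", "go.mod", "Cargo.toml"];

// Walk up from appPath toward repoRoot, returning the nearest directory
// containing a recognized manifest; fall back to appPath when none exists.
function findDependencyRoot(appPath: string, repoRoot: string): string {
  let dir = path.resolve(appPath);
  const root = path.resolve(repoRoot);
  while (true) {
    if (MANIFESTS.some((m) => fs.existsSync(path.join(dir, m)))) return dir;
    if (dir === root) return appPath; // graceful fallback: no manifest found
    dir = path.dirname(dir);
    if (!dir.startsWith(root)) return appPath; // never escape the project
  }
}
```

For a monorepo, the walk stops immediately at the package's own manifest, so scans never over-broaden.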
…apex classifier

Bumps @pensar/surface to ~0.2.1, which now emits the route's categorical
role (api / page / action / websocket) on EndpointInfo directly. Apex's
hand-written file-pattern classifier becomes redundant — the kind field
is the source of truth.

Changes:
- Bump @pensar/surface to ~0.2.1 (skips deprecated 0.2.0).
- Delete src/core/integrations/surface/classifier.{ts,test.ts}.
- Add 'kind: EndpointKind' to ConsolidatedEndpoint; re-export EndpointKind.
- Replace post-consolidation classification with a one-line ternary in
  the integration helper: kind==='page' ? method=['PAGE'] : pass-through.
- Simplify EnrichmentEndpoint to a pass-through alias for ConsolidatedEndpoint
  (drops the classifiedMethod + isPage workaround fields).
- Drop the bridging map in workflow Phase 2 — surfaceResult.endpoints flow
  straight to runAppEnrichment.
- Update tests: fixtures use kind directly; assert kind==='page' produces
  method=['PAGE'].

Coverage gain: Next.js page routes (app/page.tsx, pages/*.tsx) now flow
end-to-end as kind=page → endpointType=web-endpoint. Smoke against
coffee-shop returns 10 endpoints (8 api + 2 page) where the previous
classifier shim returned only 8.
…hten dedup

- Drop pass-through `EnrichmentEndpoint` alias; use `ConsolidatedEndpoint` directly
- Remove unused `HttpMethod`/`EndpointKind` re-exports from surface/types
- Replace the linear-scan handler-string dedup in consolidateBySameRoute with a Set
- Strip narrating WHAT comments, version tags, and design-doc section refs
…lication

- Move AppInfoSchema, AppsDiscoveryResultSchema, DiscoverySummarySchema and
  their inferred types into agents/specialized/whiteboxAttackSurface/types.ts.
  enrichmentAgent.ts no longer reaches up into workflows/ for them.
- Move the three workflow-phase system prompts (apps-discovery, enrichment,
  general discovery) into agents/specialized/whiteboxAttackSurface/prompts.ts
  and compose them from shared snippets so a tweak in tool guidance hits
  every prompt.
- Rename WHITEBOX_CODE_AGENT_SYSTEM_PROMPT -> WHITEBOX_DISCOVERY_SYSTEM_PROMPT
  (it's the fallback + incremental prompt now, not "the" code-agent prompt).
- Drop the AppMetadata interface in favor of Pick<AppInfo, ...> + a
  toAppMetadata helper, eliminating three near-duplicate inline metadata
  shapes.
- mapAppWithSurface is no longer async (surface.map is sync, the speculative
  future-proofing comment is gone) and FallbackDecision is now a discriminated
  union so reason is typed only on the fallback branch.
- Drop the method=["PAGE"] substitution from the integration layer. The
  surface integration is a pure pass-through; kind stays authoritative for
  page/api classification. Apex's method="PAGE" storage convention is
  applied once at the document_endpoint write boundary in
  buildEnrichmentObjective.
- Trim field-level prose in the enrichment objective; the tool schema
  already documents each field.
… cross-app leakage

mapAppWithSurface used to return every endpoint surface found anywhere
under the climbed-up scan root, with no filter restricting them to
app.location. When phase 1 set `location` to a path without its own
dep manifest (e.g. an SST IaC file under `infra/`), the climb walked
to the repo root, surface scanned the whole monorepo, and every such
app received the same union of routes — observed on a Console recon
where four SST-defined apps each got 89-122 endpoints sourced from
`console/`, `packages/`, AND `infra/`.

New decision sequence in mapAppWithSurface:

  1. Force fallback when app.location is the repo root — phase 1 didn't
     disambiguate; a no-op scope filter would re-attribute everything
     to one app.
  2. Try a narrow `map(appPath)` first. When appPath has its own
     manifest this is the correct, fastest path AND avoids surface's
     global (method, path) dedup eating routes across siblings.
  3. Otherwise climb to a parent manifest, scan there, then filter
     endpoints whose `file` resolves under appPath's subtree. The
     file-equality branch handles `app.location` being a file (e.g.
     `infra/api.ts`); the prefix-startsWith branch handles directories.

`scopeEndpointsToApp` is exported for direct testing.

Adds a regression test and four sibling cases covering: shared root
manifest, file-path location, own-manifest narrow scan, repo-root
fallback, and frameworks-detected-but-empty-scoped fallback.
…alls

Adds an `instrumentation.ts` module that emits structured stdout JSON for:

- Bedrock fetch lifecycle: start, headers, first_byte, done, body_error,
  body_cancel, signal_abort. Each event carries a per-fetch `callId` and an
  `ageMs` since fetch start, so stalls can be located precisely (e.g. "got
  headers + first byte then went silent for 14 minutes before the timeout
  signal fired").

- Apex retry decisions: rate-limit retry (`apex.retry.rate_limit`),
  context-length compaction / summarization, streamText `onError` handler,
  and unrecoverable stream errors. Each carries the `logicalCallId` for the
  enclosing streamResponse so multi-retry chains are joinable in CloudWatch
  Insights.

These logs bypass the `silent: true` flag (which OffensiveSecurityAgent
hardcodes for CLI/TUI quietness) so server contexts always have visibility
into retry storms and hung Bedrock streams. The user-facing console.warn
calls remain gated by `silent` for backward-compatible CLI behavior.

The `wrapFetchWithBedrockLogs` wrapper now also owns the `composeSignal`
step, so its `signal_abort` log fires for the composed timeout (the caller
never sees it directly otherwise).

Format: `[apex.instrumentation] {json}` — greppable; CloudWatch Insights can
parse with `parse @message "[apex.instrumentation] *" as raw_json`.

Why now: production recon stalls have been silent for 3+ hours with the
process alive, no errors, no logs. We don't know whether the 15-min
streaming timeout fires, whether retries cycle, or where bytes stop
flowing. This module surfaces all three.
@jorgeraad jorgeraad force-pushed the feat/instrumentation-tier1 branch from b48835a to 5a29ef8 on May 1, 2026 21:48
@jorgeraad jorgeraad changed the base branch from feat/surface-integration to canary May 1, 2026 21:48