[WIP / debug-only] feat(ai): structured logs to diagnose Bedrock stalls #699
Draft
Adds the npm dep that subsequent integration tasks will import from. Workspace-hoisted at the console worktree root; apex's own bun.lock is unchanged because the parent console workspace owns the lockfile.
…classifier
Defines ConsolidatedEndpoint (one record per (file, path) with method[]) and classifyEndpoint() implementing the page-vs-API rules from the design's section 1.4. Next.js page/route convention drives PAGE classification; v1 deliberately leaves Rails/Django/FastAPI/Spring view-rendering routes as their HTTP method (per design; the fallback path covers them).
7/7 unit tests cover App Router pages, Pages Router, route handlers, Server Actions, WebSocket, Express, and multi-method consolidation.
Single workflow entry point per app. Calls surface.map() with
includeInternal:false, applies fallback gate (no frameworks OR zero
endpoints), consolidates per-(file,path) and runs the page-vs-API
classifier. Returns a discriminated union: { mode: 'fallback', reason }
or { mode: 'surface', endpoints, frameworks }.
10 unit tests cover consolidation (multi-method same path, distinct files
same path), the two fallback conditions, and end-to-end classification on
a synthetic MapResult.
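
The consolidation and fallback gate described above can be sketched roughly as follows. This is an illustration only: the `RawEndpoint` shape, the `decide` function name, and the reason strings are assumptions, not the actual `@pensar/surface` types or the apex implementation.

```typescript
// Hypothetical shapes; the real surface MapResult and apex types may differ.
type RawEndpoint = { file: string; path: string; method: string };
type ConsolidatedEndpoint = { file: string; path: string; method: string[] };

// Discriminated union: `reason` exists only on the fallback branch.
type MapOutcome =
  | { mode: "fallback"; reason: string }
  | { mode: "surface"; endpoints: ConsolidatedEndpoint[]; frameworks: string[] };

// Merge rows sharing the same (file, path) into one record with method[].
function consolidate(rows: RawEndpoint[]): ConsolidatedEndpoint[] {
  const byKey = new Map<string, ConsolidatedEndpoint>();
  for (const row of rows) {
    const key = `${row.file}\u0000${row.path}`;
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, { file: row.file, path: row.path, method: [row.method] });
    } else if (!existing.method.includes(row.method)) {
      existing.method.push(row.method);
    }
  }
  return Array.from(byKey.values());
}

// Fallback gate: no frameworks OR zero endpoints means the legacy path runs.
function decide(frameworks: string[], rows: RawEndpoint[]): MapOutcome {
  if (frameworks.length === 0) {
    return { mode: "fallback", reason: "no frameworks detected" };
  }
  if (rows.length === 0) {
    return { mode: "fallback", reason: "zero endpoints" };
  }
  return { mode: "surface", endpoints: consolidate(rows), frameworks };
}
```

Callers would switch on `mode` before touching `endpoints`, which lets the compiler enforce that fallback results never carry an endpoint list.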
… path
Replaces per-app pages+apiEndpoints CodeAgent pair on the surface-driven path with a single per-app enrichment CodeAgent that receives the full deterministic endpoint list and emits one document_asset per endpoint.
Per design Phase 1.2: agent told NOT to grep for new routes (surface already enumerated them); pre-fills authRequired from auth signals; preserves existing description/pentestObjectives on unchanged endpoints when the session has prior state.
Adds export keywords to WHITEBOX_CODE_AGENT_SYSTEM_PROMPT, AppInfoSchema, and DiscoverySummarySchema in whiteboxAttackSurface.ts so the new module can reuse them; no logic changes to the workflow itself.
11 unit tests cover the objective-builder prompt content (numbered endpoint list, 'Do NOT re-discover routes' invariant, document_asset asks, page vs api method rendering).
Replaces the per-app pages+apiEndpoints CodeAgent pair with a per-app
dispatch: surface-driven via mapAppWithSurface + runEnrichmentAgent
when surface supports the framework, falling back to the existing
two-agent flow when the fallback gate fires.
Cloud-resource apps still route to the existing cloudResourceEndpoints
agent — surface is HTTP-route-focused (per design Non-Goals).
Phases 1 (apps discovery), 1.5 (app.json), 3 (assets read), 4 (risk
scoring), 5 (assembly) byte-for-byte unchanged.
runIncrementalWhiteboxAttackSurfaceWorkflow untouched (Phase 3 of the
design is a follow-up).
Subagent events preserved: enrich-${app.name} for the surface path,
pages-${app.name} / apiEndpoints-${app.name} for fallback,
cloudResourceEndpoints-${app.name} for cloud apps.
Per-app log line indicates which path was taken.
- Update bun.lock to include @pensar/surface@0.1.1 (workspace install in the development context hoisted to the parent lockfile, leaving apex's own lockfile out of sync; fix the standalone install).
- Prettier --write across the seven new/modified files. No logic changes.
Without this, Phase 1's appsAgent has document_endpoint in its default tool registry, and the shared system prompt heavily instructs every agent to call document_endpoint per route. Phase 1 over-reaches: after documenting apps via document_app, it keeps going and tries to call document_endpoint for every route — long before Phase 2 (where surface runs) gets a chance. Adds excludeTools: ['document_endpoint'] symmetric to Phase 2's existing excludeTools: ['document_app'] in spawnDiscoveryAgent + the enrichment agent. Each phase now has the right tool surface for its job: apps-only for Phase 1, endpoints-only for Phase 2 (surface or fallback).
…e 1 objective
The previous fix removed document_endpoint from Phase 1's tool registry,
but the agent worked around it by calling document_app on individual
endpoints (e.g. `document_app GET /api/products`) since the shared system
prompt's 'document every route' directive is so strong.
Add an explicit exclusion bullet to the IMPORTANT block ('Individual API
routes, web pages, or HTTP endpoints — endpoint enumeration is handled
by a separate phase') and reinforce in the closing line. The agent now
has matching tool-level + objective-level constraints; no improvising.
The previous fix removed document_endpoint from Phase 1's tool registry and added an objective-level exclusion, but the agent kept improvising (calling document_app on individual routes) because the shared WHITEBOX_CODE_AGENT_SYSTEM_PROMPT still contains a heavy ## document_endpoint section + Working Approach references. Even with the tool unavailable, the prompt was strong enough that the agent treated routes as documentable and reached for the closest-shaped tool.
Add WHITEBOX_APPS_DISCOVERY_SYSTEM_PROMPT, a stripped variant of the shared prompt with all document_endpoint mentions removed. The prompt now tells Phase 1 explicitly that endpoint enumeration is a separate phase's job. Phase 2 agents (pages/apiEndpoints/cloudResource/enrichment) keep using the original WHITEBOX_CODE_AGENT_SYSTEM_PROMPT because they legitimately need that guidance.
After this: Phase 1 has matching tool-level + objective-level + system-prompt-level constraints, all consistent. No improvising surface left.
…_endpoint shape
Two bugs the user surfaced from a real Coffee Shop run:
1. The enrichment agent was using WHITEBOX_CODE_AGENT_SYSTEM_PROMPT,
which heavily instructs every agent to 'orient first, list files,
grep, search-then-read, be thorough — discover N routes.' The
enrichment agent was reading that as discovery framing and ignored
its 'list above is complete' objective, exploring the repo from
scratch and re-finding routes surface had already enumerated.
2. The objective told the agent to call `document_asset` with a
nested `details` block. That tool name was renamed to
`document_endpoint` (with a flat schema) in canary, and the
agent was hunting for a tool that doesn't exist — falling back
to default discovery behavior in confusion.
Add WHITEBOX_ENRICHMENT_SYSTEM_PROMPT — purpose-built for enrichment.
It tells the agent: 'you have a deterministic list, read just the
handler at file:line, document_endpoint with these flat fields, do not
list_files or grep for new routes.' Drops the discovery Working Approach
in favor of enrichment-only guidance.
Update buildEnrichmentObjective to:
- Use the actual document_endpoint tool name + flat schema
(routePath/method/file/line/handler/authRequired/endpointType/riskLevel),
matching the schema in offSecAgent/tools/documentEndpoint.ts.
- Pre-derive endpointType per entry (web-endpoint for PAGE, otherwise
api-endpoint).
- Drop the pentestObjectives ask — document_endpoint generates them
automatically via threatModelGenerator on the tool side.
- End with the explicit count: 'must equal the number above: ${N}'.
Phase 2 fallback agents (pages/apiEndpoints/cloudResource) keep using
WHITEBOX_CODE_AGENT_SYSTEM_PROMPT — they are doing discovery and need
that framing.
Refactors the surface-driven path so each endpoint gets its own enrichment subagent, surfacing in the subagent view as a distinct row (e.g. "Coffee Shop: POST /api/admin/diagnostics") rather than a single "enrich-Coffee Shop" agent making N document_endpoint calls.
Why per-endpoint:
- Matches the original issue #662 design verbatim.
- Each agent has tiny scope: read one handler, document one endpoint, return. Zero cross-endpoint coupling.
- The cross-endpoint reasoning advantage of per-app enrichment is moot now that pentestObjectives are auto-generated by document_endpoint (via generateThreatModelForEndpoint). The agent only writes description + riskLevel + auth refinement, none of which need app-wide context.
- Token caps can't truncate: each agent's output is bounded by one endpoint.
- Subagent view becomes self-documenting (one row per endpoint with method + path in the display name).
Changes:
- enrichmentAgent.ts:
  - runEnrichmentAgent now takes EndpointEnrichmentInput (single endpoint) and produces subagentId 'enrich-<app-slug>-<route-slug>' + display name '<app.name>: <method> <path>'.
  - New runAppEnrichment wrapper fans out N agents via runWithBoundedConcurrency at ENRICHMENT_CONCURRENCY=5.
  - Hard-excludes list_files/grep/document_app on the per-endpoint agent so the model can't fall back to discovery behavior.
  - buildEnrichmentObjective rewritten for single-endpoint scope with explicit Workflow section ('1. read_file the handler 2. document_endpoint once 3. response').
- whiteboxAttackSurface.ts: Phase 2 surface-driven branch swaps runEnrichmentAgent for runAppEnrichment. No other workflow changes.
- enrichmentAgent.test.ts: rewrites the prompt-builder test for per-endpoint shape + adds cases for PAGE single-method serialization, multi-method JSON-array serialization, and 'no auth signals' prefill.
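
The fan-out at a fixed concurrency could look something like the sketch below. `runBounded` is a hypothetical stand-in written for illustration; the real `runWithBoundedConcurrency` helper's signature and behavior are not shown in this PR and may differ.

```typescript
// Hypothetical stand-in for a bounded-concurrency helper: runs `worker`
// over `items` with at most `limit` calls in flight, preserving input order.
async function runBounded<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a limit of 5, an app with 20 endpoints would have at most five enrichment agents running at once, and a slow handler only holds up its own lane.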
Phase 1's apps-discovery agent typically returns 'app/' (or similar
sub-directory) for single-app repos where the routes live there.
Pointing surface at that subdir misses the parent's package.json
(or requirements.txt, go.mod, etc.) — surface's framework detection
returns 'frameworks: []', the gate triggers fallback, and the user
sees the legacy two-agent discovery path even though surface would
have worked from the repo root.
Reproducible against pensarai/coffee-shop:
map('/coffee-shop') → frameworks=['nextjs'], endpoints=8
map('/coffee-shop/app') → frameworks=[], endpoints=0 ← bug
Add findDependencyRoot(appPath, repoRoot): walks up from appPath
toward repoRoot looking for the nearest directory containing a
recognized dep manifest. Used by mapAppWithSurface to pick the
right scan root before invoking surface.
For single-app repos (Coffee Shop with location='app/'), the walk
finds the repo's package.json and scans from there. For monorepos
where each package has its own package.json, the walk stops at the
package directory immediately — no over-broadening. Bounded by
repoRoot so we never escape the project.
Workflow now passes codebasePath as the second arg to bound the walk.
7 new unit tests cover: walk-up to parent, deep-nested walk-up,
monorepo package boundary, root with own dep file, no-walk-past-root,
and graceful fallback when no dep file exists. End-to-end smoke against
the real coffee-shop layout: was returning fallback, now returns
surface mode with 5 consolidated endpoints from 8 raw rows.
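
A minimal sketch of the walk-up, under stated assumptions: the manifest list is illustrative (the commit names package.json, requirements.txt, and go.mod, followed by "etc."), the injectable `exists` parameter is added here for testability, and returning `appPath` when nothing is found is a guess at what "graceful fallback" means.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Illustrative list only; the full set of recognized manifests is not stated.
const MANIFESTS = ["package.json", "requirements.txt", "go.mod"];

function findDependencyRoot(
  appPath: string,
  repoRoot: string,
  exists: (p: string) => boolean = fs.existsSync, // hypothetical test seam
): string {
  const root = path.resolve(repoRoot);
  const start = path.resolve(appPath);

  // Never walk if appPath is not inside repoRoot.
  const rel = path.relative(root, start);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    return start;
  }

  let dir = start;
  while (true) {
    // Nearest directory with a manifest wins, so monorepo packages stop early.
    if (MANIFESTS.some((m) => exists(path.join(dir, m)))) return dir;
    if (dir === root) break; // bounded by repoRoot
    dir = path.dirname(dir);
  }
  return start; // assumed fallback: no manifest found anywhere on the way up
}
```

For the coffee-shop layout this would resolve `app/` to the repo root, while a monorepo package with its own package.json resolves to itself.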
…apex classifier
Bumps @pensar/surface to ~0.2.1, which now emits the route's categorical
role (api / page / action / websocket) on EndpointInfo directly. Apex's
hand-written file-pattern classifier becomes redundant — the kind field
is the source of truth.
Changes:
- Bump @pensar/surface to ~0.2.1 (skips deprecated 0.2.0).
- Delete src/core/integrations/surface/classifier.{ts,test.ts}.
- Add 'kind: EndpointKind' to ConsolidatedEndpoint; re-export EndpointKind.
- Replace post-consolidation classification with a one-line ternary in
the integration helper: kind==='page' ? method=['PAGE'] : pass-through.
- Simplify EnrichmentEndpoint to a pass-through alias for ConsolidatedEndpoint
(drops the classifiedMethod + isPage workaround fields).
- Drop the bridging map in workflow Phase 2 — surfaceResult.endpoints flow
straight to runAppEnrichment.
- Update tests: fixtures use kind directly; assert kind==='page' produces
method=['PAGE'].
Coverage gain: Next.js page routes (app/page.tsx, pages/*.tsx) now flow
end-to-end as kind=page → endpointType=web-endpoint. Smoke against
coffee-shop returns 10 endpoints (8 api + 2 page) where the previous
classifier shim returned only 8.
…hten dedup
- Drop pass-through `EnrichmentEndpoint` alias; use `ConsolidatedEndpoint` directly
- Remove unused `HttpMethod`/`EndpointKind` re-exports from surface/types
- Replace O(n) handler-string dedup in consolidateBySameRoute with a Set
- Strip narrating WHAT comments, version tags, and design-doc section refs
…lication
- Move AppInfoSchema, AppsDiscoveryResultSchema, DiscoverySummarySchema and their inferred types into agents/specialized/whiteboxAttackSurface/types.ts. enrichmentAgent.ts no longer reaches up into workflows/ for them.
- Move the three workflow-phase system prompts (apps-discovery, enrichment, general discovery) into agents/specialized/whiteboxAttackSurface/prompts.ts and compose them from shared snippets so a tweak in tool guidance hits every prompt.
- Rename WHITEBOX_CODE_AGENT_SYSTEM_PROMPT -> WHITEBOX_DISCOVERY_SYSTEM_PROMPT (it's the fallback + incremental prompt now, not "the" code-agent prompt).
- Drop the AppMetadata interface in favor of Pick<AppInfo, ...> + a toAppMetadata helper, eliminating three near-duplicate inline metadata shapes.
- mapAppWithSurface is no longer async (surface.map is sync, the speculative future-proofing comment is gone) and FallbackDecision is now a discriminated union so reason is typed only on the fallback branch.
- Drop the method=["PAGE"] substitution from the integration layer. The surface integration is a pure pass-through; kind stays authoritative for page/api classification. Apex's method="PAGE" storage convention is applied once at the document_endpoint write boundary in buildEnrichmentObjective.
- Trim field-level prose in the enrichment objective; the tool schema already documents each field.
… cross-app leakage
mapAppWithSurface used to return every endpoint surface found anywhere
under the climbed-up scan root, with no filter restricting them to
app.location. When phase 1 set `location` to a path without its own
dep manifest (e.g. an SST IaC file under `infra/`), the climb walked
to the repo root, surface scanned the whole monorepo, and every such
app received the same union of routes — observed on a Console recon
where four SST-defined apps each got 89-122 endpoints sourced from
`console/`, `packages/`, AND `infra/`.
New decision sequence in mapAppWithSurface:
1. Force fallback when app.location is the repo root — phase 1 didn't
disambiguate; a no-op scope filter would re-attribute everything
to one app.
2. Try a narrow `map(appPath)` first. When appPath has its own
manifest this is the correct, fastest path AND avoids surface's
global (method, path) dedup eating routes across siblings.
3. Otherwise climb to a parent manifest, scan there, then filter
endpoints whose `file` resolves under appPath's subtree. The
file-equality branch handles `app.location` being a file (e.g.
`infra/api.ts`); the prefix-startsWith branch handles directories.
`scopeEndpointsToApp` is exported for direct testing.
Adds a regression test and four sibling cases covering: shared root
manifest, file-path location, own-manifest narrow scan, repo-root
fallback, and frameworks-detected-but-empty-scoped fallback.
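
The scope filter in step 3 might be sketched as below. The parameter list and the `Endpoint` shape are assumptions made for illustration; only the two branches (file equality and directory-prefix match) come from the description above.

```typescript
import * as path from "node:path";

// Hypothetical minimal endpoint shape; `file` is relative to the scan root.
type Endpoint = { file: string; method: string; path: string };

function scopeEndpointsToApp(
  endpoints: Endpoint[],
  scanRoot: string,
  appPath: string,
): Endpoint[] {
  const target = path.resolve(appPath);
  // Appending the separator stops `infra` from matching `infra-extra`.
  const prefix = target + path.sep;
  return endpoints.filter((ep) => {
    const abs = path.resolve(scanRoot, ep.file);
    // File-equality branch: app.location is itself a file (e.g. infra/api.ts).
    // Prefix branch: app.location is a directory containing the handler.
    return abs === target || abs.startsWith(prefix);
  });
}
```

If the filtered list comes back empty even though frameworks were detected, the workflow falls back, which matches the last test case listed above.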
…alls
Adds an `instrumentation.ts` module that emits structured stdout JSON for:
- Bedrock fetch lifecycle: start, headers, first_byte, done, body_error,
body_cancel, signal_abort. Each event carries a per-fetch `callId` and an
`ageMs` since fetch start, so stalls can be located precisely (e.g. "got
headers + first byte then went silent for 14 minutes before the timeout
signal fired").
- Apex retry decisions: rate-limit retry (`apex.retry.rate_limit`),
context-length compaction / summarization, streamText `onError` handler,
and unrecoverable stream errors. Each carries the `logicalCallId` for the
enclosing streamResponse so multi-retry chains are joinable in CloudWatch
Insights.
These logs bypass the `silent: true` flag (which OffensiveSecurityAgent
hardcodes for CLI/TUI quietness) so server contexts always have visibility
into retry storms and hung Bedrock streams. The user-facing console.warn
calls remain gated by `silent` for backward-compatible CLI behavior.
The `wrapFetchWithBedrockLogs` wrapper now also owns the `composeSignal`
step, so its `signal_abort` log fires for the composed timeout (the caller
never sees it directly otherwise).
Format: `[apex.instrumentation] {json}` — greppable; CloudWatch Insights can
parse with `parse @message "[apex.instrumentation] *" as raw_json`.
Why now: production recon stalls have been silent for 3+ hours with the
process alive, no errors, no logs. We don't know whether the 15-min
streaming timeout fires, whether retries cycle, or where bytes stop
flowing. This module surfaces all three.
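
As a rough illustration of the log format, a per-call logger could be built like this. `createCallLogger`, the injectable clock and sink, and the exact field layout are assumptions for the sketch; only the `[apex.instrumentation] {json}` prefix and the `callId` / `ageMs` fields come from the description above.

```typescript
type Sink = (line: string) => void;

const PREFIX = "[apex.instrumentation]";

// Hypothetical helper: returns a function that emits one JSON line per event,
// stamping each with the call's id and its age since the logger was created.
function createCallLogger(
  callId: string,
  now: () => number = Date.now,
  sink: Sink = (line) => console.log(line), // stdout, independent of `silent`
) {
  const startedAt = now();
  return (event: string, extra: Record<string, unknown> = {}): void => {
    const payload = { event, callId, ageMs: now() - startedAt, ...extra };
    sink(`${PREFIX} ${JSON.stringify(payload)}`);
  };
}
```

A fetch wrapper would create one logger per request and call it at each lifecycle point (start, headers, first_byte, and so on), so a gap between consecutive `ageMs` values shows where the stream went quiet.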
Adds always-on structured logging around Bedrock streamText calls so we can finally see what's happening during the multi-hour recon stalls on jraad-deploy.
Two pieces:
- A `wrapFetchWithBedrockLogs` helper in `instrumentation.ts` that emits per-fetch lifecycle events (start, headers, first byte, done, body errors, signal abort), each tagged with a `callId` and `ageMs` since fetch start. It now also owns the streaming-fetch-timeout composition so its `signal_abort` log fires when the 15-min backstop trips (today the caller never sees that signal directly).
- Retry-decision logs in `ai.ts` that bypass the `silent: true` flag, which `OffensiveSecurityAgent` hardcodes for CLI quietness, also hiding the same warnings we need server-side. Covers rate-limit retries, the new stream-idle-resume path, context-length compaction / summarization, the streamText `onError` handler, and unrecoverable stream errors. Each carries a `logicalCallId` so a multi-retry chain joins to a single logical call in CloudWatch Insights.
CLI behavior is unchanged: the existing `console.warn` calls stay gated by `silent`. The new events go to a separate stdout channel: `[apex.instrumentation] {json}`.
Rebased onto `canary` to pick up the new `withIdleTimeout` work; my retry-log block now also fires on the new `apex.retry.stream_idle_resume` path. Bundles the unmerged `feat/surface-integration` (#664) commits as well; the PR diff will include those until #664 lands.
Smoke-tested locally against a hung Bun server and confirmed all six lifecycle events fire, including `signal_abort` on the composed timeout.
Plan
- `jraad-deploy` only.
- `[apex.instrumentation]` lines from CloudWatch and confirm whether retry chains fire / whether the timeout signal aborts / whether bytes stop flowing.