
refactor: split engine serve bootstrap from CLI #322

Closed
tonyxiao wants to merge 747 commits into main from tx/engine-bin-split

Conversation

@tonyxiao
Collaborator

Summary

  • split the engine into dedicated sync-engine and sync-engine-serve binaries with shared bootstrap and shared API server startup
  • make apps/engine/src/api/index.ts side-effect-free and route Docker/dev/test callsites to the new dist/bin/serve.js and src/bin/* entrypoints
  • keep dynamic connector discovery on the full CLI path while forcing sync-engine-serve onto bundled-only connectors

Test plan

  • pnpm build
  • pnpm lint
  • pnpm exec vitest run src/api/index.test.ts src/__tests__/bin-serve.test.ts
  • node apps/engine/dist/bin/sync-engine.js --help
  • node apps/engine/dist/bin/serve.js on ports 3000 and 4000, verified with /health
  • pnpm --filter @stripe/sync-engine test (fails in this environment because src/__tests__/sync.test.ts requires Docker and the Docker daemon is unavailable)

tonyxiao and others added 30 commits March 24, 2026 22:08
The plugin loading test installs from local tarballs, not GitHub
Packages. The global STRIPE_NPM_REGISTRY env causes .npmrc to redirect
@stripe/* to GitHub Packages, breaking resolution. Override to npmjs.org
so the committed .npmrc is effectively a no-op for this test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
…uild

- e2e-plugin-loading.sh: write blank .npmrc + unset STRIPE_NPM_REGISTRY
  in temp dir so local tarballs resolve without hitting GitHub Packages
- docker-image.yml: fall back to local Docker build when ghcr.io image
  not found (workflow_dispatch may run before build.yml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
…ilure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Replace all Commander usage with citty across packages/ts-cli,
packages/protocol, apps/sync-engine, apps/stateless, and apps/stateful.

- `createCliFromSpec` now returns `CommandDef` with `meta`/`rootArgs` options
- `createConnectorCli` returns `CommandDef` with typed subcommands
- `--no-state` / `--no-connectors-from-path` → `noState` / `noConnectorsFromPath`
  boolean args (citty doesn't support Commander-style negation syntax)
- Default serve action moved to root `run()` handler (replaces `isDefault: true`)
- All tests updated: `parseAsync` → `runCommand`, Commander introspection
  APIs replaced with citty's `subCommands`/`meta`/`args` properties

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
bin entries now point at source (./src/*.ts) for dev, with
publishConfig.bin pointing at dist for npm publish. This fixes
pnpm install warnings about missing dist/ files before build.

Affected: source-stripe, destination-postgres, destination-google-sheets,
sync-engine, stateless, stateful.

Also: E2E Docker test now fails hard if ghcr.io image is missing
instead of silently building locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
publishConfig.bin is pnpm-only — npm ignores it and drops .ts bin
entries as invalid, leaving published packages with no bin at all.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
The npm job now packs each @stripe/* package from GitHub Packages
and republishes the tarballs to npmjs.org. No checkout, no install,
no build — just artifact promotion between registries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Sparse-checkout packages/ and apps/ at the target SHA, then scan
package.json files to find publishable (non-private) @stripe/* names.
No hardcoded package list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Query /orgs/{owner}/packages?package_type=npm to find all npm packages
linked to this repo. Zero checkout, zero build — pure registry-to-registry
artifact promotion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
- Replace `pnpm -r publish || true` (silently swallowed all errors) with an
  explicit script that skips already-published versions gracefully but fails
  hard on real errors (auth, network, malformed package)
- Delete outdated scripts/release-package.sh (superseded by release.yml)
- Update scripts/README.md to reference the new script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
…engine

- Delete packages/stateful-sync (@stripe/sync-lib-stateful) and
  apps/stateful (@stripe/sync-engine-stateful) — multi-tenant machinery
  not needed for single-tenancy
- Merge packages/stateless-sync (@stripe/sync-lib-stateless) into
  apps/engine/src/lib/ (engine, pipeline, ndjson, resolver, exec helpers)
- Merge apps/stateless (@stripe/sync-engine-stateless) API + CLI into
  apps/engine/src/ (api/app.ts, cli/index.ts, version.ts)
- Move apps/sync-engine to apps/engine
- Rename packages/store-postgres to packages/state-postgres
  (@stripe/sync-state-postgres)
- Add StateStore interface + file/memory implementations
  (apps/engine/src/lib/state-store.ts)
- Update all imports across tests, supabase, docs, and scripts
- Delete stateful integration test and webhook e2e test (relied on
  deleted StatefulSync class)

Before: 13 workspace packages (protocol, stateless-sync, stateful-sync,
  store-postgres, util-postgres, ts-cli, source-stripe,
  destination-postgres, destination-google-sheets, sync-engine,
  stateless, stateful, supabase)

After: 9 workspace packages (protocol, sync-engine, state-postgres,
  util-postgres, ts-cli, source-stripe, destination-postgres,
  destination-google-sheets, supabase)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
…REGISTRY

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Moves the 35-line inline script out of release.yml into a standalone
script. The workflow now sparse-checkouts just the script and runs it.
Testable locally: GITHUB_TOKEN=... NPM_TOKEN=... bash scripts/promote-to-npmjs.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Both release promote steps now delegate to scripts. Use jq for JSON
parsing in promote-to-npmjs.sh (jq is pre-installed on GitHub runners).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
…from package.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Refactor Temporal worker from 8 fine-grained activities (healthCheck,
sourceSetup, destinationSetup, backfillPage, writeBatch, processEvent,
sourceTeardown, destinationTeardown) to 3 that match the stateless API
surface: setup, sync, teardown. The sync activity calls /run which
handles the full read→write pipeline, with NDJSON streaming and
heartbeats.

Add DEFAULT_WORKFLOW_ID env var to webhook-bridge so it can signal a
known workflow directly, bypassing account-based routing (needed for
non-Connect Stripe test keys that produce events without an account field).

Add tests/e2e-temporal-webhook.sh — a worker-agnostic shell script that
tests the full production topology: stripe listen → webhook bridge →
Temporal workflow → stateless API → Postgres. Verifies both backfill
and live webhook event processing end-to-end.

Fix e2e test paths from deleted apps/stateless to apps/engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
The engine calls it `state` everywhere (SyncParams.state, StateMessage).
The Temporal layer was using `cursors` for the same concept, creating a
gratuitous naming mismatch. Align to `state` throughout: SyncConfig.state,
SyncResult.state, WorkflowStatus.state, and all workflow/activity code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- Change source shebangs from #!/usr/bin/env tsx → #!/usr/bin/env node.
  tsc 5.9 preserves shebangs verbatim, so dist output is correct
  without post-processing.
- Delete fix-shebangs.mjs (no longer needed)
- Delete add-js-extensions.mjs (one-time codemod, already run)
- Delete link-bins.sh (obsoleted by package.json bin entries)
- Delete scripts/README.md (stale, references moved files)
- Move d2.mjs to docs/ (diagram utility, not a build script)
- Remove fix-shebangs from all package.json build scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Restores the stateful sync layer (deleted in consolidation) as a new
@stripe/sync-service package at apps/service/. Includes:

- Credential CRUD and sync config CRUD with referential integrity
- SyncService class with push_event webhook fan-out to N syncs
- Auth retry with credential refresh (up to 2 retries)
- Shared-resource teardown logic (only removes webhook if no other syncs share the credential)
- 17 OpenAPI routes via Hono (health, credentials, syncs, sync ops, webhooks)
- File-system store implementations (credentials.json, syncs.json, state.json, logs.ndjson)
- Static Zod schemas with .passthrough() instead of dynamic buildSchemas()
- Pure exports from api/ and cli/ — only bin/ scripts have side effects
- 8 passing tests covering CRUD, referential integrity, and webhook ingress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
The engine's full read→write pipeline endpoint is a sync operation,
not a generic "run". Aligns the route name with the Temporal activity
name and the domain language.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Each store now uses a directory with one JSON file per entity
(e.g., credentials/cred_abc.json) instead of a single monolithic
JSON file. Adds 16 dedicated unit tests for all four FS store
implementations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- Extract CLI definition into src/cli/command.ts (pure export, no side effects)
- Slim src/cli/index.ts down to a 4-line bin wrapper calling runMain()
- Replace './app' with './api', drop './source-test' and './destination-test'
- Update conformance test to import test connectors statically from '.'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
… e2e

- Move 6 demo/utility scripts from tests/ → demo/
  (read-from-stripe, write-to-postgres, write-to-sheets,
   stripe-to-postgres, stripe-to-google-sheets, reset-postgres)
- Delete tests/e2e-temporal-webhook.sh (Temporal not deployed)
- Delete tests/integration/ (empty — no test files, just config)
- Remove integration test step from e2e workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
Zod v4 strips unknown keys by default; .catchall(z.unknown()) is the
replacement for .passthrough() to preserve connector-specific fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
tonyxiao and others added 29 commits April 13, 2026 21:59
Made-with: Cursor
Committed-By-Agent: cursor
publish_npm → publish_github_registry (Publish to GitHub Registry)
publish_npmjs → promote_to_npm_registry (Promote to npm Registry)

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor
* feat: hard time limit cutoff + client disconnect cancellation

Two-phase time limit in takeLimits: soft deadline (1s before) for
graceful return, hard deadline (1s after) via Promise.race to force-cut
blocked operations. AbortSignal threaded from HTTP handler through
engine, source connectors, and all the way to individual fetch() calls.

- EofPayload: add 'aborted' reason, cutoff ('soft'|'hard'), elapsed_ms
- takeLimits: manual iterator + Promise.race for hard deadline + signal
- ndjsonResponse: cancel() hook for Bun.serve() disconnect detection
- HTTP handlers: wireDisconnect() for Node outgoing.close + onCancel
- Stripe client: AbortSignal.any() combines pipeline + per-request timeout
- Subprocess source: child.kill() on signal abort
- Distinct log lines: SYNC_TIME_LIMIT_SOFT/HARD, SYNC_CLIENT_DISCONNECT, SYNC_ABORTED
- Unit tests for soft/hard cutoff, abort signal, elapsed_ms
- Black-box e2e test parameterized over Node/Bun/Docker runtimes
- CI: disconnect test steps for Node + Bun
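The two-phase cutoff above can be sketched as a wrapper over an async iterator. This is a minimal illustration, not the PR's actual `takeLimits` implementation; the name `takeWithTimeLimit` and the `onCutoff` callback shape are hypothetical:

```typescript
// Soft deadline: checked between items, so a responsive source gets a
// graceful return. Hard deadline: Promise.race force-cuts a next() call
// that never resolves (e.g. a blocked upstream fetch).
async function* takeWithTimeLimit<T>(
  source: AsyncIterator<T>,
  softMs: number,
  hardMs: number,
  onCutoff: (cutoff: "soft" | "hard", elapsedMs: number) => void,
): AsyncGenerator<T> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const hardTimer = new Promise<"hard">((resolve) => {
    timer = setTimeout(() => resolve("hard"), hardMs);
  });
  try {
    while (true) {
      if (Date.now() - start >= softMs) {
        onCutoff("soft", Date.now() - start);
        await source.return?.(); // graceful teardown
        return;
      }
      const winner = await Promise.race([source.next(), hardTimer]);
      if (winner === "hard") {
        onCutoff("hard", Date.now() - start);
        source.return?.(); // best effort; the source may be stuck
        return;
      }
      if (winner.done) return;
      yield winner.value;
    }
  } finally {
    clearTimeout(timer);
  }
}
```

A responsive source hits the soft path; a source blocked in `next()` can only be cut by the hard race, which is why both phases exist.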

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor

* fix: add hono to e2e deps, exclude disconnect test from e2e_stripe job

Made-with: Cursor
Committed-By-Agent: cursor

* fix: improve disconnect test robustness and add error logging

- Use syntactically valid postgres URL in pipeline header
- Add response body logging on non-200 for CI debugging
- Add process exit detection in waitForServer
- Increase poll interval for server health checks

Made-with: Cursor
Committed-By-Agent: cursor

* fix: add required schema field to postgres destination config in test

Made-with: Cursor
Committed-By-Agent: cursor

* fix: capture stdout (not just stderr) from engine process

Pino logger writes to stdout by default, not stderr. The test was only
capturing stderr which was empty, causing all log assertions to fail.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: relax hard time limit e2e assertion for concurrent requests

The Stripe connector launches concurrent backfill requests, so the mock
request count can be high even with a short time limit. Assert that no
new requests arrive after the hard deadline fires instead.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: skip docker disconnect test unless DISCONNECT_TEST_DOCKER is set

Docker test builds an image from scratch (~2min+) which exceeds the
beforeAll timeout in the main CI job. Only run when explicitly opted in.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: address review findings — retry leak, listener leak, remote signal, docker target

1. AbortError is no longer retryable in withHttpRetry — only TimeoutError
   is. Previously a pipeline abort would trigger up to 5 retries with
   exponential backoff, defeating the hard cutoff.

2. takeLimits abort listener is created once outside the loop instead of
   per-iteration, preventing O(messages) listener accumulation on the
   AbortSignal.

3. createRemoteEngine now forwards opts.signal to the underlying HTTP
   fetch so remote-engine callers get real cancellation.

4. Docker test helper uses --target engine (not the default service image)
   and accepts ENGINE_IMAGE env var for pre-built images.
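Finding 1 can be sketched as follows. This is a simplified stand-in for the PR's `withHttpRetry` (the options shape here is assumed): only timeouts are retried with exponential backoff, while an abort surfaces on the first throw so it cannot defeat the hard cutoff:

```typescript
// Hypothetical error class standing in for the real timeout error type.
class TimeoutError extends Error {}

async function withHttpRetry<T>(
  fn: () => Promise<T>,
  { retries = 5, baseDelayMs = 10 } = {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only TimeoutError is retryable; AbortError (and everything else)
      // propagates immediately instead of burning up to `retries` backoffs.
      const retryable = err instanceof TimeoutError;
      if (!retryable || attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```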

Made-with: Cursor
Committed-By-Agent: cursor

* test: wire disconnect coverage to explicit Node/Bun/Docker runtimes

Run the disconnect suite in the intended CI jobs only, wait for log lines
instead of checking the buffer synchronously, and normalize the Docker mock
URL for container reachability. The Docker job now runs the disconnect suite
against the prebuilt engine image.

Made-with: Cursor
Committed-By-Agent: cursor

* test: run docker disconnect coverage with host networking in CI

The Docker disconnect suite now uses host networking in CI so the engine
container can reach the host mock server via localhost. Non-CI Docker runs
still use bridge mode with host.docker.internal.

Made-with: Cursor
Committed-By-Agent: cursor

* ci: wire esbuild binary path shell test into main test job

Rebased v2 added e2e/esbuild-binary-path.test.sh. Hook it back into the
main CI test job so the workflow sanity check passes and the shell test
actually runs in fresh-install CI.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: move signal out of Zod schema into type extension

AbortSignal is not serializable and doesn't belong in a Zod schema
that could be used for JSON schema generation or wire validation.
Keep it as a TypeScript-only type intersection on SourceReadOptions.

Made-with: Cursor
Committed-By-Agent: cursor

* refactor: move signal out of SourceReadOptions and Source.read params

signal is a runtime-only cancellation concern, not a serializable option.
It now lives as a separate positional argument on:
  - Engine.pipeline_read(pipeline, opts?, input?, signal?)
  - Engine.pipeline_sync(pipeline, opts?, input?, signal?)
  - Source.read(params, $stdin?, signal?)

wireDisconnect renamed to createConnectionAbort — returns the controller
instead of accepting startedAt and logging internally. The HTTP handler
owns the logging via its own onDisconnect callback.

Made-with: Cursor
Committed-By-Agent: cursor

* feat: wire signal through pipeline_write and Destination.write

pipeline_write now respects client disconnect the same way pipeline_read
and pipeline_sync do. Signal is threaded as a separate positional arg
through Engine.pipeline_write, Destination.write, and the HTTP handler.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: propagate iterator teardown through async iterable helpers

- channel: add return() and onReturn hook
- split: propagate branch return() back to the source iterator
- merge: call return() on child iterators when consumer stops
- add direct teardown tests for channel, split, and merge
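The `merge` case above can be illustrated with a deliberately naive merge (the real helper races pending `next()` calls; this round-robin version only demonstrates the teardown contract):

```typescript
// When the consumer stops early (break / return()), the generator's
// finally block runs and propagates return() to every child iterator,
// releasing upstream resources. Teardown rejections are swallowed.
async function* merge<T>(...sources: AsyncIterable<T>[]): AsyncGenerator<T> {
  const iters = sources.map((s) => s[Symbol.asyncIterator]());
  try {
    let open = iters.length;
    const finished = new Set<number>();
    while (open > 0) {
      for (let i = 0; i < iters.length; i++) {
        if (finished.has(i)) continue;
        const r = await iters[i].next();
        if (r.done) { finished.add(i); open--; }
        else yield r.value;
      }
    }
  } finally {
    // Runs on normal completion AND when the consumer breaks early.
    await Promise.all(iters.map((it) => it.return?.().catch(() => {})));
  }
}
```

`for await … break` calls `return()` on the merged iterator, which is exactly the path the fix covers.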

Made-with: Cursor
Committed-By-Agent: cursor

* fix: restore disconnect log on Node close path and update exec arity test

- createConnectionAbort now invokes a pure logging callback before aborting
  so Node close-driven disconnects emit SYNC_CLIENT_DISCONNECT again
- update exec arity test for createSourceFromExec.read(params, stdin?, signal?)

Made-with: Cursor
Committed-By-Agent: cursor
* feat: real-time per-table sync status

Refactor state into SyncState { source, destination, engine } sections,
add trace/progress for global aggregates and enriched stream_status for
per-stream progress, enrich EOF with combined snapshot, and update the
dashboard with live per-table status.

Protocol:
- SectionState + SyncState with source/destination/engine sections
- TraceStreamStatus extended with cumulative/run/window record counts
  and throughput rates (all optional, backward compat)
- TraceProgress for global aggregates (elapsed, throughput, checkpoints)
- EofPayload with nested global_progress + stream_progress sections

Engine:
- trackProgress() pipeline stage counts records (via data-path tap),
  emits periodic enriched stream_status + trace/progress, enriches EOF
- x-source-state header renamed to x-state with backward compat
- SourceReadOptions.state now SyncState; extracts .source for connectors

Service:
- sourceState -> syncState (SyncState) in Temporal workflow
- drainMessages captures engine state from enriched EOF

Dashboard:
- Per-stream table shows status badge + rows synced column
- Global stats bar: total rows, throughput, instantaneous rate, elapsed
- NDJSON streaming for live updates during active sync

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor

* fix: address review findings -- compat shim, dashboard safety, demo state

- Add sourceState -> syncState backward compat shim in pipeline-workflow.ts
  so in-flight Temporal workflows don't lose cursor state on deploy
- Remove dashboard auto-sync (POST /pipeline_sync on page view was starting
  duplicate syncs); replace with polling for now
- Update demo scripts and workflow test stubs to pass SyncState shape

Made-with: Cursor
Committed-By-Agent: cursor

* fix: raise confidence with safe progress source and compat

- persist latest EOF progress snapshot into the service pipeline record so the
dashboard can safely render read-only progress via polling
- normalize legacy state shapes inside the engine library, not just over HTTP
- accept deprecated X-Source-State as a runtime alias while keeping x-state as
  the primary header
- add focused progress tests and update remaining direct callers/tests to the
  new SyncState expectations
- align dashboard dev config with esnext optimizeDeps so local browser
  validation works on this machine

Made-with: Cursor
Committed-By-Agent: cursor

* chore: update plans and changelog

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* chore: regenerate openapi specs after v2 merge

Adds cutoff, elapsed_ms fields and aborted reason to EofPayload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* chore: format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…sages (#287)

- Move success log inside try block so it only fires on actual success
- Re-throw after yielding error trace so engine sees the failure
- Fall back to err.code when err.message is empty (Bun pg driver emits
  ECONNREFUSED in code with no message)
- Check err.code in isTransient so connection errors classify correctly
- Suppress duplicate error trace in logApiStream when destination already
  yielded one


Committed-By-Agent: claude

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: include aggregated source state in EOF payload

Consumers can now read the final accumulated state directly from the EOF
message instead of tracking individual source_state messages. The new
`state` field (SectionState) aggregates per-stream and global state with
last-write-wins semantics, matching the existing persistState behavior.
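The last-write-wins aggregation can be sketched like this (the `SourceStateMsg` and `SectionState` shapes here are assumptions for illustration, not the protocol's exact types):

```typescript
interface SourceStateMsg {
  stream?: string;                 // absent -> global state
  state: Record<string, unknown>;
}

interface SectionState {
  global: Record<string, unknown>;
  streams: Record<string, Record<string, unknown>>;
}

// Fold every source_state message into one snapshot; a later message for
// the same stream (or for global) simply overwrites the earlier one.
function aggregateState(messages: SourceStateMsg[]): SectionState {
  const out: SectionState = { global: {}, streams: {} };
  for (const msg of messages) {
    if (msg.stream === undefined) out.global = msg.state;
    else out.streams[msg.stream] = msg.state;
  }
  return out;
}
```

Consumers read this one snapshot off the EOF message instead of replaying the individual `source_state` messages themselves.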

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor

* fix: preserve full sync state in EOF payload

The EOF state now starts from the incoming SyncState and applies run-time source
and engine updates on top, so resumed and no-op runs return a persistable final
state instead of dropping untouched sections. Add regression coverage for state
merging, no-op resumes, and engine-only count updates.

Made-with: Cursor
Committed-By-Agent: cursor

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
#281)

* probe segment count, make estimate, remove backfill concurrency setting

* address review: withRateLimit wrapper, smooth segments, probe-as-first-page

Change 1: Extract withRateLimit(listFn, rateLimiter) wrapper so rate-limit
boilerplate is applied once at construction time. paginateSegment,
sequentialBackfillStream, and the probe no longer take rateLimiter
as a parameter — structurally impossible to forget.

Change 2: Replace discrete 1/10/50 segment tiers with smooth
segmentCountFromDensity(timeProgress) = ceil(1/timeProgress), capped at
MAX_SEGMENTS=50. Decouple concurrency from segment count — mergeAsync
now always uses MAX_CONCURRENCY=15 regardless of segment count. Also
adds division-by-zero guard for degenerate ranges.
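The smooth curve in Change 2 is small enough to state directly. A sketch, assuming the constants named above:

```typescript
const MAX_SEGMENTS = 50;

// timeProgress: fraction of the backfill time range covered by the probe
// page. Denser streams (small fraction) get more segments, capped at
// MAX_SEGMENTS; a degenerate (zero-width) range falls back to the cap.
function segmentCountFromDensity(timeProgress: number): number {
  if (!(timeProgress > 0)) return MAX_SEGMENTS; // division-by-zero guard
  return Math.min(MAX_SEGMENTS, Math.ceil(1 / timeProgress));
}
```

Unlike the old 1/10/50 tiers, this varies continuously: a probe covering half the range yields 2 segments, a tenth yields 10, and anything past 1/50 saturates at the cap.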

Change 3a: Replace probeSegmentCount with probeAndBuildSegments that
returns the first page data alongside the segments. The probe call now
uses a created filter (forward-compatible if range narrows later). The
caller yields probe records directly — zero wasted API calls. For sparse
streams the entire dataset comes from the probe with no re-fetch.

Also: export ListResult from openapi barrel, remove buildSegments default
parameter, assert probe call shape in coexist test.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: only yield probe data for single-segment streams

The probe fetches from the full time range (newest-first), so its items
may span multiple segment time ranges. Only yield probe data directly
when numSegments === 1 (sparse/single-page streams). For multi-segment
backfills, the probe is used purely for density estimation — each segment
fetches its own data independently to avoid cursor/range mismatches.

Made-with: Cursor
Committed-By-Agent: cursor

* consolidate backfill tuning constants in rate-limiter.ts

Move MAX_SEGMENTS and MAX_CONCURRENCY alongside DEFAULT_MAX_RPS so all
three throughput/concurrency knobs live in one file with comments
explaining what each controls and how they relate to each other.

Made-with: Cursor
Committed-By-Agent: cursor

---------

Co-authored-by: Tony Xiao <tonyx.ca@gmail.com>
The per-stream record counts were already available in the
stream_progress section (via run_record_count). Drop the redundant
legacy field from the protocol, pipeline, and progress tracking.

Made-with: Cursor
Committed-By-Agent: cursor
Replace the dead `incomplete` stream status with the actual failure_type
values (transient_error, system_error, config_error, auth_error) in both
TraceStreamStatus and EofStreamProgress.

Source now emits source_state with the error status when a stream fails,
so the caller can persist it. On the next run:
- transient_error → retried (same as pending)
- system_error / config_error / auth_error → skipped (permanent)
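The resume-time decision above reduces to a small classifier (the helper name `shouldRunStream` is hypothetical; the status values are the ones listed above):

```typescript
type StreamStatus =
  | "pending" | "complete"
  | "transient_error" | "system_error" | "config_error" | "auth_error";

function shouldRunStream(status: StreamStatus | undefined): boolean {
  // undefined: the stream has never reported status -> run it.
  if (status === undefined || status === "pending") return true;
  if (status === "transient_error") return true; // retried, same as pending
  // complete, system_error, config_error, auth_error -> skipped (permanent)
  return false;
}
```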

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
DANGEROUSLY_VERBOSE_LOGGING captured response bodies but included them
in the info-level "request end" log. Now logged separately at debug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
The full EOF payload (global_progress + stream_progress) is now logged
at info level when the sync completes. Response bodies remain at debug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Shows stream counts (complete/in-progress/errored/pending), per-stream
row counts for active streams, and error details for failed streams.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Info level now only has: request start, request end, EOF summary + payload.
Stream-level started/completed logs moved to debug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- Initialize streamStatus from persisted engine state on resume so
  streams completed in prior runs show as complete, not running
- Save stream status alongside cumulative_record_count in engine state
- Never default to 'running' — streams without a known status are
  excluded from stream_progress and stream_status traces
- source_state without prior stream_status infers 'started' not 'running'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
- Engine derives stream status from trace/error failure_type (only
  overrides non-complete streams) — no extra source messages needed
- Seed streamStatus from source state on resume so streams the source
  skips (already complete/errored) show correct status in progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Add an alternative to passing config via x-pipeline/x-state headers:
callers can now send Content-Type: application/json with a body like
{pipeline, state?, body?} containing native JSON objects (no string
escaping). The content-type header determines which mode is used —
application/json for the new path, anything else for the existing
header + NDJSON behavior. Purely additive, fully backward compatible.
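The dispatch can be sketched as below. The function names and the injected `getHeader`/`readJsonBody` accessors are illustrative, not the engine's real handler signature:

```typescript
interface PipelineRequest {
  pipeline: unknown;
  state?: unknown;
  body?: unknown;
}

function isJsonBodyMode(contentType: string | null): boolean {
  if (!contentType) return false;
  // Compare the bare MIME type: parameters and case are irrelevant, and
  // near-misses like "application/json-seq" must NOT match.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return mime === "application/json";
}

async function readPipelineConfig(
  getHeader: (name: string) => string | null,
  readJsonBody: () => Promise<unknown>,
): Promise<PipelineRequest> {
  if (isJsonBodyMode(getHeader("content-type"))) {
    // New path: native JSON objects in the body, no string escaping.
    return (await readJsonBody()) as PipelineRequest;
  }
  // Legacy path: JSON-encoded strings in headers, NDJSON in the body.
  return {
    pipeline: JSON.parse(getHeader("x-pipeline") ?? "null"),
    state: JSON.parse(getHeader("x-state") ?? "null"),
  };
}
```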

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor
* refactor: move stream cancellation to iterator return

Make AsyncIterator.return() the public cancellation contract across the engine, protocol helpers, and Stripe source so early teardown interrupts in-flight waits without threading a public AbortSignal through stream APIs.

Constraint: Keep HTTP disconnect handling bridged through ndjson response teardown
Rejected: Continue exposing public stream AbortSignal parameters | it duplicates the iterator cancellation surface
Confidence: high
Scope-risk: broad
Made-with: Cursor
Committed-By-Agent: cursor

* cleanup: drop onCancel from ndjsonResponse and remove DOMException

onCancel was redundant — ReadableStream.cancel() already calls
iterator.return() for Bun, and signal handles Node.js disconnects.
Unify on a single mechanism.

Replace the three duplicated getAbortError/DOMException helpers with
plain signal.reason (now always a plain Error set by withAbortOnReturn).

Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor
Made-with: Cursor
Committed-By-Agent: cursor

* fix: suppress unhandled rejections from fire-and-forget iterator.return()

Add .catch(() => {}) to void iterator.return() calls in ndjsonResponse
abort listener and pipeline closeIteratorInBackground, preventing
unhandled rejections if the upstream iterator rejects during teardown.

Made-with: Cursor
Committed-By-Agent: cursor
* feat: restore JSON body config mode without breaking CLI paths

Reintroduce application/json config bodies for the engine API while keeping
legacy header and NDJSON flows working. Fix OpenAPIHono and the OpenAPI-based
CLI generation so typed header schemas, streaming endpoints, and connector
loading all continue to work with the new spec.

Made-with: Cursor
Committed-By-Agent: cursor

* refactor: centralize mode-specific pipeline access

Move JSON-vs-header requiredness into shared helpers so handlers stop
merging two conditional code paths into nullable locals.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: tighten JSON content-type validation

Keep JSON validation strict for JSON routes while avoiding false positives for NDJSON and JSON-header alternatives.

Made-with: Cursor
Committed-By-Agent: cursor

* fix: align isJsonBody with middleware content-type logic

Export isApplicationJsonContentType from hono-zod-openapi and reuse it
in app.ts so handler and middleware agree on what counts as JSON body
mode. Fixes mixed-case and json-seq false positives/negatives.

Made-with: Cursor
Committed-By-Agent: cursor
…provements (#290)

* fix: improve retry logging, fault-tolerant getAccount, curl trace format

Add labels to retry log messages for easier debugging, include error
messages in retry output, make getAccount gracefully fall back on failure,
and format HTTP trace logs as reproducible curl commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude

* Adding temp backfill command

* scripts: add mitmweb intercept env setup and test

Sources mitmweb-env.sh to route Node/Bun/curl traffic through mitmweb
on localhost:8080, auto-starting it with the internal upstream proxy if
needed. mitmweb-env.test.sh verifies all three runtimes in parallel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* source-stripe: assert --use-env-proxy when proxy env vars are set

Node's built-in fetch (undici) silently bypasses HTTP_PROXY/HTTPS_PROXY
without the --use-env-proxy flag. assertUseEnvProxy() throws at startup
if a proxy is configured but the flag is absent, preventing silent
proxy bypass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Revert "source-stripe: assert --use-env-proxy when proxy env vars are set"

This reverts commit eb4e061.

* scripts: standalone assert-use-env-proxy with tests

Fails fast if HTTPS_PROXY/HTTP_PROXY is set without --use-env-proxy,
preventing silent proxy bypass in Node's built-in fetch (undici).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* scripts: convert assert-use-env-proxy to TypeScript with bun:test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* ts-cli: add assertUseEnvProxy with unit and subprocess tests

Throws when HTTPS_PROXY/HTTP_PROXY is set but --use-env-proxy is absent
(Node's built-in fetch silently bypasses the proxy without it). Bun is
exempt since it always respects proxy env natively.

Subprocess tests spawn real node/bun processes to verify the flag check
works end-to-end, not just in unit logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
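A minimal sketch of that check, with the env/argv sources made injectable for testing (the real `assertUseEnvProxy` may read them differently — the signature here is an assumption):

```typescript
// Hypothetical sketch of assertUseEnvProxy: throw at startup when a proxy
// env var is set but the Node process was not started with --use-env-proxy.
// Bun is exempt because it honors proxy env vars natively.
export function assertUseEnvProxy(
  env: Record<string, string | undefined> = process.env,
  execArgv: string[] = process.execArgv,
  isBun: boolean = typeof (globalThis as {Bun?: unknown}).Bun !== "undefined",
): void {
  if (isBun) return;
  const proxy = env.HTTPS_PROXY ?? env.https_proxy ?? env.HTTP_PROXY ?? env.http_proxy;
  if (!proxy) return;
  if (!execArgv.includes("--use-env-proxy")) {
    throw new Error(
      `Proxy env var is set (${proxy}) but Node was started without ` +
        `--use-env-proxy; built-in fetch would silently bypass the proxy.`,
    );
  }
}
```

Failing fast here is the whole point: a silently bypassed proxy looks like success until traffic shows up on the wrong egress path.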

* ts-cli: rename proxy -> env-proxy, move test next to source

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* ts-cli: move ndjson and config tests next to source files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* scripts: improve mitmweb-env.sh; add mitmweb test to CI

- Auto-install mitmproxy via pip if not found
- Auto-detect upstream proxy from http_proxy/https_proxy env vars
  instead of hardcoding the Stripe egress proxy URL; falls back to
  direct mode in CI and clean environments
- Add mitmweb proxy intercept test step to CI test job (httpbin.org)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* ci: move mitmweb test to its own parallel job

Avoids adding latency to the main test job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* ts-cli: skip bun subprocess test gracefully when bun is not installed

Fixes CI failure in the test job where bun isn't available yet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* scripts: add ws WebSocket test to mitmweb intercept suite

Tests that ws connections route through mitmweb via explicit
HttpsProxyAgent (ws ignores HTTP_PROXY and --use-env-proxy).
CI mitmweb job now installs pnpm deps so ws/https-proxy-agent resolve.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* source-stripe: replace fetchWithProxy with native fetch + tracedFetch

Node 24 --use-env-proxy makes undici/fetch respect HTTPS_PROXY natively,
so the manual ProxyAgent dispatcher injection in fetchWithProxy is
redundant. Replace all call sites with tracedFetch (curl trace logging
only, no proxy logic) and remove withFetchProxy/fetchWithProxy entirely.

getHttpsProxyAgentForTarget is kept — ws does not respect --use-env-proxy
and still needs an explicit HttpsProxyAgent for WebSocket connections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
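The curl-style trace format mentioned here (and in the earlier retry-logging commit) can be sketched as follows — a hypothetical formatter, not the actual `tracedFetch` internals; the helper name and quoting rules are assumptions:

```typescript
// Hypothetical sketch: format a request as a copy-pasteable curl command so
// any traced HTTP call can be reproduced from the logs.
export function toCurl(
  url: string,
  init: {method?: string; headers?: Record<string, string>; body?: string} = {},
): string {
  const parts = [`curl -X ${init.method ?? "GET"}`];
  for (const [name, value] of Object.entries(init.headers ?? {})) {
    parts.push(`-H '${name}: ${value}'`);
  }
  if (init.body) parts.push(`--data '${init.body}'`);
  parts.push(`'${url}'`);
  return parts.join(" ");
}
```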

* fix: restore getAccount retry and update getAccountCreatedTimestamp test

- Restore getAccount to use requestWithRetry (e0fb332 unintentionally
  removed retry by switching to request directly)
- Update test that expected getAccount failure to halt backfill: since
  getAccountCreatedTimestamp now swallows errors (fault-tolerant), backfill
  proceeds with STRIPE_LAUNCH_TIMESTAMP fallback instead of failing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
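The fault-tolerant fallback behavior being tested can be sketched like this — a hypothetical shape for `getAccountCreatedTimestamp`; the `STRIPE_LAUNCH_TIMESTAMP` name comes from the commit above, but its value and the function signature here are assumptions:

```typescript
// Assumed fallback value for illustration only; the real constant lives in
// the source-stripe package.
const STRIPE_LAUNCH_TIMESTAMP = 1_283_299_200;

// Hypothetical sketch: swallow getAccount failures and fall back to the
// launch timestamp so backfill proceeds instead of halting.
export async function getAccountCreatedTimestamp(
  getAccount: () => Promise<{created: number}>,
): Promise<number> {
  try {
    return (await getAccount()).created;
  } catch {
    return STRIPE_LAUNCH_TIMESTAMP;
  }
}
```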

* fix: add --use-env-proxy to Docker entrypoints and assertUseEnvProxy at startup

- Add --use-env-proxy flag to both engine and service Docker ENTRYPOINT so
  fetch/undici respects HTTPS_PROXY env var in container deployments
- Call assertUseEnvProxy() at startup in both CLI entry points so proxy
  misconfiguration (proxy set without --use-env-proxy) is caught immediately
  with a clear error rather than silently bypassing the proxy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* avoid duplicates in sheets

* fix

Brings the v2 monorepo rewrite into main, completely replacing the
deprecated single-package repo. The resulting tree is identical to v2.

Committed-By-Agent: claude

* add more context to sync-engine responses

* fix: extract only debug headers and sanitize non-JSON previews

Avoid materializing all response headers via Object.fromEntries — use
pickDebugHeaders to pluck only the diagnostically relevant keys directly
from the Headers object. Escape newlines in non-JSON response previews
so HTML error pages don't blow up log lines. Update test assertion to
match the new enriched error message format.

Made-with: Cursor
Committed-By-Agent: cursor
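A minimal sketch of the two helpers described above — the allow-list contents and helper names are assumptions for illustration:

```typescript
// Assumed allow-list of diagnostically relevant headers.
const DEBUG_HEADER_NAMES = ["request-id", "stripe-version", "content-type"];

// Hypothetical sketch of pickDebugHeaders: read only the allow-listed keys
// from the Headers object instead of materializing every header with
// Object.fromEntries.
export function pickDebugHeaders(headers: Headers): Record<string, string> {
  const picked: Record<string, string> = {};
  for (const name of DEBUG_HEADER_NAMES) {
    const value = headers.get(name);
    if (value !== null) picked[name] = value;
  }
  return picked;
}

// Hypothetical sketch of the preview sanitizer: truncate and escape
// newlines so a multi-line HTML error page stays on one log line.
export function sanitizePreview(body: string, maxLength = 200): string {
  return body.slice(0, maxLength).replace(/\r/g, "\\r").replace(/\n/g, "\\n");
}
```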

---------

Co-authored-by: Tony Xiao <tonyx.ca@gmail.com>

* fix: raise sync-engine serve header limit

Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>

* ci: run docker header size regression

Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>

* test: avoid blocking vitest worker in docker header test

Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>

* refactor: type header size server options

Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>

* ci: avoid duplicate header size test runs

Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>

---------

Co-authored-by: codex <noreply@openai.com>

Move the server startup path into dedicated bin and API server modules so Docker and local dev can start the bundled-only HTTP server without paying the citty/OpenAPI CLI bootstrap cost. Keep the interactive CLI on its own binary while making the engine API export surface side-effect-free.

Made-with: Cursor
Committed-By-Agent: cursor
@tonyxiao tonyxiao closed this Apr 18, 2026