Skip to content

Improve HTTP response handling and retry behavior#1342

Open
MichaelGHSeg wants to merge 31 commits intomasterfrom
response-status-updates
Open

Improve HTTP response handling and retry behavior#1342
MichaelGHSeg wants to merge 31 commits intomasterfrom
response-status-updates

Conversation

@MichaelGHSeg
Copy link
Collaborator

@MichaelGHSeg MichaelGHSeg commented Feb 12, 2026

Summary

This PR improves HTTP response handling and retry behavior across browser and node SDKs to align with analytics-java implementation.

Key Changes

HTTP Headers:

  • Always send X-Retry-Count header starting with '0' on first attempt
  • Add Authorization headers with Basic auth (browser) and OAuth fallback (node)

Retry Behavior:

  • Cap Retry-After header values at 300 seconds maximum
  • Remove 413 (Payload Too Large) from retryable status codes - payload won't shrink on retry
  • Standardize backoff timing: 100ms minimum, 60 second maximum
  • Increase default maxRetries to 1000 (was 3 for node, 10 for browser)

Commits

  1. Always send X-Retry-Count and Authorization headers
  2. Add Retry-After cap and fix 413 handling
  3. Fix node publisher to always send X-Retry-Count header
  4. Standardize backoff timing: 100ms min, 60s max
  5. Increase default maxRetries to 1000

Test Coverage

  • Added tests for Authorization header in all dispatchers
  • Added tests for Retry-After capping behavior
  • Added tests for 413 non-retryable behavior
  • Updated all existing tests to reflect new X-Retry-Count behavior
  • Updated backoff tests for new timing expectations

All tests passing:

  • Browser: 28 tests (batched-dispatcher), 11 tests (fetch-dispatcher)
  • Node: 42 tests (publisher)
  • Core: 3 tests (backoff)

🤖 Generated with Claude Code

MichaelGHSeg and others added 12 commits January 16, 2026 11:49
- Send X-Retry-Count header on all requests (0 for first attempt)
- Add Basic auth using write key for browser dispatchers
- Node publisher prefers OAuth Bearer, falls back to Basic auth

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cap Retry-After header at 300 seconds (MAX_RETRY_AFTER_SECONDS)
- Remove 413 from retryable status codes (payload won't shrink on retry)
- Fix fetch-dispatcher to base64 encode Authorization header
- Add test coverage for Authorization header in all dispatchers
- Add tests for Retry-After capping behavior
- Update test expectations for X-Retry-Count always being sent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove conditional X-Retry-Count header logic in publisher.ts
- Always send X-Retry-Count starting with '0' on first attempt
- Update all test expectations from toBeUndefined() to toBe('0')
- Update T01 test name to reflect header is now sent

This aligns node behavior with browser dispatchers which already
always send the header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update core and browser backoff defaults: minTimeout 100ms (was 500ms), maxTimeout 60s (was Infinity)
- Update node publisher explicit backoff params to match
- Update backoff tests to reflect new timing expectations
- Aligns with analytics-java backoff timing (100ms base, 1min cap)

This provides faster initial retries while preventing excessive delays,
improving both responsiveness and resource efficiency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update node default maxRetries from 3 to 1000
- Update browser default maxRetries from 10 to 1000
- Aligns with analytics-java which uses 1000 max flush attempts

With the shorter 100ms base backoff (60s max), higher retry limits
allow better recovery while still respecting the backoff timing caps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@changeset-bot
Copy link

changeset-bot bot commented Feb 12, 2026

⚠️ No Changeset found

Latest commit: 9f094b3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@codecov
Copy link

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 95.79288% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.57%. Comparing base (3f1f585) to head (9f094b3).

Files with missing lines Patch % Lines
packages/node/src/plugins/segmentio/publisher.ts 92.30% 9 Missing ⚠️
...rowser/src/plugins/segmentio/batched-dispatcher.ts 95.55% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1342      +/-   ##
==========================================
+ Coverage   91.23%   91.57%   +0.33%     
==========================================
  Files         163      163              
  Lines        4393     4617     +224     
  Branches     1055     1128      +73     
==========================================
+ Hits         4008     4228     +220     
- Misses        385      389       +4     
Flag Coverage Δ
browser 92.49% <97.67%> (+0.27%) ⬆️
core 90.12% <100.00%> (-0.05%) ⬇️
node 89.47% <93.38%> (+1.54%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Updated header expectations in http-integration.test.ts to include Authorization
- Updated header expectations in http-client.integration.test.ts to include X-Retry-Count and Authorization
- Added maxRetries: 3 to OAuth integration tests that expect exactly 3 retry attempts
- Added maxRetries: 0 to timeout test to prevent excessive retries during timeout testing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the browser and node SDK HTTP dispatch/retry logic to align response handling (headers, retryability, backoff timing) with the analytics-java implementation.

Changes:

  • Standardizes request headers (adds Authorization and X-Retry-Count) and updates tests accordingly.
  • Revises retry policy (Retry-After handling + cap, retryable status allow/deny lists, standardized backoff bounds).
  • Increases default retry budget (notably node default maxRetries to 1000) and adjusts token refresh retry behavior.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
packages/node/src/plugins/segmentio/publisher.ts New header handling + revised retry semantics including Retry-After support and new success/retry classification.
packages/node/src/plugins/segmentio/tests/publisher.test.ts Expands coverage for new retry semantics, headers, and status handling.
packages/node/src/lib/token-manager.ts Adjusts OAuth retry pacing, switches rate-limit handling to Retry-After, and fixes token validity logic.
packages/node/src/lib/tests/token-manager.test.ts Adds/updates tests for token validity, Retry-After spacing, and refresh scheduling.
packages/node/src/app/analytics-node.ts Raises default maxRetries to 1000.
packages/node/src/tests/test-helpers/assert-shape/segment-http-api.ts Makes header assertions resilient to new required headers.
packages/node/src/tests/oauth.integration.test.ts Updates OAuth integration expectations for Retry-After and retry limits.
packages/node/src/tests/http-integration.test.ts Updates integration snapshots/expectations for auth + retry headers.
packages/node/src/tests/http-client.integration.test.ts Updates expected outbound headers for node HTTP client integration.
packages/node/src/tests/emitter.integration.test.ts Updates emitter integration setup for new retry behavior expectations.
packages/core/src/priority-queue/backoff.ts Changes default backoff bounds (min 100ms, max 60s).
packages/core/src/priority-queue/tests/backoff.test.ts Updates core backoff tests to match new bounds.
packages/browser/src/plugins/segmentio/ratelimit-error.ts Adds flag to mark Retry-After retries as “doesn’t consume retry budget”.
packages/browser/src/plugins/segmentio/index.ts Passes attempt count through to dispatcher and drops non-retryable failures.
packages/browser/src/plugins/segmentio/fetch-dispatcher.ts Adds Basic auth header, supports Retry-After, and refines retryable status handling.
packages/browser/src/plugins/segmentio/batched-dispatcher.ts Adds Basic auth + X-Retry-Count, introduces Retry-After handling, and changes batching behavior.
packages/browser/src/plugins/segmentio/tests/retries.test.ts Updates retry semantics tests and adds header expectations.
packages/browser/src/plugins/segmentio/tests/fetch-dispatcher.test.ts New unit tests for standard dispatcher status/Retry-After/header behavior.
packages/browser/src/plugins/segmentio/tests/batched-dispatcher.test.ts Updates batching snapshots and adds retry/header semantics coverage.
packages/browser/src/lib/priority-queue/backoff.ts Mirrors core backoff default changes for browser PQ backoff.
packages/browser/src/lib/priority-queue/tests/backoff.test.ts Updates browser backoff tests to match new bounds.
packages/browser/src/core/mocks/analytics-page-tools.ts Adds a mock module file (currently appears not wired up).
packages/browser/jest.config.js Adds moduleNameMapper for @segment/analytics-page-tools.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

MichaelGHSeg and others added 4 commits February 13, 2026 11:57
When buildBatch() splits events due to payload size limits, ensure remaining
events are flushed:
- On success: schedule flush for remaining buffered events
- On retry exhaustion: drop failed batch but continue flushing remaining events

This prevents events from being orphaned indefinitely when batches are split.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace shared totalAttempts counter with per-flush requestCount + isRetrying
  flag to fix race condition when pagehide sends concurrent chunks via
  Promise.all (each chunk now correctly gets X-Retry-Count: 0)
- sendBatch takes retryCount parameter instead of using shared mutable state
- Guard Authorization header behind if(writeKey) to avoid unnecessary encoding
- Schedule flush for remaining events on both success and retry exhaustion paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MAX_RETRY_AFTER_RETRIES (20) safety cap to prevent infinite
  retries when server keeps returning Retry-After headers (node
  publisher and browser batched-dispatcher)
- Lower node maxRetries default from 1000 to 10
- Fix convertHeaders TS warning by removing redundant any cast
- Fix retryCount metadata stamped on wrong events (batch.forEach
  instead of buffer.map)
- Update tests: X-Retry-Count always sent (0 on first attempt),
  413 now non-retryable, 300 treated as success, Retry-After cap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update priority-queue backoff thresholds for minTimeout 100ms
  (was 500ms): delay assertions lowered from >1000/2000/3000 to
  >200/400/800
- Update integration test to use objectContaining for headers
  since Authorization and X-Retry-Count are now always sent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

bsneed
bsneed previously approved these changes Feb 18, 2026
…ap test

- Clamp parsed Retry-After to >= 0 in batched-dispatcher and fetch-dispatcher
- Clamp clockSkew-adjusted wait time to >= 0 in token-manager
- Add T21 test for MAX_RETRY_AFTER_RETRIES safety cap in publisher
- Rename T01 test to reflect that header is '0', not absent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MichaelGHSeg and others added 2 commits February 20, 2026 12:01
The mock was never applied because moduleNameMapper in jest.config.js
points to the real implementation, taking precedence over __mocks__.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…patchers

Extract parseRetryAfter and getStatusBehavior helpers into
shared-dispatcher, replacing duplicated hardcoded status code logic in
both fetch-dispatcher and batched-dispatcher. Status behavior is now
driven by httpConfig (default4xxBehavior, default5xxBehavior,
statusCodeOverrides) from CDN settings, with sensible built-in defaults.

Also wires httpConfig through settings.ts and the segmentio plugin index
so CDN-provided HTTP configuration reaches the dispatchers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MichaelGHSeg and others added 2 commits February 23, 2026 15:08
Rename rateLimitConfig.maxTotalBackoffDuration to maxRateLimitDuration
(default 180s). Add computeBackoff() helper for exponential backoff with
jitter. Wire backoff timing, cumulative backoff duration cap, and
cumulative rate-limit duration cap into batched-dispatcher retry logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Node (packages/node):
- Remove 408/503 from Retry-After eligibility (only 429 uses Retry-After)
- Add rate-limit state to Publisher (rateLimitedUntil, rateLimitStartTime)
- 429 with Retry-After: set rate-limit state, requeue batch, halt flush
- 429 without Retry-After: counted backoff
- Add maxTotalBackoffDuration / maxRateLimitDuration config (default 43200s)
- Success clears rate-limit state
- Add tests: T04, T19, T20

Browser (packages/browser):
- Remove 408/503 from RETRY_AFTER_STATUSES (only 429)
- Fix maxRateLimitDuration default from 180s to 43200s per SDD
- Fix 429 pipeline blocking: isRetrying flag prevents new dispatches from
  bypassing scheduled retry
- Add tests: T04, T19, T20; update 408/503 tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

MichaelGHSeg and others added 9 commits February 25, 2026 14:01
Refactor Node publisher to use sleep-and-retry for 429 with Retry-After
instead of resolve-and-return, aligning retry behavior with the Java SDK.
Fix maxRetries default doc from 3 to 10 in settings. Add JSDoc
clarifications for browser dispatcher rate-limit config fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add comment explaining 100-399 success range per SDD spec
- Rename misleading test names to match actual assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- fetch-dispatcher: expect X-Retry-Count to be '0' on first call (the
  dispatcher always sends the header, defaulting via ?? 0)
- publisher: 511 without a token manager is non-retryable (like 501/505),
  so only 1 attempt should be made, not M+1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add analytics.on('error') handler to capture delivery failures.
Reports success=false with the first error reason when any batch
fails (non-retryable error or retries exhausted).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same change as the node e2e-cli: add analytics.on('error') handler
to capture delivery failures and report success=false.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nitoring

The browser SDK's Segment.io plugin handles retries internally via
closure-scoped buffers and swallows all errors (never fires delivery_failure).
This made the previous approach of using analytics.on('error') and a fixed
3-second delay unreliable for retry testing.

Changes:
- Install a fetch monitor before SDK import to track API request activity
- Replace fixed delay(3000) with waitForDelivery() that settles based on
  actual HTTP activity (3s after success, 6.5s after error to account for
  the SDK's Math.random() * 5000 flush scheduling)
- Use fetch response statuses for error detection instead of analytics
  error events

Results: 37/41 retry tests pass. 4 remaining failures are due to browser
SDK architectural limitations (maxAttempts hardcoded to 10, random flush
scheduling dominating backoff timing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants