feat(sinks): introduce RetryStrategy / Config for http based sinks#25057
feat(sinks): introduce RetryStrategy / Config for http based sinks#25057ndrsg wants to merge 6 commits intovectordotdev:masterfrom
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
|
@codex review - draw context from #10870 (comment) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27feabc878
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fn should_retry_response(&self, response: &Self::Response) -> RetryAction<Self::Request> { | ||
| let status = response.status(); | ||
|
|
||
| match status { | ||
| StatusCode::TOO_MANY_REQUESTS => RetryAction::Retry("too many requests".into()), | ||
| StatusCode::REQUEST_TIMEOUT => RetryAction::Retry("request timeout".into()), | ||
| StatusCode::NOT_IMPLEMENTED => { | ||
| RetryAction::DontRetry("endpoint not implemented".into()) | ||
| } | ||
| _ if status.is_server_error() => RetryAction::Retry( | ||
| format!("{}: {}", status, String::from_utf8_lossy(response.body())).into(), | ||
| ), | ||
| _ if status.is_success() => RetryAction::Successful, | ||
| _ => RetryAction::DontRetry(format!("response status: {status}").into()), | ||
| } | ||
| self.retry_strategy.retry_action(response.status()) |
There was a problem hiding this comment.
Honor
retry_strategy = none for transport failures
The new RetryStrategy::None mode is introduced as "Don't retry any errors," but this implementation only applies the strategy to HTTP response statuses; request/transport errors are still retried because is_retriable_error remains unconditional elsewhere in the retry logic. In practice, sinks configured with retry_strategy.type = "none" will still back off/retry on connection, DNS, TLS, or timeout errors, so the new mode cannot actually disable retries end-to-end.
Useful? React with 👍 / 👎.
| if status.is_server_error() { | ||
| RetryAction::Retry(reason) |
There was a problem hiding this comment.
Keep 501 out of the default retriable status set
The default branch now retries any 5xx via status.is_server_error(), which newly makes 501 Not Implemented retriable. Previously this code path treated 501 as non-retriable, and 501 is usually a permanent incompatibility (unsupported method/endpoint) rather than a transient outage. Retrying it consumes the full retry budget and delays rejection on misconfiguration; this should stay excluded unless users explicitly opt into all or custom.
Useful? React with 👍 / 👎.
Summary
This PR introduces a configurable
retry_strategyfor Vector's shared HTTP retry logic and exposes it on thehttpsink plus the sinks built on the same HTTP retry helpers.It adds support for
default,none,all, andcustomretry modes, wires that behavior through the affected HTTP-based sinks, and adds documentation plus test coverage for custom status-code retries. The default strategy now consistently retries408,429, and5xxresponses across the shared HTTP retry implementations.Affected sinks
Configurable sinks using
HttpStatusRetryLogic:httpaxiomopentelemetryappsignalazure_logs_ingestionazure_monitor_logsdatadog_eventsgcp_stackdriver_logsgcp_stackdriver_metricshoneycombkeepprometheus_remote_writeSinks that still use
HttpRetryLogicand are not made configurable by this PR:clickhousegreptimedblogsinfluxdb_metricssematext_metricssplunk_heclogssplunk_hecmetricsVector configuration
I used the new HTTP sink example configuration to validate the custom retry behavior (also checked in in examples folder):
How did you test this PR?
RetryStrategybehavior insrc/sinks/util/http.rs.src/sinks/http/tests.rs.config/examples/http_sink_custom_retry.yamlagainst it.make check-clippylocally../scripts/check_changelog_fragments.sh.Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References