Conversation
WalkthroughAdds support for reusable upstream definitions with per-upstream timeout configuration; updates all timeout configuration keys from seconds to milliseconds; introduces validation for upstream definitions and references; extends XDS translator to resolve upstream definitions and apply timeouts; adds integration test coverage for backend timeout behavior. Changes
Sequence DiagramsequenceDiagram
participant Config as Configuration
participant Validator as APIValidator
participant Translator as XDS Translator
participant RouteCluster as Route/Cluster
Config->>Validator: Validate API with UpstreamDefinitions
Validator->>Validator: validateUpstreamDefinitions()
Validator->>Validator: Check names, duplicates, URLs, weights, timeout formats
alt Valid definitions
Validator-->>Config: ✓ Validation passed
else Invalid
Validator-->>Config: ✗ Validation errors
end
Config->>Translator: Translate API config with UpstreamDefinitions
Translator->>Translator: resolveUpstreamCluster(upstream ref, definitions)
alt Upstream has reference
Translator->>Translator: resolveUpstreamDefinition(ref name)
Translator->>Translator: Extract URL and Timeout from definition
Translator->>Translator: resolveTimeoutFromDefinition()
Translator->>Translator: parseTimeout() - parse duration strings
Translator-->>Translator: Return (clusterName, URL, resolvedTimeout, error)
else Direct URL
Translator-->>Translator: Return (clusterName, URL, nil timeout, nil error)
end
Translator->>RouteCluster: createCluster(URL, connectTimeout)
Translator->>RouteCluster: createRoute(context, method, path, timeoutCfg)
RouteCluster->>RouteCluster: Apply per-route timeouts from config or defaults
RouteCluster-->>Translator: ✓ Cluster and Route configured
Translator-->>Config: ✓ XDS translation complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 257-272: The validator currently can mark empty names as
duplicates because namesSeen[def.Name] is being set even when def.Name is empty;
update the loop in the upstream definition validation so you only record
namesSeen for non-empty names (e.g., move or guard the namesSeen[def.Name] =
true assignment behind a check that def.Name != ""), keeping the existing
ValidationError append for empty names (uses ValidationError, def.Name,
namesSeen) and preserving the duplicate-name check logic.
In `@gateway/gateway-controller/pkg/xds/translator.go`:
- Around line 1351-1369: The resolvedTimeout.Connect value is parsed but never
applied; update createCluster to accept a connect timeout parameter and pass
resolvedTimeout.Connect from where createCluster is invoked (or, if you prefer
the other option, remove the Connect field from resolvedTimeout and all related
parsing). Specifically, change the createCluster signature to include a
connectTimeout (or an options struct) and use it instead of the hardcoded 5 *
time.Second inside createCluster, and update all call sites that construct
clusters to forward the resolvedTimeout.Connect (the code that currently
computes requestTimeout/idleTimeout and the resolvedTimeout struct). Ensure
resolvedTimeout.Connect is respected consistently or removed along with its
parsing logic.
🧹 Nitpick comments (4)
gateway/gateway-controller/pkg/config/validator_test.go (1)
700-721: Consider adding a test for missing URL scheme.The current
TestValidateUpstreamDefinitions_InvalidURLtest uses"not-a-valid-url"which doesn't have a scheme. This validates the scheme check, but consider adding a separate test case for a URL with missing host (e.g.,"http://") to ensure full coverage of the URL validation logic.gateway/it/features/backend_timeout.feature (1)
64-96: Consider verifying global timeout value for more meaningful test.The "global defaults" scenario waits the same 2 seconds as the explicit timeout scenario. If the global default timeout is also around 2 seconds or less, this test passes but doesn't truly validate the "defaults" behavior.
Consider either:
- Documenting what the expected global default timeout value is
- Adjusting wait time based on known global defaults
- Adding a comment explaining that this test assumes global defaults will also cause a timeout within 2 seconds for an unreachable backend
gateway/gateway-controller/pkg/xds/translator.go (2)
556-559: Only the first URL from the first upstream target is used.The current implementation extracts only
definition.Upstreams[0].Urls[0]. If the upstream definition has multiple targets or URLs for weighted load balancing, they are ignored. This appears intentional for initial implementation, but consider documenting this limitation or adding a TODO for future enhancement.
2477-2490: Consider validating that parsed timeouts are positive.
time.ParseDurationaccepts zero and negative values (e.g.,"0s","-5s"). A zero timeout effectively means "expire immediately" and negative values are likely configuration errors. Consider adding validation:🛡️ Proposed validation
func parseTimeout(timeoutStr *string) (*time.Duration, error) { if timeoutStr == nil || strings.TrimSpace(*timeoutStr) == "" { return nil, nil } duration, err := time.ParseDuration(strings.TrimSpace(*timeoutStr)) if err != nil { return nil, fmt.Errorf("invalid timeout format: %w", err) } + if duration <= 0 { + return nil, fmt.Errorf("timeout must be positive, got: %v", duration) + } + return &duration, nil }
4bb1852 to
6570331
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@gateway/gateway-controller/api/openapi.yaml`:
- Around line 1857-1895: The OpenAPI schema declares
UpstreamDefinition.upstreams[].urls as an array but the translator currently
reads only definition.Upstreams[0].Urls[0] (see TODO "Support multiple upstream
targets"); update the implementation in
gateway/gateway-controller/pkg/xds/translator.go to iterate all
UpstreamDefinition.Upstreams and each Upstream.Urls, creating backend endpoints
(respecting per-upstream weight and default weight) and aggregating them into
the cluster/load-balancer config instead of using only the first URL, then
remove the TODO; alternatively, if you prefer to keep single-target behavior,
update the OpenAPI UpstreamDefinition.upstreams.items.properties.urls
description to state only the first URL is used and restrict the schema to a
single string field to match current behavior.
- Around line 1897-1915: The OpenAPI schema for UpstreamTimeout currently
restricts duration units in the pattern on properties connect, request, and
idle; update the regex used for those properties' pattern to allow Go's
supported units by including ns and us (e.g., change the pattern from
^\d+(\.\d+)?(ms|s|m|h)$ to a pattern that includes ns and us such as
^\d+(\.\d+)?(ns|us|ms|s|m|h)$) so the OpenAPI validation matches
time.ParseDuration() behavior for the UpstreamTimeout object and its fields
(connect, request, idle).
In `@gateway/it/steps_backend_timeout.go`:
- Around line 34-37: The step that asserts elapsed time hardcodes the key
"request_start" while the recorder step (ctx.Step `I record the current time as
"([^"]*)"$`) accepts any key; update the assertion step(s) to accept a key
parameter instead of the hardcoded "request_start" and use that key with
state.GetContextValue/GetContextTimestamp (or the existing state helper) and
time.Since to compute elapsed time; locate the assertion ctx.Step(s) that
currently reference "request_start" (and the similar one noted for lines 44-46)
and change their regex to capture a key string, replace literal "request_start"
references with the captured key variable, and keep the same assertion logic.
In `@gateway/it/test-config.toml`:
- Around line 120-124: The comment says "6s route timeout" but
route_timeout_in_milliseconds = 5000 (5s) in the
gateway_controller.router.envoy_upstream.timeouts block; update either the
comment to "5s route timeout" to match route_timeout_in_milliseconds (and
max_route_timeout_in_milliseconds) or change route_timeout_in_milliseconds and
max_route_timeout_in_milliseconds to 6000 to match the "6s" comment (ensure
consistency with gateway/it/test-config.yaml which uses 6000ms).
6570331 to
5824bab
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 207-242: The validator currently treats a present-but-blank ref as
valid; update validateUpstreamRef so that only a nil ref is ignored but a ref
that trims to an empty string produces a ValidationError. Specifically, in
APIValidator.validateUpstreamRef check ref == nil and return no errors, but if
strings.TrimSpace(*ref) == "" append a ValidationError for field
"spec.upstream."+label+".ref" indicating the reference must not be empty; keep
the subsequent logic that checks upstreamDefinitions and looks up def.Name ==
refName when refName is non-empty.
In `@gateway/gateway-controller/pkg/xds/translator.go`:
- Around line 2503-2520: The parseTimeout function currently rejects zero
durations (it checks duration <= 0); change that check in parseTimeout to only
reject negative durations (duration < 0) so explicit "0s" is accepted by route
request/idle/connect timeouts, and add a unit test for parseTimeout covering
"0s" (and a negative value) to document behavior; alternatively, if you intend
to keep rejecting zero, update parseTimeout's doc comment to explain why zero is
disallowed and add a test asserting that "0s" errors. Mention parseTimeout by
name in both the code change and test update.
🧹 Nitpick comments (2)
gateway/gateway-controller/pkg/config/config.go (1)
205-226: Good addition of cluster-level upstream defaults.As per coding guidelines: When modifying code in gateway components (gateway-controller, policy-engine, router, etc., excluding gateway/it/), rebuild Docker images using
cd gateway && make build-local.gateway/gateway-controller/pkg/xds/translator_test.go (1)
134-141: Consider adding a test for zero duration.The implementation rejects
duration <= 0, but there's no test explicitly verifying this behavior for"0s". Adding such a test would document the expected behavior and prevent regressions.📝 Suggested test addition
func TestParseTimeout_ZeroDuration(t *testing.T) { zero := "0s" duration, err := parseTimeout(&zero) assert.Error(t, err) assert.Nil(t, duration) assert.Contains(t, err.Error(), "timeout must be positive") }
5824bab to
6dacf56
Compare
6dacf56 to
a97801b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gateway/gateway-builder/go.mod (1)
11-16:⚠️ Potential issue | 🟡 MinorMove the testify dependency to the first require block.
The
testifydependency (line 12) is imported directly in multiple test files throughout this module (assertandrequirepackages), making it a direct dependency. According to Go conventions, it should be moved to the firstrequireblock (lines 5-9) alongside other direct dependencies, rather than placed in the second block which is for indirect dependencies.The second block should contain only indirect dependencies marked with
// indirect.
🤖 Fix all issues with AI agents
In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 334-373: The timeout validator currently calls time.ParseDuration
on def.Timeout.Request (and similarly def.Timeout.Connect and def.Timeout.Idle)
which accepts disallowed units like ns/us; instead pre-validate the trimmed
timeout string against the OpenAPI regex that only allows units ms|s|m|h (e.g. a
compiled regexp matching ^\d+(ms|s|m|h)$ or your schema pattern) and if it
doesn't match append a ValidationError for the corresponding field
(spec.upstreamDefinitions[%d].timeout.request / .connect / .idle); only call
time.ParseDuration for strings that pass the regex and keep the existing
ParseDuration error handling as a fallback.
In `@gateway/it/features/backend_timeout.feature`:
- Around line 65-96: Update the inline comment that currently states the
global-default is 5s to reflect the actual default of 6s used in the test (the
scenario "RestApi without upstream timeout uses global defaults" asserts 6s and
the config uses 6000ms); find the comment above that scenario and change "5s" to
"6s" (or reword to "6s default") so the comment matches the assertion and
config.
🧹 Nitpick comments (5)
gateway/gateway-controller/pkg/xds/translator_test.go (1)
158-341: Prefer constructingTranslatorvia helpers instead of&Translator{}.Direct instantiation can become brittle if
resolveUpstreamClusterstarts reading config/logger. UsingcreateTestTranslator()(orNewTranslator) keeps tests future‑proof.♻️ Suggested tweak (apply consistently in this block)
- translator := &Translator{} + translator := createTestTranslator()gateway/gateway-controller/pkg/config/api_validator.go (1)
418-425: Remember to rebuild gateway images after gateway-controller changes.This change touches gateway-controller validation paths, so ensure images are rebuilt.
As per coding guidelines, when modifying code in gateway components (gateway-controller, policy-engine, router, etc., excluding gateway/it/), rebuild Docker images using
cd gateway && make build-local.gateway/gateway-controller/pkg/config/validator_test.go (1)
771-796: Consider adding a disallowed-unit timeout case (ns/us).Since the schema restricts units to
ms|s|m|h, it would help to assert that10us/10nsare rejected.Based on learnings, the UpstreamTimeout schema intentionally restricts duration units to ms, s, m, h (excluding ns and us).
gateway/gateway-controller/pkg/config/config.go (1)
205-215: Reminder: rebuild gateway Docker images after gateway component changes.
Runcd gateway && make build-localbefore validation/release.As per coding guidelines:
gateway/**/*.{go,yaml,yml,Dockerfile}→ rebuild Docker images usingcd gateway && make build-local.gateway/gateway-controller/pkg/xds/translator.go (1)
85-91: Minor typo: escaped quotes in comment.The comment contains escaped quotes
\"no override\"which should be regular quotes.📝 Suggested fix
-// resolvedTimeout represents parsed timeout values for an upstream. -// All fields are optional; nil means \"no override\" (use global defaults). +// resolvedTimeout represents parsed timeout values for an upstream. +// All fields are optional; nil means "no override" (use global defaults). type resolvedTimeout struct {
a97801b to
ee8ba40
Compare
ee8ba40 to
48abc28
Compare
| route_idle_timeout_in_milliseconds = 300000 | ||
|
|
||
| [gateway_controller.router.envoy_upstream_cluster] | ||
| connect_timeout_in_milliseconds = 5000 |
There was a problem hiding this comment.
What is this config
|
As timeout and idle timeout are route-level configs (Downstream), it is not correct to read them from Upstream Definition and configure routes. Shall we remove them then? We may get an issue when we are gonig to support dynamic routing |
| @@ -94,9 +94,12 @@ verify_host_name = true | |||
| disable_ssl_verification = false | |||
|
|
|||
| [gateway_controller.router.envoy_upstream.timeouts] | |||
There was a problem hiding this comment.
| [gateway_controller.router.envoy_upstream.timeouts] | |
| [gateway_controller.router.envoy_downstream.timeouts] |
| max_route_timeout_in_milliseconds = 60000 | ||
| route_idle_timeout_in_milliseconds = 300000 | ||
|
|
||
| [gateway_controller.router.envoy_upstream_cluster] |
There was a problem hiding this comment.
| [gateway_controller.router.envoy_upstream_cluster] | |
| [gateway_controller.router.envoy_upstream.timeout] |
Purpose
This PR adds backend timeout support by introducing upstreamDefinitions under the API spec. APIs can reference a named definition with spec.upstream.main.ref (or sandbox ref); each definition may define a timeout block with optional connect, request, and idle duration strings. When a ref is used, the translator resolves it to the definition’s URL and timeout; if a definition has no timeout (or the upstream uses a direct url), the gateway falls back to the global router config timeouts so existing behaviour is unchanged.
Fix for:
Summary by CodeRabbit
New Features
Configuration Changes