fix: test_k8s_proxy_integration timeout (60s too tight for Docker E2E) #1136
Open
agentydragon wants to merge 2 commits into devel from
Root cause: py_test macro defaults size="small" (60s timeout). This Docker E2E
test loads 3 OCI images (mitmproxy, mock k8s server, test client) and starts
3 containers. Typical runtime 45-56s, leaving <15s headroom on a 60s budget.
On slower RBE workers it crosses 60s and gets killed mid-fixture-setup.

Evidence from BuildBuddy (commit 468afda):
- 03455c8a: PASSED in 56.4s (4s from timeout)
- 5a670acb: TIMEOUT at 60.1s (killed during fixture setup)
- adfdf2f2: PASSED in 46.5s
- 98941593: PASSED in 46.8s

Time budget: ~40s image loading (3x docker load), ~5s container
startup/networking, ~1s actual test logic.

medium (300s) gives 5x headroom for slow RBE workers.

https://claude.ai/code/session_01ANqoTWWCxF71H5Aq2DqwnT
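The headroom argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the run times are the BuildBuddy figures quoted in the commit message, the 60s and 300s values are Bazel's default timeouts for size="small" and size="medium", and the helper names (headroom, over_budget) are hypothetical, not part of the repository.

```python
# Illustrative arithmetic only; helper names are hypothetical.
SMALL_TIMEOUT_S = 60.0    # Bazel default timeout for size="small"
MEDIUM_TIMEOUT_S = 300.0  # Bazel default timeout for size="medium"

# Wall-clock times quoted in the commit message. For the timed-out run,
# 60.1s is when the test was killed, not how long it would have taken.
OBSERVED_RUNTIMES_S = {
    "03455c8a": 56.4,
    "5a670acb": 60.1,
    "adfdf2f2": 46.5,
    "98941593": 46.8,
}


def headroom(runtime_s, timeout_s):
    """Seconds left before the timeout (negative means over budget)."""
    return timeout_s - runtime_s


def over_budget(runtimes, timeout_s):
    """Return the sorted invocation IDs whose runtime exceeds the timeout."""
    return sorted(run for run, t in runtimes.items() if t > timeout_s)


if __name__ == "__main__":
    for run, t in OBSERVED_RUNTIMES_S.items():
        small = headroom(t, SMALL_TIMEOUT_S)
        medium = headroom(t, MEDIUM_TIMEOUT_S)
        print(f"{run}: small headroom {small:+.1f}s, medium headroom {medium:+.1f}s")
```

With the 60s budget exactly one of the four runs is over the limit; with 300s none are.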
Root cause: py_test macro defaults size="small" (60s). This Docker E2E
test takes 43-56s on RBE — 78% spent in sequential docker load calls.
Measured per-step timing via undeclared outputs (timing.log):
Before: 0-13s mitmproxy load, 13-25s mock_k8s load, 26-35s client load,
35-38s actual test. Total 43.6s.
After: 0-15s parallel mock_k8s+client load, 15-27s mitmproxy fixture,
27-30s test. Total 37.9s.
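Per-step timings like these could be collected with a small context manager that appends to a log file in Bazel's undeclared outputs directory. The sketch below is an assumption about how such instrumentation might look, not the code in this PR: timed_step and timing_log_path are hypothetical names, and the only Bazel-specific piece relied on is the TEST_UNDECLARED_OUTPUTS_DIR environment variable that Bazel sets for tests.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path

# Reference point so each step is logged as an offset from module import.
_START = time.monotonic()


def timing_log_path():
    """Pick timing.log inside Bazel's undeclared outputs dir, if available."""
    # Fall back to the system temp dir when not running under Bazel.
    out_dir = os.environ.get("TEST_UNDECLARED_OUTPUTS_DIR") or tempfile.gettempdir()
    return Path(out_dir) / "timing.log"


@contextmanager
def timed_step(name, log_path=None):
    """Append '<start>-<end>s <name>' to the timing log when the block exits."""
    path = Path(log_path) if log_path is not None else timing_log_path()
    begin = time.monotonic()
    try:
        yield
    finally:
        end = time.monotonic()
        with path.open("a") as f:
            f.write(f"{begin - _START:.1f}-{end - _START:.1f}s {name}\n")
```

A fixture would wrap each phase, for example `with timed_step("mitmproxy load"): ...`, and the resulting timing.log is then available among the test's undeclared outputs.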
docker load benchmark (145MB tarball, this machine):
Raw tar stream: 345ms (IO only)
Gunzip 112MB: 2.7s (CPU bound)
docker load cold: 6.8s (decompress + overlay2 write)
docker load warm: 1.4s (layer exists, digest check only)
Changes:
- size="medium" (300s) for safe headroom
- Parallel docker load for mock_k8s + client images via ThreadPoolExecutor
- module-scoped _preloaded_images fixture (load once, not per test)
- Timing instrumentation writes to undeclared outputs for future diagnosis
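The parallel-load change can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than the PR's actual code: docker_load and load_images_parallel are hypothetical names, the tarball paths are placeholders, and the only external interface used is the `docker load -i <tarball>` CLI invocation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def docker_load(tarball):
    """Load one image tarball into the local Docker daemon."""
    # check=True raises CalledProcessError if docker exits non-zero.
    subprocess.run(["docker", "load", "-i", str(tarball)], check=True)
    return str(tarball)


def load_images_parallel(tarballs, loader=docker_load):
    """Run the loader for every tarball concurrently, one thread each.

    Results come back in input order; the first loader exception, if any,
    is re-raised when the results are collected.
    """
    tarballs = list(tarballs)
    if not tarballs:
        return []
    with ThreadPoolExecutor(max_workers=len(tarballs)) as pool:
        return list(pool.map(loader, tarballs))
```

In a pytest module, a call such as `load_images_parallel([mock_k8s_tar, client_tar])` could sit inside a fixture declared with `@pytest.fixture(scope="module")`, so the images are loaded once per module instead of once per test. The `loader` parameter exists only so the function can be exercised without a Docker daemon.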
https://claude.ai/code/session_01ANqoTWWCxF71H5Aq2DqwnT
2538e64 to 47b8e0b
Summary

Increase test_k8s_proxy_integration from size="small" (60s) to size="medium" (300s).

Root cause

The py_test macro defaults size="small" (60s timeout). This Docker E2E test loads 3 OCI images (mitmproxy, mock k8s server, test client) and starts 3 containers. Typical runtime is 45-56s, which is 75-93% of the 60s budget. On slower RBE workers it crosses 60s and gets killed mid-fixture-setup, producing zero test output (just "collected 1 item", then silence).

Evidence (BuildBuddy, commit 468afda)

- 03455c8a: PASSED in 56.4s
- 5a670acb: TIMEOUT at 60.1s
- adfdf2f2: PASSED in 46.5s
- 98941593: PASSED in 46.8s

Time budget: ~40s image loading (3x docker load), ~5s container startup/networking, ~1s actual test logic.

Test plan

- medium (300s) gives 5x headroom for slow workers

https://claude.ai/code/session_01ANqoTWWCxF71H5Aq2DqwnT