fix: test_k8s_proxy_integration timeout (60s too tight for Docker E2E) #1136

Open
agentydragon wants to merge 2 commits into devel from claude/fix-k8s-proxy-test-timeout
@agentydragon
Owner

Summary

Raise the test size of test_k8s_proxy_integration from size="small" (60s timeout) to size="medium" (300s timeout).

Root cause

The py_test macro defaults to size="small" (60s timeout). This Docker E2E test loads 3 OCI images (mitmproxy, mock k8s server, test client) and starts 3 containers. Typical runtime is 45-56s, which is 75-93% of the 60s budget. On slower RBE workers the run crosses 60s and is killed mid-fixture-setup, producing no test output (just "collected 1 item", then silence).

Evidence (BuildBuddy, commit 468afda)

Invocation  Result   Duration
03455c8a    PASSED   56.4s
5a670acb    TIMEOUT  60.1s
adfdf2f2    PASSED   46.5s
98941593    PASSED   46.8s
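
To make the headroom argument concrete, the arithmetic behind it can be sketched as below. The durations and the 60s/300s timeouts are taken from this PR; the function and variable names are illustrative only.

```python
# Durations copied from the BuildBuddy evidence in this PR.
SMALL_TIMEOUT_S = 60.0    # size="small"
MEDIUM_TIMEOUT_S = 300.0  # size="medium"

durations_s = {
    "03455c8a": 56.4,
    "5a670acb": 60.1,
    "adfdf2f2": 46.5,
    "98941593": 46.8,
}


def budget_used(duration_s, timeout_s):
    """Return the fraction of the timeout consumed by one run."""
    return duration_s / timeout_s


for invocation, duration in durations_s.items():
    small = budget_used(duration, SMALL_TIMEOUT_S)
    medium = budget_used(duration, MEDIUM_TIMEOUT_S)
    print(f"{invocation}: {small:.0%} of small, {medium:.0%} of medium")
```

Under the medium timeout, even the slowest observed run (60.1s) uses roughly a fifth of the budget.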

Time budget: ~40s image loading (3x docker load), ~5s container startup/networking, ~1s actual test logic.

Test plan

  • Verified the test is not actually wedging — just slow image loading
  • medium (300s) gives 5x headroom for slow workers

https://claude.ai/code/session_01ANqoTWWCxF71H5Aq2DqwnT

claude added 2 commits March 31, 2026 01:26
Root cause: py_test macro defaults size="small" (60s timeout). This
Docker E2E test loads 3 OCI images (mitmproxy, mock k8s server, test
client) and starts 3 containers. Typical runtime 45-56s, leaving <15s
headroom on a 60s budget. On slower RBE workers it crosses 60s and
gets killed mid-fixture-setup.

Evidence from BuildBuddy (commit 468afda):
- 03455c8a: PASSED in 56.4s (4s from timeout)
- 5a670acb: TIMEOUT at 60.1s (killed during fixture setup)
- adfdf2f2: PASSED in 46.5s
- 98941593: PASSED in 46.8s

Time budget: ~40s image loading (3x docker load), ~5s container
startup/networking, ~1s actual test logic. medium (300s) gives 5x
headroom for slow RBE workers.

https://claude.ai/code/session_01ANqoTWWCxF71H5Aq2DqwnT

Root cause: py_test macro defaults size="small" (60s). This Docker E2E
test takes 43-56s on RBE — 78% spent in sequential docker load calls.

Measured per-step timing via undeclared outputs (timing.log):
  Before:  0-13s mitmproxy load, 13-25s mock_k8s load, 26-35s client
           load, 35-38s actual test. Total 43.6s.
  After:   0-15s parallel mock_k8s+client load, 15-27s mitmproxy
           fixture, 27-30s test. Total 37.9s.
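
One way such per-step timings could be recorded is sketched below. This is not the PR's actual instrumentation: the `timed_step` helper and the log line format are assumptions. The only parts taken as given are the `timing.log` file name from this commit message and Bazel's `TEST_UNDECLARED_OUTPUTS_DIR` environment variable, which points at the test's undeclared-outputs directory.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path

# Reference point for all step offsets (hypothetical: module import time).
_START = time.monotonic()


def timing_log_path():
    """Locate timing.log in Bazel's undeclared outputs dir, else a temp dir."""
    out_dir = os.environ.get("TEST_UNDECLARED_OUTPUTS_DIR") or tempfile.gettempdir()
    return Path(out_dir) / "timing.log"


@contextmanager
def timed_step(name, log_path=None):
    """Append '<begin>-<end>s <name>' to the timing log when the step ends."""
    path = Path(log_path) if log_path is not None else timing_log_path()
    begin = time.monotonic() - _START
    try:
        yield
    finally:
        # Written in `finally` so a failing step still leaves a log line.
        end = time.monotonic() - _START
        with path.open("a", encoding="utf-8") as fh:
            fh.write(f"{begin:.1f}-{end:.1f}s {name}\n")
```

A fixture could then wrap each phase, for example `with timed_step("mitmproxy load"): ...`, so that a timed-out run still leaves the completed steps on disk.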

docker load benchmark (145MB tarball, this machine):
  Raw tar stream:   345ms (IO only)
  Gunzip 112MB:     2.7s  (CPU bound)
  docker load cold: 6.8s  (decompress + overlay2 write)
  docker load warm: 1.4s  (layer exists, digest check only)

Changes:
- size="medium" (300s) for safe headroom
- Parallel docker load for mock_k8s + client images via ThreadPoolExecutor
- module-scoped _preloaded_images fixture (load once, not per test)
- Timing instrumentation writes to undeclared outputs for future diagnosis
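
The parallel-load change listed above might look roughly like the following. This is a hedged sketch rather than the code in this PR: `docker_load`, `load_images_parallel`, and the injectable `load_one` parameter are hypothetical names, and the tarball paths are placeholders. It relies only on `docker load -i` and the standard-library `ThreadPoolExecutor`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def docker_load(tarball):
    """Run `docker load -i <tarball>` and return its stdout."""
    result = subprocess.run(
        ["docker", "load", "-i", str(tarball)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def load_images_parallel(tarballs, load_one=docker_load):
    """Load all tarballs concurrently; return {tarball: loader result}."""
    tarballs = list(tarballs)
    if not tarballs:
        return {}
    # One thread per image: the work is dominated by the docker subprocess.
    with ThreadPoolExecutor(max_workers=len(tarballs)) as pool:
        # list() drains the iterator, re-raising the first loader failure.
        results = list(pool.map(load_one, tarballs))
    return dict(zip(tarballs, results))
```

To load once per module rather than per test, a call such as `load_images_parallel(["mock_k8s.tar", "client.tar"])` (placeholder paths) could be made from a pytest fixture declared with `scope="module"`.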

https://claude.ai/code/session_01ANqoTWWCxF71H5Aq2DqwnT
@agentydragon agentydragon force-pushed the claude/fix-k8s-proxy-test-timeout branch from 2538e64 to 47b8e0b Compare March 31, 2026 01:49