fix: fix resource leaks — response bodies, signals, connections, file descriptors, data structures#7094
fix: fix resource leaks — response bodies, signals, connections, file descriptors, data structures#7094
Conversation
Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
|
@greptile |
Greptile SummaryThis PR systematically addresses resource leaks across the codebase, covering HTTP response bodies, signal handlers, goroutines, TCP connections, file descriptors, and data structures. The changes are well-focused and follow Go best practices for resource cleanup. Key improvements:
All fixes are correct and follow Go idioms. No issues found. Confidence Score: 5/5
Important Files Changed
Last reviewed commit: adf422a |
Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
…eak in RunInit) Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
|
@greptile |
There was a problem hiding this comment.
Pull request overview
This PR targets resource-leak fixes across the Go codebase, primarily around properly releasing OS/HTTP resources and cleaning up signal/ticker handling to avoid leaked goroutines and blocked channels during shutdown.
Changes:
- Close leaked resources (files, HTTP response bodies) and stop tickers to prevent FD/goroutine leaks.
- Add
signal.Stop(...)tosignal.Notify(...)call sites and adjust signal-handling goroutines to respectctx.Done(). - Improve file download write-paths by ensuring created files are always closed (including close-error propagation in some cases).
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utilization/core/writers.go | Ensure opened file is closed in Print() to avoid FD leaks. |
| pkg/triggers/scraper.go | Stop time.Ticker on exit to avoid ticker resource leak. |
| pkg/runner/executionlogswriter.go | Close HTTP response body after presigned PUT request. |
| pkg/mcp/server.go | Stop signal notifications for graceful shutdown handlers. |
| pkg/event/emitter.go | Stop tickers in long-running loops to avoid ticker leaks. |
| pkg/controlplaneclient/execution.go | Close HTTP response body after presigned PUT request. |
| pkg/cloud/data/artifact/artifacts_storage.go | Close response body on non-200 in getObject(). |
| pkg/cloud/client/rest.go | Close HTTP response bodies in REST client methods. |
| pkg/cloud/client/agents.go | Close HTTP response body in GetSecretKey(). |
| pkg/api/v1/client/direct_client.go | Close body on error path in baseExec(); ensure created file is closed in GetFile(). |
| pkg/api/v1/client/cloud_client.go | Ensure created file is closed (with close-error propagation) in GetFile(). |
| cmd/testworkflow-toolkit/commands/root.go | Add signal.Stop(stopSignal) to signal handler goroutine. |
| cmd/testworkflow-init/runner/runner.go | Add signal.Stop and make signal goroutine exit on ctx.Done(). |
| cmd/tcl/kubectl-testkube/devbox/devutils/binarystorage.go | Close HTTP response bodies on additional paths in upload logic. |
| cmd/tcl/kubectl-testkube/devbox/command.go | Add signal.Stop and make signal goroutine exit on ctx.Done(). |
| cmd/tcl/devbox-binary-storage/main.go | Close opened file in HTTP GET handler. |
| cmd/kubectl-testkube/commands/testworkflowtemplates/create.go | Close opened input file for --file mode. |
| cmd/kubectl-testkube/commands/testworkflows/create.go | Close opened input file for --file mode. |
| cmd/kubectl-testkube/commands/root.go | Add signal.Stop(stopSignal) to signal handler goroutine. |
| cmd/kubectl-testkube/commands/pro/login.go | Close HTTP response bodies after reading public-info. |
| cmd/kubectl-testkube/commands/dashboard.go | Stop signal notifications for dashboard port-forward stop channel. |
| cmd/api-server/commons/commons.go | Add signal.Stop(stopSignal) in HandleCancelSignal. |
Comments suppressed due to low confidence (3)
cmd/api-server/commons/commons.go:75
defer signal.Stop(stopSignal)runs when this handler returns (including immediately after the first signal), which disables further signal delivery. That prevents the nested goroutine waiting for a second signal from ever unblocking, so the intended forcedos.Exit(137)on repeated signals won’t work. Consider stopping notifications only after shutdown completes (e.g., defersignal.Stopat the call site that owns the lifecycle).
defer signal.Stop(stopSignal)
select {
case <-ctx.Done():
return nil
case sig := <-stopSignal:
cmd/testworkflow-toolkit/commands/root.go:57
defer signal.Stop(stopSignal)is inside the errgroup goroutine. When the first signal is received, this goroutine returns andsignal.Stopruns immediately, which prevents the nested “second signal => os.Exit(137)” goroutine from ever receiving a second signal (it will block forever on<-stopSignal). Movesignal.Stop(stopSignal)to adeferinExecute()(outside the goroutine), or otherwise ensure notifications stay active until the process exits/shutdown completes.
defer signal.Stop(stopSignal)
select {
case <-ctx.Done():
return nil
case sig := <-stopSignal:
cmd/kubectl-testkube/commands/root.go:218
defer signal.Stop(stopSignal)is executed when this errgroup goroutine returns (e.g., right after receiving the first signal). That stops further signal delivery and breaks the intended “second SIGINT/SIGTERM forces os.Exit(137)” behavior (the inner goroutine will block on<-stopSignal). Prefer deferringsignal.Stop(stopSignal)inExecute()itself, after the command finishes, so signal delivery remains active during shutdown.
defer signal.Stop(stopSignal)
select {
case <-ctx.Done():
return nil
case sig := <-stopSignal:
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ctx, ctxCancel := context.WithCancel(context.Background()) | ||
| stopSignal := make(chan os.Signal, 1) | ||
| signal.Notify(stopSignal, syscall.SIGINT, syscall.SIGTERM) | ||
| defer signal.Stop(stopSignal) | ||
| go func() { | ||
| <-stopSignal | ||
| ctxCancel() | ||
| select { | ||
| case <-stopSignal: | ||
| ctxCancel() | ||
| case <-ctx.Done(): | ||
| } |
There was a problem hiding this comment.
ctxCancel is only invoked on a signal, so if the command exits normally the context is never canceled and the signal-wait goroutine can remain blocked (even with the new select { ... case <-ctx.Done() }). Add defer ctxCancel() right after context.WithCancel(...) so the goroutine can always exit and any context-bound work is properly canceled.
There was a problem hiding this comment.
Added defer ctxCancel() right after context.WithCancel(...) in adf422a.
| return errors.Wrap(err, "failed to save file in the object storage") | ||
| } | ||
| defer res.Body.Close() | ||
| if res.StatusCode != http.StatusOK { |
There was a problem hiding this comment.
On non-200 responses this returns without draining the response body. Closing without reading to EOF prevents HTTP connection reuse in Go’s net/http transport. Consider reading/discarding the body (and optionally including a small response snippet in the error) before returning.
| if res.StatusCode != http.StatusOK { | |
| if res.StatusCode != http.StatusOK { | |
| // Drain the response body to allow HTTP connection reuse. | |
| _, _ = io.Copy(io.Discard, res.Body) |
There was a problem hiding this comment.
Added io.Copy(io.Discard, res.Body) before the error return in adf422a.
| return errors.Wrap(err, "failed to save file in cloud storage") | ||
| } | ||
| defer res.Body.Close() | ||
| if res.StatusCode != http.StatusOK { |
There was a problem hiding this comment.
On non-200 responses this returns after defer res.Body.Close() but without draining the body. In Go net/http, not reading to EOF typically prevents the underlying TCP connection from being reused. Consider draining/reading the response body before returning the status error.
| if res.StatusCode != http.StatusOK { | |
| if res.StatusCode != http.StatusOK { | |
| // Drain response body to allow the underlying TCP connection to be reused. | |
| _, _ = io.Copy(io.Discard, res.Body) |
There was a problem hiding this comment.
Added io.Copy(io.Discard, res.Body) before the error return in adf422a.
| if rsp.StatusCode != http.StatusOK { | ||
| _ = rsp.Body.Close() |
There was a problem hiding this comment.
When StatusCode != 200, the body is closed without being drained/read. In Go’s net/http this can inhibit connection reuse and also discards potentially useful error details from the response. Consider reading/discarding the body (possibly with a reasonable size limit) before closing and returning the error.
| if rsp.StatusCode != http.StatusOK { | |
| _ = rsp.Body.Close() | |
| if rsp.StatusCode != http.StatusOK { | |
| const maxErrBodyBytes = 4 * 1024 | |
| bodyBytes, _ := io.ReadAll(io.LimitReader(rsp.Body, maxErrBodyBytes)) | |
| _ = rsp.Body.Close() | |
| bodyMsg := strings.TrimSpace(string(bodyBytes)) | |
| if bodyMsg != "" { | |
| return nil, errors.Errorf("error getting file from presigned url: expected 200 OK response code, got %d: %s", rsp.StatusCode, bodyMsg) | |
| } |
There was a problem hiding this comment.
Now reads up to 4KB of the error body via io.LimitReader and includes it in the error message when non-empty. Fixed in adf422a.
…on-deferred file close) Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
|
@greptile |
…SameMap inverted logic) Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
|
@greptile |
|
@copilot please recheck all above with Opus model |
…add defer ctxCancel() Co-authored-by: vsukhin <5984962+vsukhin@users.noreply.github.com>
Rechecked all review comments and addressed each one in adf422a:
|
|
@greptile |
|
@vsukhin ready to merge? |
No, some of them look safe, but other need to be carefully checked |
mgazza
left a comment
There was a problem hiding this comment.
LGTM — comprehensive resource leak sweep. All fixes are correct. Notable: the containsSameMap fix is a genuine logic bug (was returning true on mismatch instead of continuing). InMemoryCache flush, ticker stops, signal stops, resp.Body closes, and file handle closes are all solid. Good companion to cloud-api#3136.
Systematic audit and fix of resource leaks across the codebase: unclosed HTTP response bodies, unregistered signal channels, goroutine leaks, TCP connection leaks, file descriptor leaks, and data structures that grow without bound.
Pull request description
HTTP response body leaks
pkg/runner/executionlogswriter.go—res.Bodynever closed afterhttpClient.Do()inSave(); now drains body on non-200 viaio.Copy(io.Discard, ...)to allow HTTP connection reusepkg/cloud/data/artifact/artifacts_storage.go—rsp.Bodynot closed on non-200 status ingetObject(); now reads up to 4KB of error body viaio.LimitReaderand includes it in the error message for better diagnosticspkg/api/v1/client/direct_client.go—resp.Bodyleaked inbaseExec()on 4xx/5xx; callers early-return beforedefer resp.Body.Close()is setpkg/controlplaneclient/execution.go— now drains response body on non-200 viaio.Copy(io.Discard, ...)before returning to enable TCP connection reusesignal.Notifywithoutsignal.StopAdded
defer signal.Stop(ch)in all seven call sites:pkg/mcp/server.go,cmd/api-server/commons/commons.go,cmd/testworkflow-toolkit/commands/root.go,cmd/kubectl-testkube/commands/root.go,cmd/kubectl-testkube/commands/dashboard.go,cmd/tcl/kubectl-testkube/devbox/command.goGoroutine leaks in signal handlers
cmd/testworkflow-init/runner/runner.goandcmd/tcl/kubectl-testkube/devbox/command.gohad goroutines blocked on<-sigChwith noctx.Done()exit path — leaked whenever the outer function returned normally without a signal. Changed toselect { case <-sigCh: … case <-ctx.Done(): }. Also addeddefer ctxCancel()in devboxcommand.goto ensure the context is always cancelled on function exit, allowing the signal-wait goroutine to terminate.External connection leaks
cmd/testworkflow-init/control/client.go—Close()only calledc.cancel()(cancels dial context, does not close an already-established TCP connection); added_ = c.conn.Close()pkg/tmp/files.go—os.CreateTempFD leaked; added_ = tmpfile.Close()before returning the pathcmd/tcl/kubectl-testkube/devbox/devutils/crdfssync.go— barefile.Close()at end ofloadFile(); converted todeferData structure leaks
pkg/cache/inmem.go—InMemoryCacheentries stored bySet()but never re-fetched viaGet()accumulated in thesync.Mapforever (no active eviction). Addedflush()triggered every 10Set()calls using an atomic counter; uses collect-then-delete to avoid mutating the map duringRange:pkg/testworkflows/testworkflowexecutor/testworkflowfetcher.go—containsSameMaphad inverted boolean logic:return truefired on key mismatch instead of match. TheprefetchedLabelsdedup guard never worked, appending duplicates on every call. Fixed with a localequalflag andbreak-on-mismatch.Checklist (choose whats happened)
Breaking changes
Changes
signal.Stopfor everysignal.Notifycall sitectx.Done()defer ctxCancel()in devbox command to ensure context cleanup on normal exitcontrol.Client.Close()now closes the underlying TCP connectiontmp.Name()closes the temp file descriptor before returningcrdfssync.loadFileusesdefer file.Close()InMemoryCache.Settriggers periodic sweep of expired entriescontainsSameMaplogic corrected to returntrueon matchFixes
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.