libpod: fix data race on deferredErr in attachExecHTTP#28365
Conversation
|
Thanks for the contribution! The fix looks correct to me. But I have a question: Did you consider using a
|
|
The channel approach fits the problem more naturally here. is written by multiple deferred functions in (directly, and through ) — protecting those with a would mean adding lock/unlock around every assignment site scattered across the deferred stack, which is fragile. More importantly, what we actually need is not mutual exclusion but completion signaling: the goroutine should not read until all cleanup code in has finished running. A channel closed by the first-registered (last-to-run) defer precisely expresses that: A |
libpod/oci_conmon_exec_common.go
Outdated
| // cleanup). The goroutine below waits on done before reading deferredErr | ||
| // to avoid a data race: holdConnOpen may be closed by the caller while | ||
| // this function's deferred functions are still running. | ||
| done := make(chan struct{}) |
There was a problem hiding this comment.
can we use a chan to pass the error instead? So it is clearer what we are waiting for
There was a problem hiding this comment.
Good point — updated in the latest commit to use errCh := make(chan error, 1) with defer func() { errCh <- deferredErr }(). The goroutine now reads <-errCh directly instead of the two-step <-done + reading the named return variable, which makes the intent clearer.
|
LGTM! Once you fix the DCO and squash commits, we can merge. |
ExecContainerHTTP returns attachChan to its caller before attachExecHTTP finishes. The caller's deferred close(holdConnOpen) can therefore fire while attachExecHTTP is still running (including its own deferred cleanups), which races with the goroutine inside attachExecHTTP that reads deferredErr after <-holdConnOpen unblocks. Fix this by introducing an errCh channel: deferred functions write deferredErr into it, and the goroutine reads from it after <-holdConnOpen unblocks, ensuring the read happens-after all writes to the named return value. Fixes: containers#28277 Signed-off-by: crawfordxx <crawfordxx@users.noreply.github.com>
c40dc62 to
497185c
Compare
Description
ExecContainerHTTPreturnsattachChanto its caller beforeattachExecHTTPhas finished. The caller holds a deferredclose(holdConnOpen)that fires when the calling function returns — which can happen whileattachExecHTTPis still running through its own deferred cleanup functions.This creates a data race: the goroutine inside
attachExecHTTPreads the named returndeferredErrafter<-holdConnOpenunblocks, butattachExecHTTPmay still be assigningdeferredErr(directly or via deferred functions) at that moment. Concurrent access to an interface value without synchronization is unsound per the Go memory model and can produce torn reads (non-nil type with nil data pointer), leading to the nil-pointer panic observed in CI:Fix
Add a
donechannel that is closed by the first-registered (last-to-run) deferred function inattachExecHTTP, i.e. after every other cleanup defer has completed. The goroutine now waits for both<-holdConnOpenand<-donebefore readingdeferredErr. This establishes a proper happens-before relationship between the final write todeferredErrand the goroutine's read.Fixes #28277
Testing
The race is intermittent and only reproducible under CI load. The fix is provably correct by Go memory model analysis:
close(done)happens-after the last write todeferredErr, and<-donehappens-before the goroutine's read.