Skip to content

libpod: fix data race on deferredErr in attachExecHTTP#28365

Merged
Honny1 merged 1 commit intocontainers:mainfrom
crawfordxx:fix-race-condition-in-hijackWriteErrorAndClose
Mar 31, 2026
Merged

libpod: fix data race on deferredErr in attachExecHTTP#28365
Honny1 merged 1 commit intocontainers:mainfrom
crawfordxx:fix-race-condition-in-hijackWriteErrorAndClose

Conversation

@crawfordxx
Copy link
Copy Markdown
Contributor

Description

ExecContainerHTTP returns attachChan to its caller before attachExecHTTP has finished. The caller holds a deferred close(holdConnOpen) that fires when the calling function returns — which can happen while attachExecHTTP is still running through its own deferred cleanup functions.

This creates a data race: the goroutine inside attachExecHTTP reads the named return deferredErr after <-holdConnOpen unblocks, but attachExecHTTP may still be assigning deferredErr (directly or via deferred functions) at that moment. Concurrent access to an interface value without synchronization is unsound per the Go memory model and can produce torn reads (non-nil type with nil data pointer), leading to the nil-pointer panic observed in CI:

panic: runtime error: invalid memory address or nil pointer dereference
net.(*OpError).Unwrap(0x2c31560?)
errors.is({0x1f6eee0?, 0x0?}, ...)   ← interface type set, data nil
github.com/containers/podman/v6/libpod.hijackWriteError(...)
github.com/containers/podman/v6/libpod.attachExecHTTP.func3()

Fix

Add a done channel that is closed by the first-registered (last-to-run) deferred function in attachExecHTTP, i.e. after every other cleanup defer has completed. The goroutine now waits for both <-holdConnOpen and <-done before reading deferredErr. This establishes a proper happens-before relationship between the final write to deferredErr and the goroutine's read.

Fixes #28277

Testing

The race is intermittent and only reproducible under CI load. The fix is provably correct by Go memory model analysis: close(done) happens-after the last write to deferredErr, and <-done happens-before the goroutine's read.

@Honny1
Copy link
Copy Markdown
Member

Honny1 commented Mar 26, 2026

Thanks for the contribution! The fix looks correct to me. But I have a question: Did you consider using a sync.Mutex to protect deferredErr instead of a channel? Could you explain why you went with the channel approach?

Adding No New Test Label.

@Honny1 Honny1 added the No New Tests Allow PR to proceed without adding regression tests label Mar 26, 2026
@crawfordxx
Copy link
Copy Markdown
Contributor Author

The channel approach fits the problem more naturally here. is written by multiple deferred functions in (directly, and through ) — protecting those with a would mean adding lock/unlock around every assignment site scattered across the deferred stack, which is fragile.

More importantly, what we actually need is not mutual exclusion but completion signaling: the goroutine should not read until all cleanup code in has finished running. A channel closed by the first-registered (last-to-run) defer precisely expresses that: close(done) happens-after every other deferred write, and <-done happens-before the read. This directly maps to the Go memory model's happens-before guarantee we need.

A sync.WaitGroup would also work, but it is more verbose for a single-event signal. A sync.Mutex would work too, but it would require touching every write site and doesn't convey the intent as clearly.

Copy link
Copy Markdown
Member

@Honny1 Honny1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

// cleanup). The goroutine below waits on done before reading deferredErr
// to avoid a data race: holdConnOpen may be closed by the caller while
// this function's deferred functions are still running.
done := make(chan struct{})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use a chan to pass the error instead? So it is clearer what we are waiting for

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — updated in the latest commit to use errCh := make(chan error, 1) with defer func() { errCh <- deferredErr }(). The goroutine now reads <-errCh directly instead of the two-step <-done + reading the named return variable, which makes the intent clearer.

Copy link
Copy Markdown
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Honny1
Copy link
Copy Markdown
Member

Honny1 commented Mar 28, 2026

LGTM! Once you fix the DCO and squash commits, we can merge.

ExecContainerHTTP returns attachChan to its caller before attachExecHTTP
finishes. The caller's deferred close(holdConnOpen) can therefore fire
while attachExecHTTP is still running (including its own deferred
cleanups), which races with the goroutine inside attachExecHTTP that
reads deferredErr after <-holdConnOpen unblocks.

Fix this by introducing an errCh channel: deferred functions write
deferredErr into it, and the goroutine reads from it after
<-holdConnOpen unblocks, ensuring the read happens-after all writes to
the named return value.

Fixes: containers#28277
Signed-off-by: crawfordxx <crawfordxx@users.noreply.github.com>
@crawfordxx crawfordxx force-pushed the fix-race-condition-in-hijackWriteErrorAndClose branch from c40dc62 to 497185c Compare March 29, 2026 04:04
@Honny1 Honny1 merged commit b2f7205 into containers:main Mar 31, 2026
81 of 82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

No New Tests Allow PR to proceed without adding regression tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hijackWriteErrorAndClose use in podman service is unsound

3 participants