Skip to content

fix(pool): guard Pool.renew against out-of-range addrIndex after pool reset#154

Closed
SAY-5 wants to merge 2 commits intoqdm12:v2.0.0-betafrom
SAY-5:fix/pool-renew-addrindex-oob-gluetun-3292
Closed

fix(pool): guard Pool.renew against out-of-range addrIndex after pool reset#154
SAY-5 wants to merge 2 commits intoqdm12:v2.0.0-betafrom
SAY-5:fix/pool-renew-addrindex-oob-gluetun-3292

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented Apr 23, 2026

Downstream crash reported at qdm12/gluetun#3292.

Pool.renew dials outside the mutex and then reacquires the lock and does p.addrConns[conn.addrIndex] without checking that the pool hadn't been reset while the dial was in flight. When the upstream TLS DNS target (Cloudflare 1.0.0.1:853 / 1.1.1.1:853) starts resetting connections repeatedly, the pool can be emptied and recreated between when the caller grabs addrIndex and when renew retakes the lock; the deref then panics:

panic: runtime error: index out of range [0] with length 0
pool.(*Pool).renew internal/pool/renew.go:66
pool.(*Pool).Get internal/pool/get.go:68

This kills the DNS server goroutine and takes gluetun + every container sharing its network namespace down with it (crash loop).

Fix

Bounds-check conn.addrIndex against len(p.addrConns) while the lock is held. On failure, close the freshly-dialed netConn (so we don't leak a live socket) and return an error so the caller falls back to its retry path instead of crashing the process.

Test

go test ./internal/pool/... behaves identically to main on this machine — a pre-existing darwin-only "can't assign requested address" vs "connection refused" message mismatch fails without my change too; every other subtest continues to pass.

qdm12 and others added 2 commits April 21, 2026 17:39
… reset

Pool.renew dialed outside the mutex and then reacquired the lock and
did p.addrConns[conn.addrIndex] without checking that the pool hadn't
been reset while the dial was in flight. When the upstream TLS DNS
target (Cloudflare) starts resetting connections repeatedly, the pool
can be emptied and recreated between when the caller grabs the
addrIndex and when renew retakes the lock; the deref then panicked
the DNS server goroutine with 'index out of range [0] with length 0',
taking gluetun and every container sharing its network namespace
down with it.

Bounds-check conn.addrIndex against len(p.addrConns) while the lock
is held. On failure, close the freshly-dialed netConn (to avoid
leaking a live socket) and return an error so the caller falls back
to its retry path instead of crashing the process.

Refs qdm12/gluetun issue 3292.

Signed-off-by: SAY-5 <say.apm35@gmail.com>
@qdm12
Copy link
Copy Markdown
Owner

qdm12 commented Apr 23, 2026

Did this still happen using the gluetun latest image post qdm12/gluetun@792a5ff ? If so can you share the panic message?

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 1, 2026

Re-tested against qmcgaw/gluetun:latest (ships 792a5ff). The original panic is gone — the resolver path that previously dereferenced a nil DoH client now bails cleanly. Closing on my side; thanks.

@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 1, 2026

Closing per my last comment — the underlying panic is no longer reproducible against gluetun ; thanks @qdm12 for the pointer.

@SAY-5 SAY-5 closed this May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants