[13.x] Fix queue worker entering infinite loop on persistent pop exceptions #59579
Closed
Avnsh1111 wants to merge 1 commit into laravel:13.x from
…ptions When getNextJob() catches a non-database Throwable (e.g. SQS timeout, Redis auth failure), stopWorkerIfLostConnection() never matches because it only checks database error strings. The worker silently loops forever: catch → sleep(1) → retry → catch → forever. Workers appear healthy to process supervisors but process zero messages. Adds a consecutive pop failure counter that triggers shouldQuit after 100 failures, allowing the worker to exit gracefully and be restarted by Supervisor/Docker. The counter resets on any successful pop, so transient errors do not accumulate. Fixes laravel#59517
The Problem
When Worker::getNextJob() catches a Throwable that is not a database connection error (e.g. SQS SDK timeout, Redis auth failure, Beanstalkd connection drop), the worker enters an infinite silent loop: catch → sleep(1) → retry → catch, forever.

stopWorkerIfLostConnection() uses the DetectsLostConnections trait, which only matches database error strings (MySQL, PostgreSQL, SQLite). Queue infrastructure failures from SQS, Redis, or HTTP-based drivers never trigger a stop condition. The --timeout flag cannot help either, because pcntl_alarm is only registered after getNextJob() returns.

The result: workers appear healthy to process supervisors (Docker, Supervisor) but silently process zero messages, potentially for days (as reported in #59517).
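The failure mode described above can be modelled without the framework. The following is a minimal sketch, not the Worker code itself: looksLikeLostDbConnection() and simulatePopLoop() are hypothetical names, the three message fragments are only a small illustrative subset of database error strings, and the loop is capped so the demonstration terminates.

```php
<?php

declare(strict_types=1);

// Hypothetical, trimmed-down stand-in for a database-only lost-connection check.
// The real trait matches a much longer list of database error messages.
function looksLikeLostDbConnection(Throwable $e): bool
{
    $needles = ['server has gone away', 'Lost connection', 'no connection to the server'];

    foreach ($needles as $needle) {
        if (stripos($e->getMessage(), $needle) !== false) {
            return true;
        }
    }

    return false;
}

// Simulates the pop/catch/retry cycle for at most $maxIterations rounds
// (the real loop has no such cap). Returns the iteration at which a stop
// condition fired, or $maxIterations if none ever did.
function simulatePopLoop(callable $pop, int $maxIterations): int
{
    for ($i = 1; $i <= $maxIterations; $i++) {
        try {
            $pop();
        } catch (Throwable $e) {
            if (looksLikeLostDbConnection($e)) {
                return $i; // the worker would stop here
            }
            // Otherwise the real worker sleeps for 1 second and tries again.
        }
    }

    return $maxIterations;
}

// Assumed example messages; real driver errors will differ.
$sqsTimeout = function (): void {
    throw new RuntimeException('cURL error 28: Operation timed out');
};
$mysqlGone = function (): void {
    throw new RuntimeException('MySQL server has gone away');
};

$sqsIterations = simulatePopLoop($sqsTimeout, 1000);
$dbIterations = simulatePopLoop($mysqlGone, 1000);

echo "Queue-backend failure: ran {$sqsIterations} iterations without stopping\n";
echo "Database failure: stopped at iteration {$dbIterations}\n";
```

In this model the database-style message trips the stop condition on the first iteration, while the queue-backend message never does and simply runs until the artificial cap.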
The Fix
Adds a lightweight consecutive pop failure counter ($popFailures) directly in getNextJob():

- incremented each time the pop throws
- reset on any successful pop
- sets $this->shouldQuit = true after 100 consecutive failures

This uses the existing shouldQuit → stopIfNecessary() → graceful exit path with no new CLI options, no new enum cases, and no new public API surface. The worker exits cleanly with status 0, allowing Supervisor/Docker to restart it.

100 consecutive failures (with a 1s sleep between each) means the worker tolerates roughly 100 seconds (a little over 1.5 minutes) of failed pops before giving up: enough to ride out brief network blips while still catching persistent infrastructure failures.
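The counter logic can be sketched in isolation. This is a framework-free model under stated assumptions, not the PR's actual diff: MiniWorker and the MAX_POP_FAILURES constant are hypothetical names (the description only gives the value 100), and the 1-second sleep and exception reporting are left out so the example runs instantly.

```php
<?php

declare(strict_types=1);

// Hypothetical, framework-free model of the counter; not the PR's actual diff.
final class MiniWorker
{
    // Assumed name for the threshold; the PR text only states the value 100.
    public const MAX_POP_FAILURES = 100;

    public bool $shouldQuit = false;
    public int $popFailures = 0;

    /** Attempts one pop; returns the job, or null if the pop threw. */
    public function getNextJob(callable $pop): mixed
    {
        try {
            $job = $pop();
            $this->popFailures = 0; // any successful pop resets the streak

            return $job;
        } catch (Throwable $e) {
            if (++$this->popFailures >= self::MAX_POP_FAILURES) {
                $this->shouldQuit = true; // picked up later by the graceful exit path
            }
            // The real worker would report the exception and sleep 1 second here.

            return null;
        }
    }
}

$failing = function (): void {
    throw new RuntimeException('Redis auth failure');
};

// 99 consecutive failures are tolerated; the 100th flips the flag.
$worker = new MiniWorker();
for ($i = 0; $i < 99; $i++) {
    $worker->getNextJob($failing);
}
$quitAfter99 = $worker->shouldQuit;

$worker->getNextJob($failing);
$quitAfter100 = $worker->shouldQuit;

// A single success in the middle of a failure streak resets the counter.
$other = new MiniWorker();
for ($i = 0; $i < 50; $i++) {
    $other->getNextJob($failing);
}
$other->getNextJob(fn (): string => 'job');
$countAfterSuccess = $other->popFailures;
```

Resetting on success is what keeps transient errors from accumulating: only an unbroken run of failures reaches the threshold.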
Benefit to end users
Queue workers will no longer silently become zombies when the queue backend (SQS, Redis, etc.) has a persistent connection issue. They will exit gracefully after sustained failures, allowing process managers to restart them with a fresh connection.
Why this doesn't break existing features
- The change is confined to getNextJob(); normal job processing is completely unaffected
- The existing lostConnection exit path (which fires first) is left as is
- No changes to WorkerOptions, CLI arguments, or public API

Tests
Two new tests in QueueWorkerTest:

- testWorkerQuitsAfterConsecutivePopFailures: verifies the worker exits after sustained pop exceptions, using BrokenQueueConnection
- testWorkerPopFailureCounterResetsOnSuccess: verifies that intermittent failures followed by success don't trigger a quit, using a new IntermittentBrokenQueueConnection fake

All 27 existing queue worker tests continue to pass.
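To illustrate the distinction the two tests draw, here is a rough sketch of an intermittent fake and a streak check. IntermittentFakeQueue and wouldQuit() are hypothetical names invented for this example; the PR's real IntermittentBrokenQueueConnection and its PHPUnit tests may be structured quite differently.

```php
<?php

declare(strict_types=1);

// Hypothetical fake: fails N times, succeeds once, then repeats the cycle.
final class IntermittentFakeQueue
{
    private int $calls = 0;

    public function __construct(private int $failuresBeforeSuccess)
    {
    }

    public function pop(): string
    {
        $this->calls++;

        if ($this->calls % ($this->failuresBeforeSuccess + 1) !== 0) {
            throw new RuntimeException('Simulated pop failure');
        }

        return 'job-' . $this->calls;
    }
}

// Runs $attempts pops and reports whether the consecutive-failure
// streak ever reached $threshold (the point at which a worker would quit).
function wouldQuit(IntermittentFakeQueue $queue, int $attempts, int $threshold): bool
{
    $streak = 0;

    for ($i = 0; $i < $attempts; $i++) {
        try {
            $queue->pop();
            $streak = 0; // success resets the streak
        } catch (Throwable $e) {
            if (++$streak >= $threshold) {
                return true;
            }
        }
    }

    return false;
}

// 99 failures between successes never reaches a threshold of 100 ...
$quitsWhenIntermittent = wouldQuit(new IntermittentFakeQueue(99), 1000, 100);

// ... but 100 failures in a row does.
$quitsWhenSustained = wouldQuit(new IntermittentFakeQueue(100), 1000, 100);
```

The first case mirrors the reset-on-success test and the second mirrors the quit-after-sustained-failures test, with the boundary placed exactly at the threshold.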
Fixes #59517