fix(celery): bump worker concurrency default to 16 #1228

Open
mihow wants to merge 1 commit into main from fix/celery-worker-concurrency

@mihow (Collaborator) commented Apr 14, 2026

Summary

  • Add explicit CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=16) to config/settings/base.py, next to the existing CELERY_WORKER_PREFETCH_MULTIPLIER / CELERY_WORKER_ENABLE_PREFETCH_COUNT_REDUCTION block.
  • Overridable per deployment via the CELERY_WORKER_CONCURRENCY env var.
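For readers less familiar with django-environ, here is a stdlib-only sketch of what the new settings line does. It is an illustration, not the code in this PR: the helper name `read_worker_concurrency` is hypothetical, and it assumes `env.int` simply casts the variable with `int()` and falls back to the default when the variable is unset.

```python
import os
from collections.abc import Mapping
from typing import Optional


def read_worker_concurrency(
    environ: Optional[Mapping[str, str]] = None, default: int = 16
) -> int:
    """Hypothetical stand-in for
    env.int("CELERY_WORKER_CONCURRENCY", default=16) in config/settings/base.py.
    """
    if environ is None:
        environ = os.environ  # read the real process environment by default
    raw = environ.get("CELERY_WORKER_CONCURRENCY")
    if raw is None:
        return default  # unset: fall back to the new default of 16
    return int(raw)  # set: per-deployment override, e.g. "8" on a small staging box


# Module-level setting, mirroring the name used in the Django settings module.
CELERY_WORKER_CONCURRENCY = read_worker_concurrency()
```

If the project's Celery app loads Django settings with `namespace="CELERY"` (the usual `app.config_from_object("django.conf:settings", namespace="CELERY")` pattern, assumed here rather than shown in this PR), this value reaches Celery as `worker_concurrency`. An explicit `-c`/`--concurrency` flag on the worker command line takes precedence over the setting, so a deployment that already passes one will not pick up the new default.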

Why

When the setting is unset, Celery's default worker concurrency is os.cpu_count(). On the current production Celery worker host (8 cores) this means an 8-process prefork pool. The dominant tasks on the antenna queue, process_nats_pipeline_result and create_detection_images, are DB/Redis-bound rather than CPU-bound: each task spends most of its time waiting on Postgres/pgbouncer and Redis round-trips, not crunching numbers.
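The I/O-bound argument can be made concrete with a back-of-the-envelope capacity model based on Little's law (in-flight tasks = throughput × latency). This is a sketch, not a measurement: the 0.5 s task latency and 0.05 s of CPU per task are made-up placeholders, and the model ignores DB/pgbouncer contention, which is what eventually caps the benefit of a larger pool.

```python
def max_throughput(pool_size: int, mean_task_seconds: float) -> float:
    """Upper bound on completed tasks per second for a prefork pool.

    Each prefork process runs one task at a time, so at most `pool_size`
    tasks are in flight. By Little's law, throughput is then at most
    pool_size / mean_task_seconds, no matter how much of that time is
    spent waiting on Postgres or Redis rather than using the CPU.
    """
    return pool_size / mean_task_seconds


def cpu_busy_fraction(
    pool_size: int, cpu_seconds_per_task: float, mean_task_seconds: float, cores: int
) -> float:
    """Fraction of the host's cores kept busy when every process is saturated."""
    cpu_demand = max_throughput(pool_size, mean_task_seconds) * cpu_seconds_per_task
    return min(1.0, cpu_demand / cores)


# Illustrative placeholder numbers only: a 0.5 s task with 0.05 s of CPU work.
for pool in (8, 16):
    print(pool, max_throughput(pool, 0.5), round(cpu_busy_fraction(pool, 0.05, 0.5, 8), 2))
```

With these placeholders, going from 8 to 16 processes doubles the throughput ceiling from 16 to 32 tasks/s while the 8 cores go from 10% to 20% busy, which is the "pool sized to CPU count leaves the host idle" effect described above.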

Direct observation during a high-throughput async_api job:

Raising the prefork pool size directly addresses the bottleneck. 16 is a conservative first step (2× cpu_count, roughly matching the observed headroom on the DB/pgbouncer side). A hotfix override of 16 was applied in production via the env var ahead of this PR and confirmed to drain the backlog on the active jobs.

Why 16 specifically

It is the smallest power-of-2 step above the current 8 that roughly matches the empirical gap between ingress and drain in the production incident that motivated this PR, without risking pgbouncer saturation. Deployments with different DB/pgbouncer capacity can override it via the env var. A larger default can be considered once we have measured Postgres connection-pool headroom (see "What we still need to verify" below).

What this does not change

  • Prefetch multiplier stays at 1 — that was already set and fairness behaviour is unchanged.
  • Routing / queue topology is unchanged. Splitting the antenna queue into a dedicated "ingest fast path" vs "housekeeping / status-check" queue is a larger follow-up, filed separately.
  • Pool class stays prefork. Switching to gevent for this queue may give much higher effective concurrency on an IO-bound workload, but every task on this queue would need to be audited for gevent-safety (blocking C extensions, thread-locals in PyTorch paths, etc.) first. Out of scope here.
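A quick check on the prefetch point, using Celery's documented rule that the number of messages a worker reserves from the broker is worker_prefetch_multiplier × concurrency. The helper below is a hypothetical illustration, not a Celery API.

```python
def prefetch_count(concurrency: int, prefetch_multiplier: int) -> int:
    """Messages a worker may reserve at once, per Celery's documented rule:
    prefetch count = worker_prefetch_multiplier * worker_concurrency."""
    return concurrency * prefetch_multiplier


# CELERY_WORKER_PREFETCH_MULTIPLIER = 1 is unchanged by this PR.
before = prefetch_count(concurrency=8, prefetch_multiplier=1)  # cpu_count() on the 8-core host
after = prefetch_count(concurrency=16, prefetch_multiplier=1)  # new default
```

The total reservation doubles from 8 to 16, but it is still one message per process, so the ratio that governs fairness stays the same. For comparison, Celery's default multiplier of 4 would let a 16-process worker hold 64 messages.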

What we still need to verify

  • Postgres / pgbouncer connection pool usage after deploy: 16 prefork processes, each holding its own persistent connection, should be well within pgbouncer's default_pool_size, but this is worth confirming under load.
  • Whether the default of 16 is also right for the smaller staging/demo deployments, or whether those need a lower override.
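  • Whether this change exposes any new memory-pressure pattern at peak load. The current --max-tasks-per-child=100 / --max-memory-per-child=2 GiB already bound each process, but twice as many processes means twice the worst-case pool footprint, and the memory limit is only enforced after the task that exceeds it finishes.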
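The connection and memory items can be bounded with simple arithmetic before deploy. A sketch under stated assumptions: each prefork process holds at most one Django connection per database alias at a time (the `worker_hosts` and `db_aliases` parameters are hypothetical knobs), and the 2 GiB limit is expressed in KiB because that is the unit Celery's --max-memory-per-child expects. The connection figure should be compared against the configured default_pool_size (pgbouncer's built-in default is 20 per user/database pair; the deployed value may differ), together with any web processes sharing the same pool.

```python
KIB_PER_GIB = 1024 * 1024  # Celery's --max-memory-per-child is given in KiB


def worst_case_rss_gib(concurrency: int, max_memory_per_child_kib: int) -> float:
    """Approximate pool memory if every child sits just under its limit.

    This is a soft ceiling: a child is only replaced after the task that
    pushed it over the limit finishes, so real peaks can be higher.
    """
    return concurrency * max_memory_per_child_kib / KIB_PER_GIB


def worker_db_connections(concurrency: int, worker_hosts: int = 1, db_aliases: int = 1) -> int:
    """Peak client connections the worker pools open towards pgbouncer,
    assuming one connection per process per database alias."""
    return concurrency * worker_hosts * db_aliases


limit_kib = 2 * KIB_PER_GIB  # the 2 GiB --max-memory-per-child from the worker command
print(worst_case_rss_gib(8, limit_kib), worst_case_rss_gib(16, limit_kib))
print(worker_db_connections(16))
```

On the single 8-core host this puts the worst-case pool footprint at 32 GiB instead of 16 GiB, and peak worker connections at 16 per database alias.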

Related

Summary by CodeRabbit

Release Notes

  • New Features
    • Added configurable worker concurrency setting to control parallel background task processing (default: 16 workers).

The default celery worker concurrency (os.cpu_count()) underutilises the
worker pool for process_nats_pipeline_result and create_detection_images,
which are DB/Redis-bound rather than CPU-bound. With a prefork pool sized
to CPU count, the processes spend most of their time blocked on I/O while
the antenna queue backs up during high-throughput NATS async_api jobs.

Override via CELERY_WORKER_CONCURRENCY env var per deployment; 16 is the
new default.
Copilot AI review requested due to automatic review settings April 14, 2026 17:36

netlify bot commented Apr 14, 2026

Deploy Preview for antenna-ssec canceled.

🔨 Latest commit: 44dd942
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-ssec/deploys/69de7b20eb95120008e0cd81


netlify bot commented Apr 14, 2026

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: 44dd942
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/69de7b209892d40008b68bbf

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

A new Celery worker concurrency setting was added to the base configuration, enabling control over the prefork pool size via an environment variable with a default value of 16.

Changes

  • Celery Configuration (config/settings/base.py): Added CELERY_WORKER_CONCURRENCY environment variable setting with default value of 16 to control Celery prefork pool size.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • carlosgjs

Poem

🐰 A new setting hops into place,
Concurrency tuned with grace,
Sixteen workers, or more if you choose,
Prefork pools that never lose! 🌟

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title clearly and specifically describes the main change: adding a new Celery worker concurrency setting with a default value of 16.
  • Description check (✅ Passed): The PR description covers all required sections: summary, list of changes, related issues, detailed description with motivation, and deployment notes. However, 'How to Test the Changes' and 'Screenshots' sections are missing, and the Checklist is incomplete.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Copilot AI left a comment

Pull request overview

This PR adjusts the default Celery worker prefork pool size by introducing an explicit CELERY_WORKER_CONCURRENCY setting in the Django base settings, while keeping it overridable per deployment via an environment variable.

Changes:

  • Add CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=16) to config/settings/base.py.
  • Document rationale and override behavior inline next to existing worker prefetch settings.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
config/settings/base.py (1)

Line 401: Consider documenting CELERY_WORKER_CONCURRENCY in env templates/runbooks.

Optional, but adding it to .env.example and the deployment docs will make per-environment tuning easier (especially for smaller staging/demo stacks).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/settings/base.py` at line 401, Add documentation for the
CELERY_WORKER_CONCURRENCY environment variable (used where
CELERY_WORKER_CONCURRENCY = env.int("CELERY_WORKER_CONCURRENCY", default=16)) to
the project's environment templates and deployment/runbook, e.g., update
.env.example and relevant runbooks to include the variable name, its purpose
(controls Celery worker concurrency), allowed values, and the default of 16,
plus a note recommending smaller values for staging/demo and guidance for tuning
per-environment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e68d66df-4cb2-4c7f-af0f-480c5272b1a4

📥 Commits

Reviewing files that changed from the base of the PR and between 1c6be7a and 44dd942.

📒 Files selected for processing (1)
  • config/settings/base.py
