Replies: 2 comments
You likely have a problem with the structure of your Dags (many parallel tasks) combined with scheduling after task execution, which takes a long time. You can disable the "schedule after task execution" configuration: it was intended as an optimization, but it suffers from this kind of issue in certain cases. This feature has been removed in Airflow 3, and we also highly recommend you move to Airflow 3, as Airflow 2.11 is in limited maintenance and will stop receiving patches (even security patches) in April. BTW, we are in the process of testing the release candidate of 2.11.1 -> see #62056. We need people like you to test the release candidate and confirm that it works for them. This release contains a few security patches and is likely the last one we will release for 2.11, so you have one of the last chances to report any serious or security-related issues that we can fix in this release.
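For reference, a minimal sketch of how that option could be disabled, assuming a standard Airflow 2.x deployment where configuration is controlled either via airflow.cfg or the AIRFLOW__SECTION__KEY environment-variable convention (paths and service names here are illustrative):

```shell
# Sketch: disable the "schedule after task execution" optimization in Airflow 2.x.
# Option 1: in airflow.cfg, under the [scheduler] section, set:
#   schedule_after_task_execution = False
# Option 2: export the equivalent environment variable in the environment
# of the scheduler and worker processes before (re)starting them:
export AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION=False
```

Whichever route is used, the scheduler (and workers, if the variable approach is chosen) must be restarted for the change to take effect.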
Further inputs:
[2026-02-18, 11:03:19 IST] {taskinstance.py:2631} INFO - Dependencies all met for dep_context=non-requeueable deps
[2026-02-18, 11:03:19 IST] {taskinstance.py:2631} INFO - Dependencies all met for dep_context=requeueable deps
[2026-02-18, 11:03:19 IST] {taskinstance.py:2884} INFO - Starting attempt 1 of 2
[2026-02-18, 11:03:19 IST] {taskinstance.py:2907} INFO - Executing <Task(SSHOperator): ingestion_adhoc_test>
[2026-02-18, 11:03:19 IST] {standard_task_runner.py:72} INFO - Started process 1660900 to run task
[2026-02-18, 11:03:19 IST] {standard_task_runner.py:104} INFO - Running:
[2026-02-18, 11:03:19 IST] {standard_task_runner.py:105} INFO - Job 123383: Subtask ingestion_adhoc_test
[2026-02-18, 11:03:33 IST] {task_command.py:467} INFO - Running
[2026-02-18, 11:03:47 IST] {taskinstance.py:3157} INFO - Exporting env vars:
We have Airflow v2.11.0 set up on on-prem Linux (RHEL).
VM-1: Webserver, Scheduler, RabbitMQ, PostgreSQL
VM-2,3: Celery workers (16 vCPUs each)
Note: Previously we were using v2.2.5 and set up a new, separate cluster with the above configuration.
On the new setup (v2.11.0) we have been seeing CPU starvation on the worker nodes.
The newly introduced task supervisor holds a CPU thread for a long time, while the task runner itself finishes processing quickly.
For example:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1165672 ixxxxxmi+ 20 0 1.0g 0.1g 0.0g R 99.7 0.1 0:43.93 airflow task supervisor: ['airflow', 'tasks', 'run', 'xxx_ingestion_ppp', 'staging_to_refined_xxx_PRELOGIN_xxxx_sa+
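A diagnostic like the top output above can also be captured non-interactively, which makes it easier to attach to a report. A sketch using standard procps tools (the grep pattern is an assumption; adjust it to match the supervisor command line on your workers):

```shell
# List the highest-CPU processes on a worker node, then filter for
# Airflow task supervisor processes. PATTERN is an assumption: change it
# to whatever the supervisor's command line looks like on your hosts.
PATTERN="airflow task supervisor"
ps -eo pid,pcpu,etime,args --sort=-pcpu | grep -F "$PATTERN" | head -20
```

Running this once a minute (e.g. via watch or a cron job) while ~20 tasks are queued would show whether the supervisor processes really pin a core for the whole task duration.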
With this, when load is moderately high (around 20 tasks running together), further tasks start losing their heartbeat for long durations; this can be seen in the pre-task logs section of the task log.
[2026-02-11, 15:41:30 IST] {local_task_job_runner.py:123} ▼ Pre task execution logs
[2026-02-11, 15:42:16 IST] {taskinstance.py:2631} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: xxx_ingestion.staging_to_refined__prod_t_user scheduled__2026-02-09T22:00:00+00:00 [queued]>
[2026-02-11, 15:42:16 IST] {taskinstance.py:2631} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: _ingestionet.staging_to_refined__prod_t_user scheduled__2026-02-09T22:00:00+00:00 [queued]>
[2026-02-11, 15:42:16 IST] {taskinstance.py:2884} INFO - Starting attempt 2 of 2
[2026-02-11, 15:42:16 IST] {taskinstance.py:2907} INFO - Executing <Task(SSHOperator): staging_to_refined_prod_t_user> on 2026-02-09 22:00:00+00:00
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:72} INFO - Started process 45475 to run task
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:104} INFO - Running: ['airflow', 'tasks', 'run', 'xxx_ingestion_xxx', 'staging_to_refinebr_prod_t_user', 'scheduled__2026-02-09T22:00:00+00:00', '--job-id', '117940', '--raw', '--subdir', 'DAGS_FOLDER/_ingestion.py', '--cfg-path', '/tmp/tmpllomvg1s']
[2026-02-11, 15:42:16 IST] {standard_task_runner.py:105} INFO - Job 117940: Subtask staging_to_refinedt_prod_t_user
[2026-02-11, 15:46:54 IST] {task_command.py:467} INFO - Running <TaskInstance: _ingestion.staging_to_refined_prod_t_user scheduled__2026-02-09T22:00:00+00:00 [running]> on host appprr09.idfcbank.com
[2026-02-11, 15:51:11 IST] {job.py:229} INFO - Heartbeat recovered after 784.55 seconds
[2026-02-11, 15:53:45 IST] {job.py:229} INFO - Heartbeat recovered after 153.75 seconds
[2026-02-11, 15:58:20 IST] {taskinstance.py:3157} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='ingestion' AIRFLOW_CTX_TASK_ID='staging_to_refined__prod_t_user' AIRFLOW_CTX_EXECUTION_DATE='2026-02-09T22:00:00+00:00' AIRFLOW_CTX_TRY_NUMBER='2' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2026-02-09T22:00:00+00:00'
[2026-02-11, 15:58:20 IST] {taskinstance.py:740} ▲▲▲ Log group end
Requesting assistance and guidance on the same.
Airflow v2.2.5 used to handle more than 50 tasks with the same configuration.