Apache Airflow sometimes lets tasks become zombie tasks. A zombie task is a task that Airflow's scheduler believes is running but that, when Airflow checks its status, turns out to have terminated. Airflow may correct the state of a zombie task to complete or failed. However, Airflow subsequently fails to execute the downstream or queued tasks.
Diagnosis
The Airflow Admin UI can be used to determine if a DAG run has been affected by zombie tasks that prevent queued/downstream tasks from executing.
The hallmark of a DAG run that is affected by zombie tasks is that:
- The DAG run is still running
- No tasks in the DAG are running
- There are queued/downstream tasks waiting to execute
You can visually inspect the Airflow Admin UI to diagnose the problem.
In the Airflow Admin UI, inspect the state of the "ingest" DAG run:
Here, Airflow believes that the DAG run is still running (1). However, there are no running tasks (2), and one task is queued (shown with a grey border). This DAG run has been halted by zombie tasks: once Airflow corrected the state of the zombie tasks, it did not go on to execute the queued tasks.
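The three hallmark conditions listed above can also be written as a small check. The following is a minimal sketch in plain Python, not a call into Airflow itself: the `TaskSnapshot` type, the `is_stalled_by_zombies` function, and the set of "waiting" states are hypothetical names chosen for illustration, and the state strings are assumed to match the labels shown in the Airflow UI. You would need to populate the snapshots yourself from whatever view of the DAG run you have.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: state labels match those displayed in the Airflow UI.
RUNNING = "running"
# Assumption: a task with one of these states (or no state yet) counts as
# "queued/downstream and waiting to execute".
WAITING_STATES = {"queued", "scheduled", None}


@dataclass
class TaskSnapshot:
    """Hypothetical record of one task instance's state in a DAG run."""
    task_id: str
    state: Optional[str]  # None means the task has not been given a state yet


def is_stalled_by_zombies(dag_run_state: str, tasks: Iterable[TaskSnapshot]) -> bool:
    """Return True when a DAG run matches all three hallmark conditions."""
    tasks = list(tasks)
    # 1. The DAG run is still running.
    if dag_run_state != RUNNING:
        return False
    # 2. No tasks in the DAG are running.
    if any(t.state == RUNNING for t in tasks):
        return False
    # 3. There are queued/downstream tasks waiting to execute.
    return any(t.state in WAITING_STATES for t in tasks)


if __name__ == "__main__":
    snapshot = [
        TaskSnapshot("extract", "failed"),  # former zombie, state corrected
        TaskSnapshot("load", "queued"),     # never picked up afterwards
    ]
    print(is_stalled_by_zombies("running", snapshot))  # prints True
```

A healthy DAG run can satisfy the same conditions for a moment between two tasks, so a result of `True` is only meaningful if the state persists.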
Root cause
Zombie tasks and their effects on a DAG run are known defects in Apache Airflow. The Airflow project has not yet released a fix for this bug.
Solution
Until a fix is released, the only way to resolve this problem is to work around the Airflow defect by clearing the zombie tasks. Please follow our guide to clearing tasks in Airflow.