MySQL Not Using Correct Index for Scheduler Critical Section Query #25627
Comments
Is this specific to MySQL?
Yes, it is. @michaelmicheal, can you update the title to make it clearer that we are talking about MySQL? A question for the Airflow maintainers: would you welcome a PR to add the index hint when using MySQL?
# Pseudo code
if mysql:
    query = query.with_hint(TI, 'USE INDEX (ti_state)', dialect_name='mysql')
I don’t think you need the
Yes, you are right!
Apache Airflow version
Other Airflow 2 version
What happened
Airflow Version: 2.2.5
MySQL Version: 8.0.18
In the Scheduler, we are seeing cases where MySQL chooses an inefficient plan for the critical section task queuing query. When a large number of task instances are scheduled, MySQL fails to use the ti_state index to filter the task_instance table, resulting in a full table scan (about 7.3 million rows).
Normally, when running the critical section query, the index on task_instance.state is used to filter scheduled task_instances.
When a large number of task_instances are in the scheduled state at the same time, the index on task_instance.state is not used to filter scheduled task_instances.
What you think should happen instead
To resolve this, I added a patch to the scheduler_job.py file, adding a MySQL index hint to use the ti_state index.
I think it makes sense to add this index hint upstream.
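For illustration, here is a minimal sketch of how such a dialect-scoped hint behaves in SQLAlchemy. It is not the actual patch: the table below is a hypothetical, cut-down stand-in for Airflow's task_instance table, and it uses SQLAlchemy Core rather than the ORM query in scheduler_job.py. It only compiles the statement to SQL text, so no database is needed.

```python
from sqlalchemy import Column, Index, MetaData, String, Table
from sqlalchemy.dialects import mysql, sqlite

# Hypothetical, simplified stand-in for Airflow's task_instance table
# (the real table has many more columns and indexes).
metadata = MetaData()
task_instance = Table(
    "task_instance",
    metadata,
    Column("task_id", String(250), primary_key=True),
    Column("state", String(20)),
    Index("ti_state", "state"),
)

# dialect_name="mysql" means the hint is only rendered when the statement
# is compiled for MySQL; other backends ignore it.
stmt = (
    task_instance.select()
    .where(task_instance.c.state == "scheduled")
    .with_hint(task_instance, "USE INDEX (ti_state)", dialect_name="mysql")
)

# Compile to SQL text for two dialects without connecting to a database.
mysql_sql = str(stmt.compile(dialect=mysql.dialect()))
sqlite_sql = str(stmt.compile(dialect=sqlite.dialect()))

print(mysql_sql)   # FROM clause includes the USE INDEX hint
print(sqlite_sql)  # same query, hint omitted
```

Because the hint is scoped by dialect_name, the same code path can serve PostgreSQL or SQLite deployments unchanged, which is why a separate backend check around the call may be unnecessary.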
How to reproduce
Schedule a large number of dag runs and tasks in a short period of time.
Operating System
Debian GNU/Linux 10 (buster)
Versions of Apache Airflow Providers
No response
Deployment
Other 3rd-party Helm chart
Deployment details
Airflow 2.2.5 on Kubernetes
MySQL Version: 8.0.18
Anything else
No response
Are you willing to submit PR?
Code of Conduct