Fix pid check #24636
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
Is it possible to have a test for this?
I tried applying this on my 2.2.5 install, no improvement on my side. Adding a little print logging of I noticed that when dag code does not have
I just added a set of tests for this to look at when the
@kagesenshi, that sounds like a different issue and I haven't seen that one. This specifically addresses when the Task Instance PID is unset.
Makes sense to me
Seems plausible, and a very interesting race condition.
@ashb WDYT?
Awesome work, congrats on your first merged pull request!
(cherry picked from commit 26c9768)
related: #17507
related: #20992
This fixes issues in which running a task with impersonation leads to the error "Recorded pid {PID1} does not match the current pid {PID2}", which then triggers a SIGTERM call to all processes in the group. The issue appears to be that in some cases the recorded pid (which is also the taskinstance pid) is None, which leads the ensuing psutil.Process(ti.pid).ppid() call to return the parent of the current running process instead of the parent of the taskinstance, and that parent is the long-running Worker, which is not the desired process to identify.

I was able to reproduce this error regularly and separately traced all processes to try to identify what they were. The "current pid" is the task runner, and in most cases the relevant call was so short-lived that I couldn't even capture it in my tracing, which would explain why it was no longer registered to the task instance.
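
To make the failure mode concrete, here is a minimal sketch of a pid check that treats an unset recorded pid as "nothing to compare" instead of passing None to a parent lookup. It is not the code from this PR: the function name `pid_mismatch`, its parameters, and the injected `parent_of` callable are hypothetical and exist only for illustration. The sketch relies on the documented psutil behaviour that `psutil.Process(None)` refers to the current process, which is why an unguarded lookup can end up returning the Worker's pid.

```python
from typing import Callable, Optional


def pid_mismatch(
    recorded_pid: Optional[int],
    current_pid: int,
    impersonated: bool,
    parent_of: Callable[[int], int],
) -> bool:
    """Return True only when a recorded pid exists and does not match.

    Hypothetical helper, not Airflow's API. `parent_of` stands in for a
    parent-pid lookup such as `lambda pid: psutil.Process(pid).ppid()`.
    """
    # An unset pid means there is nothing to compare against. Looking up
    # the parent of None would silently inspect the current process.
    if recorded_pid is None:
        return False
    # Assumption: with impersonation there is one extra process level,
    # so the pid to compare is the parent of the recorded pid.
    if impersonated:
        recorded_pid = parent_of(recorded_pid)
    return recorded_pid != current_pid


# Usage with a fake process table (child pid -> parent pid).
fake_parents = {200: 100, 300: 999}
print(pid_mismatch(None, 100, True, fake_parents.__getitem__))  # unset pid
print(pid_mismatch(200, 100, True, fake_parents.__getitem__))   # parent matches
print(pid_mismatch(300, 100, True, fake_parents.__getitem__))   # parent differs
```

Injecting the parent lookup keeps the sketch runnable without psutil installed and makes the None case easy to exercise in a test.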