-
-
Notifications
You must be signed in to change notification settings - Fork 643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harden pantsd memory usage restarts #11618
Comments
Does Pants put the child processes into their own process group? If not, might be good to do so because |
(or a process group per graph run or some other reasonable grouping) |
Maybe the exiting instance should unbind the endpoint and gracefully exit in the background while the new instance of |
It looks like this will also (unsurprisingly) cause a run to exit before the |
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
As described in #11618, when `pantsd` intentionally exits due to low memory, a few types of work can be cut short: 1. if the run ends in Ctrl+C, processes that were cancelled may not have had time to be dropped before `pantsd exits. 2. async StreamingWorkunitHandler threads might still be running. This change adds orderly-shutdown mechanisms to the `Scheduler`/`Core` to join all ongoing `Sessions` (including the SWH), and improves tests to ensure that the SWH is waited for. Additionally, in the last commit, added purging of the `pantsd` metadata as soon as we decide to restart, which should reduce (but probably not eliminate) the incidence of item 1. from #11618. Work for #11831 will likely further harden this path. [ci skip-build-wheels]
Fixed by #11929. |
As described in pantsbuild#11618, when `pantsd` intentionally exits due to low memory, a few types of work can be cut short: 1. if the run ends in Ctrl+C, processes that were cancelled may not have had time to be dropped before `pantsd exits. 2. async StreamingWorkunitHandler threads might still be running. This change adds orderly-shutdown mechanisms to the `Scheduler`/`Core` to join all ongoing `Sessions` (including the SWH), and improves tests to ensure that the SWH is waited for. Additionally, in the last commit, added purging of the `pantsd` metadata as soon as we decide to restart, which should reduce (but probably not eliminate) the incidence of item 1. from pantsbuild#11618. Work for pantsbuild#11831 will likely further harden this path. [ci skip-build-wheels]
…11934) As described in #11618, when `pantsd` intentionally exits due to low memory, a few types of work can be cut short: 1. if the run ends in Ctrl+C, processes that were cancelled may not have had time to be dropped before `pantsd` exits. 2. async StreamingWorkunitHandler threads might still be running. This change adds orderly-shutdown mechanisms to the `Scheduler`/`Core` to join all ongoing `Sessions` (including the SWH), and improves tests to ensure that the SWH is waited for. Additionally, in the last commit, added purging of the `pantsd` metadata as soon as we decide to restart, which should reduce (but probably not eliminate) the incidence of item 1. from #11618. Work for #11831 will likely further harden this path. [ci skip-build-wheels]
When
pantsd
intentionally exits due to low memory, a few different issues have been observed.exceeded timeout of 60 seconds while waiting for pantsd to start
" error.pantsd
has already decided to exit and a run is canceled due toCtrl+C
,pantsd
shutdown will not necessarily wait long enough for cancellation to have propagated through theGraph
: the call tosys.exit
inpantsd
preventsDrop
from necessarily running for theGraph
, leading to orphaned processes (which would otherwise be killed by theirDrop
handlers).pantsd
exits, it does not wait forStreamingWorkunitHandler
threads to complete, which means that callbacks which allow async completion can be cut off by the restart.The text was updated successfully, but these errors were encountered: