-
-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gracefully shutdown child processes #393
Conversation
The u"" literal triggers an internal error with black 19.3b0. While strictly unrelated to this feature branch, we fix it here because the internal error only reproduces on this branch. ❯ nox -s lint nox > Running session lint nox > Creating virtual environment (virtualenv) using python3.8 in .nox/lint nox > python -m pip install flake8==3.7.8 black==19.3b0 isort==4.3.21 mypy==0.720 nox > mypy --disallow-untyped-defs --warn-unused-ignores --ignore-missing-imports nox nox > black --check nox tests noxfile.py setup.py error: cannot format …/nox/tests/test_command.py: INTERNAL ERROR: Black produced code that is not equivalent to the source. Please report a bug on https://github.com/ambv/black/issues. This diff might be helpful: /tmp/blk_d4e49p90.log All done! 💥 💔 💥 38 files would be left unchanged, 1 file would fail to reformat. nox > Command black --check nox tests noxfile.py setup.py failed with exit code 123 nox > Session lint failed. ❯ cat /tmp/blk_d4e49p90.log --- src +++ dst @@ -2092,18 +2092,18 @@ value= Dict( keys= Constant( kind= - 'u', # str + None, # NoneType value= 'SIGIL', # str ) # /Constant values= Constant( kind= - 'u', # str + None, # NoneType value= '123', # str ) # /Constant ) # /Dict ) # /keyword
The Windows CI fails due to an unexpected I have been able to reproduce this on a Windows system. In fact, both of the A possible explanation for this is an SO answer by @eryksun:
The test calls So this means that we send the ^C to ourselves as well, because the parent process is in the same process group as the child. At first, this sounds like we might be able to fix this by passing My next step was to paste the Python programs from the tests into a Noxfile, and trigger an actual ^C from the console. There was no spurious
This another loose end that I haven't had time yet to investigate. |
833d44f fixes the spurious interrupts in the test on Windows, leveraging the SO post quoted above. Unfortunately, even though all Nox sessions succeed in CI now, Nox mysteriously exits with 1. Haven't been able to reproduce this locally, on my test machine Nox exits with success. |
This looks good to me so far, let me know if you get stuck on anything or need some help deciding on something. |
72f7cd2 fixes the spurious interrupts in the Windows CI. While Nox itself exited with success, its parent process (the one running the CI step) exited with failure. This happened because as a console process, the parent also received the keyboard interrupt triggered by the test. The fix was to simply not send a real interrupt when we're on Windows CI. This means we have to skip the test where the child handles the interrupt ( So this finally ready for review. Sidenote:
This turned out to be not entirely true; what happened was that Nox never called Taking a step back, there is no reason Nox has to guarantee termination of a |
I'm not familiar with your specific needs, but here's some general advice. If you need to test the effect of sending You can spawn the test in a new console session, allocated with the process creation flag
ctypes wrapper functions: import signal
import ctypes
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
def enable_ctrl_c():
if not kernel32.SetConsoleCtrlHandler(None, False):
raise ctypes.WinError(ctypes.get_last_error())
def disable_ctrl_c():
if not kernel32.SetConsoleCtrlHandler(None, True):
raise ctypes.WinError(ctypes.get_last_error())
def send_ctrl_event(pgid=0, ctrl_event=signal.CTRL_C_EVENT):
if not kernel32.GenerateConsoleCtrlEvent(ctrl_event, pgid):
raise ctypes.WinError(ctypes.get_last_error()) |
Thank you @eryksun for taking the time to comment here. Your comment, as well as your posts on SO and bpo, were tremendously helpful! dd9f3c2 implements the solution proposed above: On Windows, the keyboard interrupt tests are run in a subprocess that is not attached to the current console window. This means that all tests now run on Windows as well, both locally and in CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM!
skip_on_windows_console = pytest.mark.skipif( | ||
platform.system() == "Windows" and "DETACHED_FROM_CONSOLE" not in os.environ, | ||
reason="Do not run this test attached to a Windows console.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wording here is possibly misleading. CREATE_NO_WINDOW
and CREATE_NEW_CONSOLE
run the process attached to a new console session. The creation flag DETACHED_PROCESS
, on the other hand, tells the system to spawn a process detached from any console session. A truly detached process is definitely not wanted here since the test needs to generate and receive a console control event.
FYI, in Windows 8.1+, a console session is a set of processes that are connected to an instance of conhost.exe by way of the ConDrv kernel device. The session window is optional and may be excluded because of CREATE_NO_WINDOW
or, in Windows 10, because the console host is running in headless conpty mode (pseudoconsole). For example, each tab in Windows Terminal uses a headless conpty session, with an open-source build of "conhost.exe" that's named "openconsole.exe".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing that out. d6b0e59 changes the naming, hopefully for the better 😉
@theacodes Thanks for the approval! I pushed two more commits: |
Cool, still looks good to me. Feel free to merge when you and @eryksun are happy. :) |
Oh, I noticed you don't have merge permissions. Feel free to ping me when you're ready to merge (and if you want merge rights, let me know!) |
This is good to go from my side.
Thanks for offering this, I'd be honored :) |
Done. I'll let you merge this :) 🎆 This PR is eligible for $50 from our open collective. Feel free to submit an expense for that if you'd like. |
Huzzah!
…On Thu, Mar 4, 2021 at 2:04 PM Claudio Jolowicz ***@***.***> wrote:
Merged #393 <#393> into main.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#393 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAB5I45ZD4VIUUMFWW4Y2MDTB7KUTANCNFSM4YOYUTVA>
.
|
Fixes #139
This PR changes Nox's behavior when receiving a keyboard interrupt (CTRL-C,
SIGINT
) while executing a command withsession.run
orsession.install
.The current behavior is to terminate the child immediately (
SIGTERM
on Posix,TerminateProcess
on Windows) and to reraise theKeyboardInterrupt
, ultimately exiting with failure.The new behavior is to shut down the child gracefully and only raise if the child did not exit with success. The shutdown sequence consists of these four steps:
SIGTERM
and check again, with a 200ms timeout. If it did, skip to the last step.SIGKILL
and wait without timeout.KeyboardInterrupt
. Otherwise, we consider the keyboard interrupt handled and return normally.A few points of note:
We should probably move
logger.error("Interrupted...")
fromcommand.run
topopen.popen
, so the user gets immediate feedback before the shutdown sequence takes course.We'd need to passWhat do you think?logger
topopen.popen
, though.The timeouts correspond to tox's defaults for its
interrupt_timeout
andterminate_timeout
settings. For commands that need longer to shut down, it may be a good idea to make this configurable in Nox as well. These could be keyword arguments forsession.run
. I don't think this needs to be addressed by this PR though, right? (Interestingly, tox4 no longer seems to have these configurable.)tox does the shutdown slightly differently. It first checks if the child has terminated (with a timeout of zero), and sends a duplicate
SIGINT
if it hasn't. I believe the reason is that the shutdown protocol is also used in cases where there was no initialSIGINT
. The rest of the shutdown sequence is the same. tox always reraises the interrupt, even if the child exited with success. (tox4 essentially behaves like tox3, but busy-waits and skips step 3 on Windows wherePopen.kill
is an alias forPopen.terminate
.)The use case with
pytest --pdb
is not covered here; Nox still insists on terminating the child. When I tried this, I did get to see pytest's traceback before Nox killed it, which is an improvement. One way to support this use case would be a command-line option to disable sending additional signals to the child. I don't think this needs to be addressed by this PR either. tox also does not support this currently (configuration option disabling sucicide on pytest subprocess tox-dev/tox#1791).When output is captured and the child handles the interrupt, we silently discard anything
Popen.communicate
buffered for us before it was interrupted. However, the only documented way to capture output fromsession.run
is thesilent
flag, and in that case it doesn't matter.