osproc.terminate broken on posix (when using fork at least) #1558
Comments
I think the label should be High Priority. How should the issue be solved?
Impact of this bug in Aporia: When you click "Terminate running process", Aporia will be terminated, but not the running process.
"-p.id" is also used in suspend() and resume().
As suggested, repeating comment from #1590 here: Is it worth having a "stronger" function, Also, you might not always want to send a sigterm to the whole group. What if the process takes care of sending a signal to its own children? The python implementation of |
Please comment on "Possible solution number 2".
Should kill peek the exit code (after sending the signal) to ensure it also cleans up a zombie? Here is a possible waitForExit which allows you to reap a process after calling terminate (requires: from times import epochTime):

    proc waitForExit(p: Process, timeout: int = -1): int =
      #if waitPid(p.id, p.exitCode, 0) == int(p.id):
      # ``waitPid`` fails if the process is not running anymore. But then
      # ``running`` probably set ``p.exitCode`` for us. Since ``p.exitCode`` is
      # initialized with -3, wrong success exit codes are prevented.
      if p.exitCode != -3: return p.exitCode
      let start = epochTime()
      var ret = 0
      while ret == 0:
        ret = waitpid(p.id, p.exitCode, WNOHANG)
        let now = epochTime()
        if timeout >= 0 and (now - start) * 1000 >= timeout.float:
          break
        # sleep for 50 milliseconds
        if usleep(50*1000) != 0'i32:
          raiseOSError(osLastError())
      if ret == 0:
        # indicates timeout, windows code doesn't signal timeout in any way, so we
        # don't for now
        discard
      elif ret < 0:
        p.exitCode = -3
        raiseOSError(osLastError())
      result = int(p.exitCode) shr 8

There is a way to declare you aren't interested in the process's return code (see http://www.win.tue.nl/~aeb/linux/lk/lk-5.html#ss5.5 ), but apparently this has compatibility issues with BSD / sysvinit. Another solution would be to define your own SIGCHLD handler (see http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html ), but you would have to call peekExitCode for all processes, because if two exit in quick succession you may receive only one SIGCHLD.
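To illustrate the SIGCHLD approach, here is a minimal sketch using the raw `posix` module rather than osproc. The names `reapChildren`, `reaped` and `spins` are made up for this example. The handler loops over `waitpid` with `WNOHANG` because several exits can be coalesced into a single SIGCHLD.

```nim
import posix

var reaped = 0  # number of children collected by the handler (illustration only)

proc reapChildren(sig: cint) {.noconv.} =
  # Preserve errno, since waitpid may overwrite it inside the handler.
  let savedErrno = errno
  var status: cint
  # Loop: one SIGCHLD may stand for more than one exited child.
  while waitpid(Pid(-1), status, WNOHANG) > 0:
    inc reaped
  errno = savedErrno

signal(SIGCHLD, reapChildren)

# Start two children that exit immediately, in quick succession.
for _ in 1 .. 2:
  let pid = fork()
  doAssert pid >= 0
  if pid == 0:
    exitnow(0)

# Give the handler up to roughly two seconds to collect both children.
var spins = 0
while reaped < 2 and spins < 200:
  discard usleep(10_000)  # may return early when interrupted by the signal
  inc spins

echo "children reaped: ", reaped
```

Note that a handler like this reaps every child of the process, so any library code that later calls `waitpid` on one of those PIDs itself would find nothing left to wait for.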
How to reproduce this issue:
Result: Expected result:
Issue can be closed.
The following piece of code, from osproc.nim, results in signals being sent to invalid process GIDs when used in conjunction with osproc.startProcess:
Essentially, calling kill with a negative PID sends a signal to the process group with the PGID = |-p.id|. This can only be valid when p.id is the PID of a process group leader. This can never be true for a process started with osproc.startProcess as there is no call to setpgid, which could make the child a process group leader.
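The group semantics can be checked with a small sketch that bypasses osproc and uses `fork` from the `posix` module directly. This is an assumption-laden illustration, not the osproc code path: the child never calls `setpgid`, so it stays in the parent's process group, and a `kill` aimed at the negated child PID should find no such group.

```nim
import posix

let child = fork()
doAssert child >= 0
if child == 0:
  # Child: no setpgid call, so it remains in the parent's process group.
  discard pause()
  exitnow(0)

# Parent: the child's group is the parent's group, not a group of its own.
let sameGroupAsParent = getpgid(child) == getpgrp()

# A negative PID addresses a process group. No group has the child's PID
# as its ID, so this call is expected to fail.
let groupKill = kill(Pid(-child.int), SIGTERM)
let groupErr = errno

# Signalling the PID itself reaches the child.
let directKill = kill(child, SIGTERM)
var status: cint
let reapedPid = waitpid(child, status, 0)

echo "same group as parent: ", sameGroupAsParent
echo "kill(-pid) result: ", groupKill, ", kill(pid) result: ", directKill
```

If the child had called `setpgid(0, 0)` before pausing, it would be a group leader and the negative-PID form would reach it.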
I believe sending a signal to the whole group is also inconsistent with the Windows implementation. Furthermore, it is unreliable, as children can escape the process group with calls to setpgid. I think the only reliable way to achieve this would be to use Linux cgroups, Windows jobs, and equivalents, which prevent children from escaping the group.
Another issue is that, as a SIGKILL is sent almost immediately after sending a SIGTERM, you aren't giving the process the opportunity to shut down cleanly. I believe calls to osproc.terminate should be paired with calls to waitForExit to give it time to shut down. This also cleans up zombie processes, which don't receive signals.
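The pairing of terminate with waitForExit could look roughly like the sketch below. It assumes an osproc in which `terminate` sends only SIGTERM and `kill` sends SIGKILL, as documented for current Nim releases, which is not how the code under discussion behaved. `stopGracefully` and `graceMs` are hypothetical names for this example, and the `sleep` command is only a stand-in for a long-running child.

```nim
import osproc

proc stopGracefully(p: Process, graceMs = 2000): int =
  ## Hypothetical helper: ask politely with SIGTERM, wait for the grace
  ## period, and fall back to SIGKILL only if the process is still alive.
  terminate(p)
  result = waitForExit(p, graceMs)   # also reaps the child if it exited
  if running(p):
    kill(p)
    result = waitForExit(p)

let p = startProcess("sleep", args = ["30"], options = {poUsePath})
let code = stopGracefully(p)
let stillRunning = running(p)
close(p)

echo "exit code: ", code, ", still running: ", stillRunning
```

The grace period is a design choice: long enough for the child to flush and clean up, short enough that a hung child does not block the caller indefinitely.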
waitForExit should also respect the timeout parameter on posix by polling with the WNOHANG option.