Avoid double sending of SIGINT to subprocesses #889
Standard bash behavior when pressing Ctrl+C is to send SIGINT to the foreground process group. This combines well with standard fork behavior: subprocesses are started in the same group, so a multi-process program is terminated correctly without any explicit signal handling. However, it does not work well when signals are handled explicitly, as they are here, because on Ctrl+C the subprocesses receive two SIGINTs: one directly from the shell and one from the interrupt handler in the `CLI` class. This in turn can prevent graceful shutdown in the subprocesses. The specific example that prompted this fix was interrupting feature specs that use Selenium. After a double interrupt, the browser process sometimes remains running after rspec has terminated.
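For readers unfamiliar with process groups, here is a minimal sketch of the general idea (Unix only). It is an illustration, not necessarily the exact change made in this PR: the helper name `child_pgid` is hypothetical, and the only assumption is that a child spawned with `pgroup: true` leaves the parent's process group and so is no longer in the group that the shell signals on Ctrl+C.

```ruby
require "rbconfig"

# Hypothetical helper: spawns a tiny Ruby child that prints its own
# process group id, and returns that id as an Integer.
def child_pgid(**spawn_opts)
  reader, writer = IO.pipe
  pid = Process.spawn(RbConfig.ruby, "-e", "print Process.getpgrp",
                      out: writer, **spawn_opts)
  writer.close            # keep only the child's copy of the write end open
  output = reader.read    # returns once the child has exited
  Process.wait(pid)
  reader.close
  output.to_i
end

parent_pgid = Process.getpgrp
same_group  = child_pgid                # default: child inherits the parent's group
own_group   = child_pgid(pgroup: true)  # child becomes leader of a new group

puts "parent=#{parent_pgid} default=#{same_group} pgroup=#{own_group}"
```

With the default options the child reports the parent's group id. With `pgroup: true` it reports a different one, so a signal sent to the parent's group does not reach it and the parent's own interrupt handler becomes the only source of SIGINT.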
thx, sounds like a good change overall ... a little scared that will trigger other edge-cases, but worth a try! :) |
4.1.0 |
Thanks for the quick response. We've been running with this change internally for about a month, so it's moderately well tested, but we definitely do not cover all possible shell and os combinations :) |
I think I've managed to find one of those edge cases, but I'm struggling to narrow it down. Our parallel tests have started hanging forever if they call I've got some code that spawns a subprocess and loops while waiting for it to finish - it boils down to something like:

    pid = spawn("some_command")
    loop do
      _child_pid, status = Process.waitpid2(pid, Process::WNOHANG)
      if status
        puts "OK we're done #{status.inspect}"
        break
      else
        puts "sleeeeepy"
        sleep 0.1
        puts "OK time to do some work"
      end
    end

With parallel_tests 4.1.0, the specs that touch this code hang forever when they hit that However, just that code on its own doesn't seem enough to trigger it - only in the context of our much larger test suite. I'll carry on trying to narrow it down tomorrow, but I've already lost a couple of hours to this and starting to lose the will to live... 😭

I'm seeing this on ruby 3.1.2p20 with an M1 running macOS 13.1. Appreciate this probably sounds crazy, but posting here in the hope that one of you could see some obvious reason this might fail. |
yeah issues like this suck :( |
So I think the problem I was seeing is my fault, but it's an interesting edge case at least...

    RSpec.describe "WTF" do
      specify do
        puts "#{$$}/#{Process.getpgid($$)} #{$0} start spec"
        src = "./audio.mp3"
        dst = "./audio.aiff"
        args = %W[ffmpeg -i #{src} -y -f aiff #{dst}]
        pid = spawn(*args)
        puts "waitpid(#{pid})"
        Process.waitpid(pid)
        expect($?).to be_success
      end
    end

Turns out that ffmpeg allows you to hit 'q' while it's in the middle of transcoding and it'll abort. When my rspec process is the process group leader, that results in the rspec process receiving a TTOU signal, and then it blocks forever - either in waitpid like above, or in the Being able to interactively interrupt ffmpeg isn't really desirable behaviour for my usecase, so for me this is fixed by adding |
@jdelStrother according to the man page the default behaviour for |
@grosser if you prefer I should be able to put together a minimal example that shows that the test process incorrectly receives SIGINT twice when manually interrupted from the command line. Then you can treat it as a bug report and maybe find a different way to fix it. |
a minimal test that fails without the change would be nice so we have something that breaks if it ever gets reverted :) this TTOU signal seems super sus, but at least the fix is easy, so let's keep this change for now and see if we see more bug reports 🤞 |
Yep. I'm still trying to wrap my head around it. I think the answer is something along the lines of:
In 4.0.0, it's not considered a background process, so there's no TTOUs, so no halting. I'm a little confused about whether rspec is receiving the TTOU directly, or whether the ffmpeg process receives TTOU and that's propagated to the parent somehow. |
I'm having a little trouble writing a spec for this because the problem is actually triggered by the behaviour of Ctrl+C in bash/zsh. You should however be able to reproduce the double SIGINT problem with the following script:

    # dummy_spec.rb
    count = 0
    Signal.trap(:INT) do
      puts 'Received SIGINT'
      count += 1
      exit if count == 2
    end
    sleep 10

Then what happens when I run it in the shell and press Ctrl+C is the following:
The reason I can't easily convert this to a spec is that in the integration tests there is no shell process and the problem is actually triggered by the fact that Ctrl+C in the shell sends a SIGINT to each process of the foreground group. So to write a spec I'll need to wrap the test runner in an extra shell process and I've yet to find a sensible way to do that. |
possible fix #891 |
I was unable to reproduce any way of making it not send the kill to all children :/ |
... got it working now I think |
I can confirm that #891 solves the problem for me. I'll just point out that the documentation states that |
yeah good point, will add |
This addresses an issue when manually interrupting the test suite via Ctrl+C in the shell. In particular we've noticed that this can sometimes leave orphaned web browser processes from our selenium feature specs, which then cause various problems until manually terminated.
Tracing the problem, I found that it's caused by subprocesses receiving SIGINT twice, which prevents the graceful shutdown that would normally happen. The fix prevents the shell from sending a SIGINT directly to the subprocesses, leaving only one SIGINT from the interrupt handler in the `CLI` class.