os/exec: Start sometimes creates processes with signals blocked with go1.5 on OSX #13164
Comments
What does the loop that is running the program look like? Is this a loop written in Go or in shell or something else? How are you testing whether signals are blocked?
It's a shell script that runs "go test ..." in a loop. That said, we don't only see this in testing; we also see it on our production systems. It's a C++ application that daemonizes with a double fork, where the grandparent doesn't exit until the grandchild sends it a signal indicating that it's ready to do work. We've verified that we can fix the problem by adding code to our C++ application to unblock signals before doing anything else, but this is not a complete solution for us because the C++ application is already widely distributed.
I'm sorry, I don't grasp the details here, and the details matter. I see now that your original issue title refers to exec.Start, implying that you are using the os/exec package. You say that you have a shell script running "go test" in a loop. Is that Go program using os/exec? How exactly are you starting the program whose signals are blocked? What does the C++ daemonizing application have to do with this? Is that the test, or is that the production system? Which signal(s) exactly are you using? Do you have a small reproduction case that you could share? Thanks.
We have an application written in Go whose main responsibility is to run … The C++ program needs to be executed without signals blocked. Sometimes the Go program executes the C++ program with its signals blocked. We have tests for our Go program, and some of them execute the C++ program. If we execute one of those tests over and over, via a bash while loop, we … We need to send and receive SIGUSR2. Our attempts to create a small reproduction case have failed. Thanks.
On Thu, Nov 5, 2015 at 5:29 PM, Ian Lance Taylor notifications@github.com
Thanks for the details. Does your Go program, the one that executes the C++ program, use cgo?
Also, you said this was Darwin, but what is your GOARCH value?
Yes, there is some cgo. GOARCH="amd64"
I assume your cgo code never calls sigaction or sigprocmask. Sorry, I realize that I may have misunderstood something. When you say that the signals are blocked, what precisely do you mean? Do you mean that the signal is in the blocked signal mask as set by sigprocmask or pthread_sigmask? Or do you mean that the signal is ignored because sigaction was used to set the signal handler to SIG_IGN?
I assume that they're blocked, à la sigprocmask, as opposed to SIG_IGN'd. The piece of code that, when inserted into the C++ code, masks the problem is:
And I don't think that unblocking a signal should unignore it, but I may be wrong. Also, we disabled the cgo function calls and re-ran our tests without any change in outcome.
I agree that they must be blocked if calling sigprocmask fixes it. I asked about cgo because a program that uses cgo uses a different code path when creating a new thread. When creating a new thread it is necessary to temporarily block all signals, so that the new thread does not receive a signal before it has registered an alternate signal stack. However, the code uses pthread_sigmask, which only affects the signal mask for the calling thread. My assumption is that execve uses the signal mask of the thread that calls execve, or the process-wide signal mask. Anyhow, that is probably not the issue. Does your program use the os/signal package at all?
I'm having a similar problem: Linux/amd64, exec.Run() (or Start()). A Go app starts a C app (which kills other C apps that are currently running). To make it easier to reproduce:
Thanks for the sample code. I suspect this is an unexpected consequence of https://golang.org/cl/10173.
An interesting thing about that code is that the bash loop is key. It seems that most instances of app have no problems spawning processes and can run seemingly forever without creating a blocked-signal process, but some instances of app spawn many blocked-signal subprocesses. You have to run app many times before you get a bad one.
Hey @ianlancetaylor Thank you for working on this 😄. Would it be best to compile applications with Go 1.4 until this is fixed, or is there something I can do code-wise to mitigate this problem? (hashicorp/consul-template#442)
I'm not sure I fully understand the problem, and I don't have any suggested mitigation within your Go code. Of course, you could fix it by changing your C program to call sigprocmask, or by interposing a tiny C program that calls sigprocmask and then invokes the real C program.
Sorry for the lack of context 😄. I have a program that allows the user to specify an arbitrary command to be run when data from an external system changes. In short, it does this:

```go
// Create and invoke the command
cmd := exec.Command(shell, flag, command)
cmd.Stdout = r.outStream
cmd.Stderr = r.errStream
cmd.Env = cmdEnv
if err := cmd.Start(); err != nil {
	return err
}
done := make(chan error, 1)
go func() {
	done <- cmd.Wait()
}()
// ...
```

In this case, the user is calling …
@sethvargo I'm sorry, I understand the question, but I can only repeat my last response: I don't know, except that you can mitigate by using a tiny C helper.
OK, I think I've got an idea as to what is happening. The call sequence is
This call sequence gives us a new M whose signal mask is based on the one in the locked M that called schedule. Normally a locked M has an empty signal mask, just like every other M, but there is an exception: the goroutine started by ensureSigM. That goroutine, which is locked to a thread, has a signal mask that blocks everything except the signals passed to signal.Notify. So when that goroutine enters schedule at a time when no other M is available, we can get an M with a non-empty signal mask. The os/exec package doesn't do anything about this, so the signal mask of the M that calls os/exec.Start becomes the signal mask of the new process.

I was able to occasionally recreate the problem with this trivial C program in ./foo5:
by running this Go program, which calls the C program:
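The Go program from that comment is not reproduced here. As a hedged, Linux-only diagnostic in the same spirit, the sketch below calls signal.Notify (the path that, per the analysis above, brings in the ensureSigM goroutine) and then starts many short-lived children, each of which reports its own inherited SigBlk mask by running `cat /proc/self/status`. The names `childSigBlk` and `countBlockedChildren`, the choice of SIGUSR2, and the use of `cat` are all assumptions. On a toolchain that includes the fix, expect a count of zero if the parent itself started with nothing blocked; on an affected 1.5.x build, as noted earlier in the thread, only some runs of the parent may show the problem.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"strings"
	"syscall"
)

// childSigBlk starts `cat /proc/self/status` and returns the hex SigBlk
// value the child reports for itself, i.e. the blocked mask it inherited.
func childSigBlk() (string, error) {
	out, err := exec.Command("cat", "/proc/self/status").Output()
	if err != nil {
		return "", err
	}
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "SigBlk:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "SigBlk:")), nil
		}
	}
	return "", errors.New("no SigBlk line in child's status")
}

// countBlockedChildren registers a signal.Notify channel, then starts n
// children and counts how many began life with a non-empty blocked mask.
func countBlockedChildren(n int) (int, error) {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR2)
	defer signal.Stop(c)

	bad := 0
	for i := 0; i < n; i++ {
		mask, err := childSigBlk()
		if err != nil {
			return bad, err
		}
		if strings.Trim(mask, "0") != "" { // any non-zero hex digit
			bad++
		}
	}
	return bad, nil
}

func main() {
	const n = 200
	bad, err := countBlockedChildren(n)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d of %d children started with signals blocked\n", bad, n)
}
```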
CL https://golang.org/cl/18064 mentions this issue. |
@ianlancetaylor well spotted! 🏆
@ianlancetaylor This issue isn't resolved in 1.5.3 despite the milestone saying so, and it seems to affect Linux as well. 70c9a81 only hit the master branch.
Thanks. I marked it 1.5.3 to mean that it should go into a 1.5.3 release if there was one, but then 1.5.3 became a security release with only security fixes. I've changed the milestone to 1.5.4.
@ianlancetaylor Thanks!
I have a similar issue with linux/go1.5.3:
Sometimes the bash process is normal (it only ignores signals 17, 18, 23, and 28), but at other times it is set to ignore all of these signals: 1-3, 6, 12-15, 17, 18, 20-26, and 28-64.
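To compare these two states against /proc/<pid>/status output on Linux, the signal lists can be converted into the 64-bit hex masks that the SigIgn line uses, with signal s setting bit s-1. `encodeSigList` is a hypothetical helper and assumes a plain comma-separated list of numbers and ranges such as "1-3, 6, 12-15".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeSigList converts a list such as "1-3, 6, 12-15" into the 64-bit
// mask format used by /proc/<pid>/status, where signal s sets bit s-1.
func encodeSigList(list string) (uint64, error) {
	var mask uint64
	for _, part := range strings.Split(list, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		lo, hi := part, part
		if i := strings.Index(part, "-"); i >= 0 {
			lo, hi = part[:i], part[i+1:]
		}
		from, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return 0, err
		}
		to, err := strconv.Atoi(strings.TrimSpace(hi))
		if err != nil {
			return 0, err
		}
		if from < 1 || to > 64 || from > to {
			return 0, fmt.Errorf("bad signal range %q", part)
		}
		for s := from; s <= to; s++ {
			mask |= uint64(1) << uint(s-1)
		}
	}
	return mask, nil
}

func main() {
	normal, err := encodeSigList("17, 18, 23, 28")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	bad, err := encodeSigList("1-3, 6, 12-15, 17, 18, 20-26, 28-64")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("normal SigIgn: %016x\n", normal)
	fmt.Printf("bad SigIgn:    %016x\n", bad)
}
```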
@clementino This issue is closed. The problem should be fixed in the 1.6 release. If not, please open a new issue. If you want to discuss this problem or ask questions, please see https://golang.org/wiki/Questions . Thanks.
I can recreate this by running a test of our application in a loop. Eventually a process is started that has many of its signals blocked. This ends up causing an error because we're running a process that expects to be able to receive signals.
My attempts to create a smaller program that reproduces this have all failed. I assume that some bit of the application's complexity is triggering the bug.
I'll be happy to provide any information that I can.
Gabriel Russell