Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: Start sometimes creates processes with signals blocked with go1.5 on OSX #13164

Closed
gabrielrussell opened this issue Nov 5, 2015 · 25 comments

Comments

@gabrielrussell
Copy link

I can recreate this by running a test of our application in a loop. Eventually a process is started that has many of its signals blocked. This ends up causing a error because we're running a process that expects to be able to receive signals.

My attempts to create a smaller program that reproduces this has only failed. I would assume that some bit of the complexity of the application is triggering the bug.

I'll be happy to provide any information that I can.

Gabriel Russell

@rakyll rakyll changed the title with go1.5 on OSX, exec.Start() sometimes creates processes with signals blocked. os/exec: Start sometimes creates processes with signals blocked with go1.5 on OSX Nov 5, 2015
@ianlancetaylor
Copy link
Contributor

What does the loop that is running the program look like? Is this a loop written in Go or in shell or something else?

How are you testing whether signals are blocked?

@ianlancetaylor ianlancetaylor added this to the Go1.6 milestone Nov 5, 2015
@gabrielrussell
Copy link
Author

It's a shell script that runs a "go test ..." in a loop. That said, we don't only see this in testing, we also see it on our production systems.

Its a c++ application doing daemonizing double fork where the grandparent doesn't exit until the grandchild sends a signal to the grandparent indicating that it's ready to do work.

We've tested that we can fix the problem by adding code to our c++ application to unblock signals before doing anything else, but this is not a whole solution for us because the c++ application is already widely distributed.

@ianlancetaylor
Copy link
Contributor

I'm sorry, I don't grasp the details here, and the details matter. I see now that your original issue title refers to exec.Start, implying that you are using the os/exec package. You say that you have a shell script running "go test" in a loop.

Is that Go program using os/exec?

How exactly are you starting the program whose signals are blocked?

What does the C++ daemonizing application have to do with this? Is that the test, or is that the production system?

Which signal(s) exactly are you using?

Do you have a small reproduction case that you could share?

Thanks.

@gabrielrussell
Copy link
Author

We have an application written in go who's main responsibility is to run
another program that's written in c++.

The c++ program needs to to be executed without signals blocked.

Sometimes the go program executes the c++ program with it's signals blocked.

We have tests for our go program, and some of them execute the c++ program.

If we execute one of those tests over and over, via a bash while loop, we
eventually see the c++ program get executed with signals blocked.

We need to send and receive SIGUSR2

Our attempts to create a small reproduction case have failed.

Thanks.

On Thu, Nov 5, 2015 at 5:29 PM, Ian Lance Taylor notifications@github.com
wrote:

I'm sorry, I don't grasp the details here, and the details matter. I see
now that your original issue title refers to exec.Start, implying that you
are using the os/exec package. You say that you have a shell script running
"go test" in a loop.

Is that Go program using os/exec?

How exactly are you starting the program whose signals are blocked?

What does the C++ daemonizing application have to do with this? Is that
the test, or is that the production system?

Which signal(s) exactly are you using?

Do you have a small reproduction case that you could share?

Thanks.


Reply to this email directly or view it on GitHub
#13164 (comment).

@ianlancetaylor
Copy link
Contributor

Thanks for the details. Does your Go program, the one that executes the C++ program, use cgo?

@ianlancetaylor
Copy link
Contributor

Also, you said this was Darwin, but what is your GOARCH value?

@gabrielrussell
Copy link
Author

Yes, there is some cgo.

GOARCH="amd64"

@ianlancetaylor
Copy link
Contributor

I assume your cgo code never calls sigaction or sigprocmask.

Sorry, I realized that may have misunderstood something. When you say that the signals are blocked, what precisely do you mean? Do you mean that the signal is in the blocked signal mask as set by sigprocmask or pthread_sigmask? Or do you mean that the signal is ignored because sigaction was used to set the signal handler to SIG_IGN?

@gabrielrussell
Copy link
Author

I assume that they're blocked, ala sigprocmask as opposed SIG_IGN'd.

The piece of code that when inserted into the c++ code masks the problem is:

sigset_t unblockSignalMask;
sigemptyset(&unblockSignalMask);
sigprocmask(SIG_SETMASK, &unblockSignalMask, NULL);

And, I don't think that unblocking a signal should unignore it, but I may be wrong.

Also, we disabled the cgo function calls and re-ran our tests without any change in outcome.

@ianlancetaylor
Copy link
Contributor

I agree that they must be blocked if calling sigprocmask fixes it.

I asked about cgo because a program that uses cgo uses a different code path when creating a new thread. When creating a new thread it is necessary to temporarily block all signals, so that the new thread does not receive a signal before it has registered an alternate signal stack. However, the code uses pthread_sigmask, which only affects the signal mask for the calling thread. My assumption is that execve uses the signal mask of the thread that calls execve, or the process-wide signal mask. Anyhow, that is probably not the issue.

Does your program use the os/signal package at all?

@sielaq
Copy link

sielaq commented Nov 7, 2015

I'm having similar problem: Linux/amd64, exec.Run() ( or Start() )
I have created 2 app to recreate this:
Go:
https://gist.github.com/sielaq/5120a4f06cefc2138ed9
C:
https://gist.github.com/sielaq/9867be2d1b65ffec658c

Go app starts C app (that kills others C app currently running).
When app compiled with go 1.4 -> all C app receive SIGTERM from each other
When app compiled with go 1.5 -> sometimes C app cannot trap SIGTERM

to make it easier to reproduce it:
bash -c "while true; do ./app; done"
try to break it ^C multiple times -> as an effect you should see that C apps start to cumulate in process list.

@ianlancetaylor
Copy link
Contributor

Thanks for the sample code. I suspect this is an unexpected consequence of https://golang.org/cl/10173.

@gabrielrussell
Copy link
Author

An interesting thing about that code is that the bash loop is key. It seams that most instances of app have no problems spawning processes and can run seemingly forever without creating a blocked-signal process, but some of the instances of app spawn off many blocked-signal subprocesses.

You have to run app many times before you get a bad one.

@ianlancetaylor ianlancetaylor self-assigned this Nov 11, 2015
@sethvargo
Copy link
Contributor

Hey @ianlancetaylor

Thank you for working on this 😄. Would it be best to compile applications with Go 1.4 until this is fixed, or is there something I can do code-wise to mitigate this problem? (hashicorp/consul-template#442)

@ianlancetaylor
Copy link
Contributor

I'm not sure I fully understand the problem, and I don't have any suggested mitigation within your Go code.

Of course you could fix it by changing your C program to call sigprogmask, or by interposing a tiny C program that calls sigprocmask and then invokes the real C program.

@sethvargo
Copy link
Contributor

Hi @ianlancetaylor

Sorry for the lack of context 😄. I have a program that allows the user to specify an arbitrary command to be run when data from an external system changes. In short, it does this:

// Create an invoke the command
cmd := exec.Command(shell, flag, command)
cmd.Stdout = r.outStream
cmd.Stderr = r.errStream
cmd.Env = cmdEnv
if err := cmd.Start(); err != nil {
  return err
}

done := make(chan error, 1)
go func() {
  done <- cmd.Wait()
}()

// ...

https://github.com/hashicorp/consul-template/blob/8a8a9301631244754792e5bf979d28ec6e3a9900/runner.go#L600-L612

In this case, the user is calling haproxy. This worked fine in Go 1.4, but is broken in Go 1.5. I'm just wondering if there's something I can change in my code to work around this behavior instead of telling users to refactor the arbitrary scripts they run.

@ianlancetaylor
Copy link
Contributor

@sethvargo I'm sorry, I understand the question, but I can only repeat my last response: I don't know, except that you can mitigate by using a tiny C helper.

@ianlancetaylor
Copy link
Contributor

OK, I think I've got an idea as to what is happening.

The call sequence is

schedule: called by locked m
stoplockedm
handoffp
startm
newm: allocates m, calls msigsave to save signal mask into m, then calls
newosproc: starts new thread running
mstart
mstart1
minit: sets signal mask based on mask saved by newm

This call sequence gives us a new m that has a signal mask based on the one in the locked M that called schedule. Normally a locked m has an empty signal mask just like every other M, but there is an exception: the goroutine started by ensureSigM. That goroutine, which is locked to a thread, has a signal mask that blocks everything except the signals passed to signal.Notify. So when that goroutine manages to enter schedule at a time when no other m is available, we can get an m with a non-empty signal mask.

The os/exec package doesn't do anything about this, so the signal mask of the m that calls os/exec.Start will provide the signal mask of the process.

I was able to occasionally recreate the problem with this trivial C program in ./foo5:

#include <stdio.h>
#include <signal.h>

int main() {
    sigset_t mask;

    sigemptyset(&mask);
    if (sigprocmask(SIG_SETMASK, NULL, &mask) < 0) {
        perror("sigprocmask");
        return 1;
    }
    if (sigismember(&mask, SIGINT)) {
        fprintf(stderr, "SIGINT is blocked\n");
        return 1;
    }
    return 0;
}

running this Go program that calls the C program:

package main

import (
    "fmt"
    "os"
    "os/exec"
    "os/signal"
    "syscall"
    "sync"
)

func main() {
    c := make(chan os.Signal, 1)
    signal.Notify(c, syscall.SIGTERM)
    go func() {
        for range c {
            fmt.Println("SIGTERM")
        }
    }()

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                syscall.Kill(os.Getpid(), syscall.SIGTERM)
                go func(iter int) {
                    cmd := exec.Command("./foo5")
                    cmd.Stdout = os.Stdout
                    cmd.Stderr = os.Stderr
                    if err := cmd.Run(); err != nil {
                        fmt.Printf("iteration %d: %v\n", iter, err)
                        os.Exit(1)
                    }
                }(j)
            }
        }()
    }
    wg.Wait()
}

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/18064 mentions this issue.

@sielaq
Copy link

sielaq commented Dec 19, 2015

@ianlancetaylor well spotted! 🏆

@ianlancetaylor ianlancetaylor modified the milestones: Go1.5.3, Go1.6 Dec 21, 2015
@viciious
Copy link

@ianlancetaylor This issue isn't resolved in 1.5.3 despite the milestone saying so and it seems like it affects Linux as well. 70c9a81 only hit the master branch.

@ianlancetaylor ianlancetaylor added this to the Go1.5.4 milestone Jan 14, 2016
@ianlancetaylor ianlancetaylor removed this from the Go1.5.3 milestone Jan 14, 2016
@ianlancetaylor
Copy link
Contributor

Thanks. I marked it 1.5.3 to mean that it should go into a 1.5.3 release if there was one, but then 1.5.3 became a security release with only security fixes. I've changed the milestone to 1.5.4.

@viciious
Copy link

@ianlancetaylor Thanks!

@clementino
Copy link

I have a similar issue with linux/go1.5.3:

package main

import (
    "os"
    "os/exec"
    "os/signal"
    "syscall"
)

func main() {
     signal.Notify(make(chan os.Signal, 1), syscall.SIGUSR1)
     signal.Reset()
     cmd := exec.Command("bash")
     cmd.Stdin = os.Stdin
     cmd.Stdout = os.Stdout
     cmd.Stderr = os.Stderr
     cmd.Run()
}

Sometimes the bash process is normal (only ignores signal 17, 18, 23 and 28), but some other times it is set to ignore all this signals: 1-3, 6, 12-15, 17, 18, 20-26 and 28-64.
If I just import _ "os/signal" without calling a Notify it doesn't happen, but any Notify seems to make it happen, even if I Reset afterward.
The frequency varies between my two machines, about half of the times on one and much less frequently on the other.

@ianlancetaylor
Copy link
Contributor

@clementino This issue is closed. The problem should be fixed in the 1.6 release. If not, please open a new issue. If you want to discuss this problem or ask questions, please see https://golang.org/wiki/Questions . Thanks.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants