Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: under edge conditions, select fails to read from time.After() channel #14561

Closed
bobziuchkovski opened this issue Feb 29, 2016 · 6 comments
Milestone

Comments

@bobziuchkovski
Copy link

What version of Go are you using (go version)?

Tested on go1.6 darwin/amd64 and go1.6 linux/amd64

What operating system and processor architecture are you using (go env)?

Mac:

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/bobbyz/projects/community/go:/Users/bobbyz/projects/bztech/go:/Users/bobbyz/projects/bobbyz/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GO15VENDOREXPERIMENT="1"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fno-common"
CXX="clang++"
CGO_ENABLED="1"

Linux:

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bobbyz/gopath/"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"

What did you do?

// This is an exact, unaltered copy-paste of the code I'm running.  I trimmed my live code
// down to just this function body in order to test/reproduce
func (l *logger) close(timeout time.Duration) error {
    fmt.Println("before select")
    select {
    case <-time.After(time.Second):
        fmt.Println("timeout occurred")
        return errors.New("timeout")
    }
}

What did you expect to see?

I should see "before select" printed. Then after 1 second, I should see "timeout occurred" printed.

What did you see instead?

I see "before select" printed. Nothing else prints and the program never exits. This is definitely the only function in my code that prints "before select", so the close method is called, but the channel read from the time.After channel never happens. I've noticed the program spins my CPU at 100% while I'm waiting for the timeout that never occurs.

Other

I can only reproduce this when I add a panic() to one of my functions and recover from the panic. The "close" method shown above runs on program termination, after the panic recovery. If I modify the panicking code to return nil in place of the panic, I can't reproduce. That portion of code is outside of this close method. I also can't reproduce if I remove the select and read directly from the channel.

I ripped apart my code trying to find the cause of the hanging select. I uploaded a functioning but mangled/stripped copy of the code to https://github.com/bobziuchkovski/golang-select-repro. The code itself hardly makes sense anymore, but it runs and reproduces the hanging select about 1 in 8 times on my laptop.

The select in question is in logger.go on line 110. The panic is triggered in main.go on line 45. If I comment-out that panic, I can't reproduce the hanging select. The close method with the select in question is triggered indirectly via line 40 of main.go

@ianlancetaylor ianlancetaylor changed the title Under edge conditions, select fails to read from time.After() channel runtime: under edge conditions, select fails to read from time.After() channel Feb 29, 2016
@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Feb 29, 2016
@bobziuchkovski
Copy link
Author

I have stripped as much as I can out of the codebase and posted it to a new repo: https://github.com/bobziuchkovski/golang-select-repro

The code itself is now basically gibberish, but it does reproduce the issue. On my laptop, I get the hanging select about 1 out of every 8 tries now.

I'm not sure if it's related or not, but I can also trigger an internal compiler error, ./main.go:33: internal compiler error: large ONEW with EscNone: new(), if I change lines 33 and 34 of main.go from

hdr := NewBuffer()
hdr.WriteString("blah")

to

NewBuffer()

@bobziuchkovski
Copy link
Author

I updated the issue description with a bit more info on where the panic is triggered, where the hanging select is triggered, etc. In short:

The select in question is in logger.go on line 110. The panic is triggered in main.go on line 45. If I comment-out that panic, I can't reproduce the hanging select. The close method with the select in question is triggered indirectly via line 40 of main.go.

@cespare
Copy link
Contributor

cespare commented Feb 29, 2016

Hey @bobziuchkovski, would it be possible for you to try to (separately from this issue) minimize the repro for the internal compiler error and open that as a separate issue?

@bobziuchkovski
Copy link
Author

@cespare Sure thing. I trimmed it down to an error that is reproducible on the Go Playground. I opened the separate compiler issue at #14574

@mdempsky
Copy link
Contributor

mdempsky commented May 19, 2016

I think this is #10958 (or one of the many similar "infinite loops are bad" known issues). Namely, your (*fieldList).NumFields method has an infinite loop if p != nil && p.parent != nil.

FYI, here's how I diagnosed this:

  1. Run your test program until it hung for more than a second.
  2. Use ps -L -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm to identify the thread that's currently running.
  3. Use kill -QUIT $threadid to send SIGQUIT to that thread. (Sending it to the process generally ends up being received by an idle thread, which precludes getting a backtrace for the running thread.)
  4. Repeat a few times and notice that main.(*fieldList).NumFields is always at the top of the backtrace.

After changing current = p.parent to current = current.parent, the program ran successfully 100 times in a row.

@bobziuchkovski
Copy link
Author

@mdempsky Thank you so much for the time you put into debugging that! I was just reviewing the live project code and I fixed the infinite loop some time ago, but didn't realize that was the cause of the select race. I'm sorry you ended up wasting time on a duplicate bug, but I really appreciate both the time you spent as well as the diagnostic steps you shared above. That is an incredibly useful diagnostic procedure. Thanks again!

@golang golang locked and limited conversation to collaborators May 20, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants