Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: Transport returning "http2: no cached connection was available" #16582

Closed
russ- opened this issue Aug 3, 2016 · 3 comments
Closed
Milestone

Comments

@russ-
Copy link

russ- commented Aug 3, 2016

  1. What version of Go are you using (go version)?
    go version go1.7rc3 darwin/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN=""
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH="/Users/rra/dev"
    GORACE=""
    GOROOT="/Users/rra/opt/homebrew/Cellar/go/1.7rc3/libexec"
    GOTOOLDIR="/Users/rra/opt/homebrew/Cellar/go/1.7rc3/libexec/pkg/tool/darwin_amd64"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/00/1m160000h01000cxqpysvccm006h4r/T/go-build956765444=/tmp/go-build -gno-record-gcc-switches -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"
  3. What did you do?
    I'm seeing "http2: no cached connection was available" failures from http.Client when I start many (~hundreds) requests at once.

Run https://play.golang.org/p/zKthX7Y9RW with GODEBUG=http2client=0 and GODEBUG=http2client=1 to see the behavior difference.

  1. What did you expect to see?
    With http2 enabled, I expect to see requests queue up and eventually succeed.
  2. What did you see instead?
    Some requests fail with "http2: no cached connection was available".

See also https://groups.google.com/d/msg/golang-nuts/aw0vboyE1pA/DkOdf_5tDgAJ

@bradfitz
Copy link
Contributor

bradfitz commented Aug 3, 2016

Even with 500 parallel requests (using your code exactly), I find it hard to reproduce this. I did see it, though.

I see what's happening, though.

After a connection negotiates ALPN protocol "h2", the upgrade function runs,

https://github.com/golang/net/blob/e2ba55e4e78399d85f2a0e0b92396b81ed410633/http2/configure_transport.go#L34

which returns a RoundTripper to use for that request, and also gets a new connection (c *tls.Conn) to use if it wants to:

    upgradeFn := func(authority string, c *tls.Conn) http.RoundTripper {
        addr := authorityAddr("https", authority)
        if used, err := connPool.addConnIfNeeded(addr, t2, c); err != nil {
            go c.Close()
            return erringRoundTripper{err}
        } else if !used {
            // Turns out we don't need this c.
            // For example, two goroutines made requests to the same host
            // at the same time, both kicking off TCP dials. (since protocol
            // was unknown)
            go c.Close()
        }
        return t2
    }

The error message you're seeing is coming from t2 at the bottom there, which is configured to return the error if it doesn't have a usable connection.

The problem is that in addConnIfNeeded,

https://github.com/golang/net/blob/e2ba55e4e78399d85f2a0e0b92396b81ed410633/http2/client_conn_pool.go#L127

It only makes it this far:

func (p *clientConnPool) addConnIfNeeded(key string, t *Transport, c *tls.Conn) (used bool, err error) {
    p.mu.Lock()
    for _, cc := range p.conns[key] {
        if cc.CanTakeNewRequest() {
            p.mu.Unlock()
            return false, nil
        }
    }

It sees that it has a connection and decides to use it, figuring that it'll be usable immediately.

But CanTakeNewRequest accounts for MAX_CONCURRENT_STREAMS:

https://github.com/golang/net/blob/e2ba55e4e78399d85f2a0e0b92396b81ed410633/http2/transport.go#L496

And google.com allows 100 concurrent streams:

$ h2i www.google.com
Connecting to www.google.com:443 ...
Connected to [2607:f8b0:4005:805::2004]:443
[FrameHeader SETTINGS len=24]
  [MAX_CONCURRENT_STREAMS = 100]
  [INITIAL_WINDOW_SIZE = 1048576]
  [MAX_FRAME_SIZE = 16384]
  [MAX_HEADER_LIST_SIZE = 16384]
[FrameHeader WINDOW_UPDATE len=4]
  Window-Increment = 983041

h2i> 

So if you have over 100 goroutines in that codepath at once, all deciding they'll "soon" have an available connection, and without doing any dead-reckoning that they'll soon be taking one of those available slots, then Transport.CanTakeNewRequest returns false later when it's time to actually send a request, the http2.Transport tries to get a new connection, but its connection pool is configured to return ErrNoCachedConn thinking that return value will only be used by the upgrade function to decide whether to use keep the new TLS connection or not.

This is basically #13957 and #13774

@bradfitz bradfitz changed the title x/net/http2: http.Client returning "http2: no cached connection was available" x/net/http2: Transport returning "http2: no cached connection was available" Aug 3, 2016
@bradfitz bradfitz self-assigned this Aug 3, 2016
@bradfitz bradfitz added this to the Go1.8 milestone Aug 3, 2016
gopherbot pushed a commit that referenced this issue Aug 8, 2016
…y limit

The Go HTTP/1 client will make as many new TCP connections as the user requests.

The HTTP/2 client tried to have that behavior, but the policy of
whether a connection is re-usable didn't take into account the extra 1
stream counting against SETTINGS_MAX_CONCURRENT_STREAMS so in practice
users were getting errors.

For example, if the server's advertised max concurrent streams is 100
and 200 concurrrent Go HTTP requests ask for a connection at once, all
200 will think they can reuse that TCP connection, but then 100 will
fail later when the number of concurrent streams exceeds 100.

Instead, recognize the "no cached connections" error value in the
shouldRetryRequest method, so those 100 will retry a new connection.

This is the conservative fix for Go 1.7 so users don't get errors, and
to match the HTTP/1 behavior. Issues #13957 and #13774 are the more
involved bugs for Go 1.8.

Updates #16582
Updates #13957

Change-Id: I1f15a7ce60c07a4baebca87675836d6fe03993e8
Reviewed-on: https://go-review.googlesource.com/25580
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
madeye pushed a commit to shadowsocks/go that referenced this issue Aug 10, 2016
…y limit

The Go HTTP/1 client will make as many new TCP connections as the user requests.

The HTTP/2 client tried to have that behavior, but the policy of
whether a connection is re-usable didn't take into account the extra 1
stream counting against SETTINGS_MAX_CONCURRENT_STREAMS so in practice
users were getting errors.

For example, if the server's advertised max concurrent streams is 100
and 200 concurrrent Go HTTP requests ask for a connection at once, all
200 will think they can reuse that TCP connection, but then 100 will
fail later when the number of concurrent streams exceeds 100.

Instead, recognize the "no cached connections" error value in the
shouldRetryRequest method, so those 100 will retry a new connection.

This is the conservative fix for Go 1.7 so users don't get errors, and
to match the HTTP/1 behavior. Issues golang#13957 and golang#13774 are the more
involved bugs for Go 1.8.

Updates golang#16582
Updates golang#13957

Change-Id: I1f15a7ce60c07a4baebca87675836d6fe03993e8
Reviewed-on: https://go-review.googlesource.com/25580
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz
Copy link
Contributor

I think 7a62274 was sufficient to close this bug.

Follow-up work can be done in #13957 and #13774

@WillSewell
Copy link

I am still receiving this error on Go 1.9 when performing a large number of concurrent requests (400). Looking at the http2 verbose logs, I can see a large number of http2: no cached connection was available, but only a small % of these bubble up to http.Client.Do. I was looking through the source code to see why these were not being caught by the fix in 7a62274.

The only other place I can see RoundTripper.RoundTrip being called where http2ErrNoCachedConn is not being handled is on this line: https://github.com/golang/go/blob/master/src/net/http/transport.go#L371. I believe the concrete implementation here is noDialH2RoundTripper since we are configuring the client with https://github.com/golang/net/blob/master/http2/transport.go#L125. So my suspicion is this is where the error is being returned from. Is there a reason there is no retrying performed after a error on https://github.com/golang/go/blob/master/src/net/http/transport.go#L371?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants