Drain Response.Body to enable TCP/TLS connection reuse (4x speedup) #317
Conversation
On successful queries (on errors, ReadAll runs in CheckResponse), the Response.Body is not read all the way to EOF, because json.Decoder is used, which stops at the end of the object, which is probably followed by a \n. Then, Response.Body.Close() is called. When that happens with a non-drained Body, the TCP/TLS connection is closed (which makes sense, in case there was still a lot to read from the network), stopping the Transport from reusing it. Also, a library user can't drain the Body themselves because Close() is called before passing the Response to the user.

Since we know GitHub API responses only carry a single JSON object, draining before closing is free, and saving all those handshakes yields huge savings in latency, bandwidth, and CPU for both the client and the poor GitHub servers.

Here's the result of running the benchmark below on an ADSL connection:

before: 2m3.001599093s
after: 33.753543928s

```go
func main() {
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")},
	)
	tc := oauth2.NewClient(oauth2.NoContext, ts)
	gh := github.NewClient(tc)
	start := time.Now()
	for i := 0; i < 100; i++ {
		gh.Repositories.Get("FiloSottile", "gvt")
		fmt.Print(".")
	}
	fmt.Printf("\n%s\n", time.Now().Sub(start))
}
```
Hey Filippo, this is really cool! I have one question, I'll post it inline.
```diff
-	defer resp.Body.Close()
+	defer func() {
+		// Drain and close the body to let the Transport reuse the connection
+		io.Copy(ioutil.Discard, resp.Body)
```
On successful queries (on errors ReadAll runs in CheckResponse), the
Response.Body is not read all the way to the EOF as json.Decoder is used,
which stops at the end of the object, which is probably followed by a \n.
Since we know GitHub API responses only carry a single JSON object, draining
before closing is free
I'm just curious, but have you tried doing something like n, _ := io.Copy(ioutil.Discard, resp.Body) here and dumping the value of n? I'm curious if it's indeed 1 and contains a '\n', or something else.
Good question. Just tried and it turns out it's zero!
Maybe explicitly hitting io.EOF flips a flag somewhere; this looks like something the runtime could improve on.
Cool, thanks for checking!
I think that's probably normal, and not something runtime could do better.
In a general case, readers deal with streams where the EOF may not be known in advance.
The Read method is documented to either return a nil error the first time and (0, io.EOF) on the next call, or to return io.EOF right away; it's up to the implementation which approach to use.
When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call. An instance of this general case is that a Reader returning a non-zero number of bytes at the end of the input stream may return either err == EOF or err == nil. The next Read should return 0, EOF.
When reading from a network stream or some other unknown source where you can't cheat and look ahead, you may not know it's EOF until you actively try to read more bytes.
Also see this snippet.
Interestingly, I just tried this, and it's apparently discarding 0 bytes, yet I'm still seeing the same speed improvements @FiloSottile is reporting. Maybe it's just the io.EOF that is left to be read that is preventing the transport from being reused?
My idea of runtime improvement would be a Read of, say, 512 bytes with a 50ms timeout happening on close, which would probably be good enough for cases like this one. But I'm getting OT, I'll bring this to golang/go :)
psh. stupid GitHub not updating this comment thread so I didn't see your identical comments :)
Maybe it's a good idea to add a test for this?
I don't have time to write one today (1:30am 😲), but I think one can be written like this
But it might be flaky, as no behavior is guaranteed here AFAICT. Can't think of a simpler one.
I don't think it's necessary to have a test for this, unless someone can come up with a reliable one. This is just normal behavior, and it's explained well in the comment in code. This change is not about correctness (the current code is also correct), it's about an optimization.
Thanks for the patch @FiloSottile!
Why not just mock any of the GitHub endpoints with an httptest server and use go-github to make the request? I'm not familiar with the library (I came here through the gophers slack) but I believe it would be fairly simple to do this and assert that the connection is being reused (I've done this in the past with some other projects). Thoughts?
Hi, author of net/http here. The underlying reader (the one beneath the oauth2.Transport's onEOFReader) is returning (83, io.EOF), which is successfully recycling the TCP connection, even without your patch. Are your measurements actually from that code, or similar-ish code? Which version of Go?
@bradfitz, I tried to reproduce this just now, and I was able to reliably see a significant performance difference. Exact program that I ran:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/google/go-github/github"
	"golang.org/x/oauth2"
)

func main() {
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: os.Getenv("GITHUB_TOKEN")},
	)
	tc := oauth2.NewClient(oauth2.NoContext, ts)
	gh := github.NewClient(tc)
	start := time.Now()
	for i := 0; i < 100; i++ {
		_, _, err := gh.Repositories.Get("FiloSottile", "gvt")
		if err != nil {
			log.Fatalln(err)
		}
		fmt.Print(".")
	}
	fmt.Printf("\n%s\n", time.Now().Sub(start))
}
```

I'm on
Here are 4 runs with this change and 4 runs without, interleaved.
The only difference between the alternate runs is whether this change is applied. The fast runs (14-18 seconds) are with this change applied, and the slow runs (42-73 seconds) were without it.

Appendix

I've also tried adding error checking to the draining and body closing calls, just to see if there were any unexpected values there. There were no panics with either of:

```go
defer func() {
	// Drain and close the body to let the Transport reuse the connection
	n, err := io.Copy(ioutil.Discard, resp.Body)
	if err != nil {
		panic(err)
	}
	if n != 0 {
		panic(fmt.Errorf("unexpected n: %v", n))
	}
	err = resp.Body.Close()
	if err != nil {
		panic(err)
	}
}()
```

Nor:

```go
defer func() {
	// Drain and close the body to let the Transport reuse the connection
	/*n, err := io.Copy(ioutil.Discard, resp.Body)
	if err != nil {
		panic(err)
	}
	if n != 0 {
		panic(fmt.Errorf("unexpected n: %v", n))
	}*/
	err := resp.Body.Close()
	if err != nil {
		panic(err)
	}
}()
```

The performance difference remained as before.
Any chance you could try that experiment using Go tip? Brad was testing with (I believe) a version pretty close to tip. I wonder if there's something that changed in net/http since go1.6.
Sure, I can try with tip. It'll be a few moments because I'll have to get it and build it.
Actually, never mind. I think I can reproduce it now at tip, on darwin/amd64 too.
Here's me running your program in the same configurations you did:
I guess @bradfitz and I will poke into this some more tomorrow now that we have a reproduction case. Thanks @shurcooL!
FWIW, I had already started, and I was curious, so I tried 1.6 and tip in a Linux VM (Ubuntu 14.04), and I was able to reproduce it there too. Both with 1.6 and current tip (golang/go@45d334e). See screenshot.
Sorry for being slightly OT, but why doesn't Close() drain the body itself?
@Slimmy: because closing a connection would then potentially block for a long time, or forever, if the other end of the connection has a lot of (or infinite) bytes still to be sent. Imagine calling Close only to block on receiving the rest of a 10 GB file you didn't want. As mentioned in the PR, closing the connection "makes sense, in case there was still a lot to read from the network".
@kofalt: That makes sense, but it doesn't seem like something that would actually happen a lot in practice. Maybe it should be configurable to make Close() drain the body.
Because of bad actors on the internet that would send an infinite number of bytes to your site.
@devlo: I get that, but everyone using this library is talking to the GitHub API, which is a trusted source.
You can use io.LimitReader to put an upper bound on how much you read.
I'm just trying to understand it all in the context of this PR. To achieve maximum performance you have to make sure you read all of the response body (so that the TCP connection can be reused). But in doing so you are risking large (possibly infinite) streams of data. So the solution is to only utilize this "trick" when you are fetching data from trusted sources (i.e. GitHub, as in the case of this PR)?
I'm trying to reproduce this issue in its purest form (i.e., without any dependencies beyond the standard packages), but so far I am unable to. I want to fully appreciate whether this is a problem for anyone using net/http. Here is my attempt (results in the readme): https://github.com/kevin-cantwell/tcpdraintest It appears the performance differences are greatest when the number of requests is low. I'm reusing the default http client and my json responses are empty objects with a few extra newline characters to drain. Am I doing it wrong?
Ok, I think I understand a bit more about this. Seems like this change makes the biggest difference when there is a large number of bytes remaining in the body after json parsing. A few extra bytes don't seem to matter much, even at high request volumes. But if you've got hundreds or thousands of extra bytes after json decoding is done, then yes, this could seriously affect performance. My question then is: without committing to draining the entire remaining bytes (which, as @kofalt pointed out, could be GB of data), how far should one read before just closing the response? Because unless you read quite far, the performance gains might not be visible. Another question I might raise is: is it even reasonable to expect there to be lots of bytes after a terminating json character? There could easily be a newline or two, but hundreds or thousands of them? I'm skeptical that any api would serve that. Again, if I'm missing the point somehow, please illuminate me! :)
In the GitHub cases above, there are actually no remaining bytes after json parsing, so I don't think your statement here is correct. One current working guess is that the |
@Slimmy, have you seen this inline #317 (comment)? A proposed suggestion there is to "read, say, 512 bytes with a 50ms timeout happening on close, which would probably be good enough for cases like this one."
@kevin-cantwell, an extremely important aspect of this is the fact that these are TLS connections being made, since the protocol used is https, not http. The TLS connection handshake is quite expensive, so reusing TLS connections made to a remote server will offer great savings. Reusing non-TLS connections to localhost will have very little gain, and that's what your test appears to be trying to do, from what I can tell.
Flip around the composition order of the http.Response.Body's gzip.Reader vs. the reader which keeps track of waiting to see the end of the HTTP/1 response framing (whether that's a Content-Length or HTTP/1.1 chunking).

Previously:

```
user -> http.Response.Body
     -> bodyEOFSignal
     -> gzipReader
     -> gzip.Reader
     -> bufio.Reader
     [ -> http/1.1 de-chunking reader ] optional
     -> http1 framing *body
```

But because bodyEOFSignal was waiting to see an EOF from the underlying gzip.Reader before reusing the connection, and gzip.Reader (or more specifically: the flate.Reader) wasn't returning an early io.EOF with the final chunk, the bodyEOFSignal was never releasing the connection, because the EOF from the http1 framing was read by a party who didn't care about it yet: the helper bufio.Reader created to do byte-at-a-time reading in the flate.Reader.

Flip the read composition around to:

```
user -> http.Response.Body
     -> gzipReader
     -> gzip.Reader
     -> bufio.Reader
     -> bodyEOFSignal
     [ -> http/1.1 de-chunking reader ] optional
     -> http1 framing *body
```

Now when gzip.Reader does its byte-at-a-time reading via the bufio.Reader, the bufio.Reader will do its big reads against the bodyEOFSignal reader instead, which will then see the underlying http1 framing EOF, and be able to reuse the connection.

Updates google/go-github#317
Updates #14867
And related abandoned fix to flate.Reader: https://golang.org/cl/21290

Change-Id: I3729dfdffe832ad943b84f4734b0f59b0e834749
Reviewed-on: https://go-review.googlesource.com/21291
Reviewed-by: David Symonds <dsymonds@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
It's still unclear to me if we should drain the body. However, the docs on http.Client.Do only mention Close, not Read.
Just to clear things up for others that ended up here: @pschultz, the linked Go docs have since been updated. The response body SHOULD be drained/read to completion! The new docs (https://pkg.go.dev/net/http#Client.Do) make it more clear.
@karelbilek It seems that |