net/http: Client.Timeout is not propagated to Request's Context Deadline #31657
Comments
Is Artillery running from localhost and hitting your localhost? Maybe Network Link Conditioner doesn't degrade that traffic, so server goroutines pile up while your clients hitting google.com are getting 100% packet loss. You could hit Control-\ (SIGQUIT) to get a goroutine dump, see what's piling up, and confirm the theory above.
Yes, Artillery is running from my machine and hitting my localhost. I was also able to reproduce the issue from Postman, as well as by just using curl to hit localhost over and over.
Garbage collection does not affect the number of goroutines. I don't see all that many goroutines in the ^\ dump. Those goroutines seem to be mostly waiting on network I/O one way or another. When you say that the number of goroutines goes up, how many are we talking about?
When running locally at 1 TPS I see the goroutines go as high as 140 before dropping off.
Which goroutines? You can also get it with https://golang.org/pkg/runtime/#Stack and all=true.
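For reference, a minimal, self-contained sketch of dumping all goroutine stacks with runtime.Stack (the helper name is illustrative):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// dumpGoroutines writes the stack traces of all goroutines to stderr,
// growing the buffer until the full dump fits.
func dumpGoroutines() {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true) // all=true: include every goroutine
		if n < len(buf) {
			os.Stderr.Write(buf[:n])
			return
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	fmt.Println("goroutines:", runtime.NumGoroutine())
	dumpGoroutines()
}
```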
Note that while 140 may be a lot of goroutines for your program, it's not a lot for Go in general. Go programs can easily support thousands of goroutines.
@bradfitz I saw some discussions in other issues of a potential way to get debug output for http1 the way […]
I haven't dug into the runtime stack quite yet (I'll get to that in just a bit), but I wanted to share it here anyway: goroutine_stack.txt

I know 140 isn't a ton of goroutines, but I'm more curious about the behavior of the count climbing and climbing, then suddenly dropping back down to 3 or 5, and then a second later starting to climb again. I am also curious whether more context cancellation could be done so the climb doesn't happen at all.

On a production server I have seen this happen for a longer period of time: the goroutines climb and queue so much that CPU begins to drop, requests are no longer served, and the queued requests obviously take a long time to finish. There is no smoking gun as to what causes it, which is why we've begun looking into network issues. If it is expected behavior, I would love to know whether there is a way to mitigate its impact, or whether there are any best practices for handling network glitches besides server-side and HTTP-client-side timeouts and specific transport settings.
Thanks for the dump. That showed the problem: the DNS lookups (via conn dials) are blocked for a long time, which means the Client.Timeout value isn't being propagated down to the dials. A little test (written inside package net/http, so Client and Transport are unqualified):

```go
func TestClientPropagatesTimeoutToContext(t *testing.T) {
	errDial := errors.New("not actually dialing")
	c := &Client{
		Timeout: 5 * time.Second,
		Transport: &Transport{
			DialContext: func(ctx context.Context, netw, addr string) (net.Conn, error) {
				deadline, ok := ctx.Deadline()
				if !ok {
					t.Error("no deadline")
				} else {
					t.Errorf("deadline in %v", deadline.Sub(time.Now()).Round(time.Second/10))
				}
				return nil, errDial
			},
		},
	}
	c.Get("https://example.tld/")
}
```

Currently fails with the "no deadline" error, since the dial's context carries no deadline. Fix seems simple enough. Thanks for the report.
Thank you, Brad. Curious, do you have any recommendations on mitigation for the time being, until the 1.13 release?
One workaround for now is to use Request.WithContext to set a Request's Context to include a timeout (with context.WithTimeout), and then you don't need to use the old pre-context Client.Timeout.
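A minimal sketch of that workaround (the URL and the 5-second timeout are illustrative):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Put the timeout on the request's context instead of relying on
	// Client.Timeout, so custom dialers and DNS lookups can see the deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	req, err := http.NewRequest("GET", "https://example.tld/", nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	req = req.WithContext(ctx)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(err) // e.g. context deadline exceeded on a glitchy network
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```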
Change https://golang.org/cl/175857 mentions this issue: |
I sent a CL for this but I don't think it's important enough given the (non-zero) risk for Go 1.13, so I'm going to mark this for Go 1.14. Go 1.14 will also have http.NewRequestWithContext to make using contexts a bit easier. |
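For reference, here is what that looks like with Go 1.14's http.NewRequestWithContext (a sketch; fetchWithTimeout is a hypothetical helper, not a stdlib function):

```go
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

// fetchWithTimeout puts the per-request timeout on the context, so the
// deadline is visible to every stage of the request.
func fetchWithTimeout(url string, d time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return ioutil.ReadAll(resp.Body) // the deadline also covers the body read
}

func main() {
	body, err := fetchWithTimeout("https://example.tld/", 5*time.Second)
	fmt.Println(len(body), err)
}
```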
Go 1.13 and below have a bug where the client timeout is not propagated to the HTTP request context. That bug has already been fixed and the fix will ship in Go 1.14 (golang/go#31657). With current Go master, our TestSetupTeardownThresholds fails. The reason is that we only push a sample if the request context is not done. While the stdlib bug was unfixed, our own http Client.Timeout was not passed to the request context, so when the client made a request it was not aware of the request context and always pushed the sample. Now that Client.Timeout is passed to the request context, once the client finishes the request the request context is considered done, and we do not push the sample anymore. To fix it, simply wrap the request context with the client timeout. Fixed #1260
Go 1.13 and below have a bug where the client timeout is not propagated to the HTTP request context. That bug has already been fixed and the fix will ship in Go 1.14 (golang/go#31657). Currently we set Timeout on our Client but do not use it elsewhere. It does not affect the request context for now, but it will once Go 1.14 is released. To make this compatible and correct across Go versions, we set the request context explicitly with the timeout. Update #1260
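A hedged sketch of what such a wrapper could look like (doWithClientTimeout is hypothetical, not the project's actual code; it needs the "context" and "net/http" imports):

```go
// doWithClientTimeout copies c.Timeout onto the request's context, so
// Go 1.13 and Go 1.14 clients observe the same deadline. The caller must
// invoke cancel after it has finished reading the response body.
func doWithClientTimeout(c *http.Client, req *http.Request) (*http.Response, context.CancelFunc, error) {
	ctx, cancel := context.WithTimeout(req.Context(), c.Timeout)
	resp, err := c.Do(req.WithContext(ctx))
	if err != nil {
		cancel()
		return nil, nil, err
	}
	return resp, cancel, nil
}
```

Returning the cancel function, rather than deferring it inside the helper, matters here: canceling the context before the body is read would abort the read.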
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

What did you do?

Used Network Link Conditioner to cause a network glitch (100% Loss: 100% Packets Dropped).

What did you expect to see?

Goroutines would stay steady or drop off, since requests aren't able to succeed and timeouts are being met.

What did you see instead?

Goroutines take off while the requests return the expected error. The goroutine count climbs and climbs until it eventually drops off, and then the climb-and-drop cycle repeats.