What's the purpose of the 10 minute wait? #92
Comments
Getting the same problem.
Yeah, I forked it and removed the wait :) which helped a lot. Here's my fork: https://github.com/ryshoooo/spdystream and I just added this line into go.mod
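The exact line isn't quoted above, but the usual way to point a build at a fork like this is a `replace` directive in go.mod. A rough template (the version on the right is a placeholder, use a real tag or pseudo-version of the fork):

```
replace github.com/moby/spdystream => github.com/ryshoooo/spdystream <version-or-pseudo-version>
```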
I opened a partial fix for this in #99. If the error is not handled/consumed from the shutdownChan, there's still a 10 minute timeout, but if the error is consumed (as expected), there is no background goroutine spawned or leaking.
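For readers following along, here is a minimal, self-contained Go sketch of the general pattern being discussed (not spdystream's actual code; the `shutdownChan` name, the error, and the timeout are illustrative): an error sent on an unbuffered channel is either consumed immediately, or the sending goroutine, and anything it references, stays alive until the timeout fires.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// closeWithNotify mimics the shape of the shutdown logic under discussion:
// the close error is delivered on an unbuffered channel, and if nobody reads
// it, the sending goroutine blocks until the timeout elapses.
func closeWithNotify(timeout time.Duration) <-chan error {
	shutdownChan := make(chan error)
	go func() {
		err := errors.New("broken pipe")
		select {
		case shutdownChan <- err:
			// Consumer read the error: the goroutine exits right away.
		case <-time.After(timeout):
			// Nobody consumed it: the goroutine (and whatever it
			// captures) lives for the full timeout.
		}
	}()
	return shutdownChan
}

func main() {
	// Consumed case: returns immediately, no lingering goroutine.
	ch := closeWithNotify(10 * time.Minute)
	fmt.Println("close error:", <-ch)

	// Unconsumed case: the goroutine lingers for the whole timeout,
	// which is the behaviour this issue is about.
	_ = closeWithNotify(10 * time.Minute)
}
```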
I think this can be closed now.
#99 only avoided the 10 minute wait when the close error is handled/consumed. I think there's more to do to improve the case where it is not explicitly handled/consumed... 10 minutes is a long time to leave the shutdown call hung, I think.
Yes, I've thought about that, but on k8s we are trying to move to a protocol based on HTTP/2, which is why I thought it was not that important.
Opened a follow-up in #101 to shorten the timeout for the unhandled error case.
In the microservice I'm building, I've experienced a memory leak using the Kubernetes client-go package, which I traced back to https://github.com/moby/spdystream/blob/master/connection.go#L733.
The memory leak is really well documented in kubernetes/kubernetes#105830.
Essentially what happens is that running the stream executors with an in-cluster kubeconfig context leads to many broken pipes, and each one triggers the 10-minute wait before the shutdown goroutine ends. That wait keeps the associated data alive in memory. Not a big deal if it's a small number of broken pipes, but in my case the number rises very quickly and can easily reach 2-3 GB of allocated memory within the 10-minute hold, which is why I consider it a significant memory leak (essentially a system-service pod is taking memory that the users of the cluster can't use).
My fix was to change the waiting time from 10 minutes to 10 milliseconds, and the memory leak is "gone" (not truly gone, just very small and GCed quickly). However, I wonder what the repercussions are of changing https://github.com/moby/spdystream/blob/master/connection.go#L733 to milliseconds? Also, what is the point of waiting here in the first place?
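To make the scale concrete, here is a small self-contained Go sketch of the effect being described (purely illustrative; the ~1 MiB per-connection buffer, the connection count, and the channel/timeout shape are assumptions, not spdystream's actual data structures): goroutines blocked waiting for an unread close notification pin their captured state for the full timeout, while a short timeout lets the GC reclaim it almost immediately.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// simulateBrokenPipe mimics a connection whose shutdown goroutine keeps the
// connection's state reachable while waiting (up to timeout) for a reader
// that never comes.
func simulateBrokenPipe(timeout time.Duration) {
	state := make([]byte, 1<<20) // ~1 MiB of per-connection state
	ch := make(chan []byte)
	go func() {
		select {
		case ch <- state:
			// Consumed: goroutine exits and the state can be collected.
		case <-time.After(timeout):
			// Not consumed: the state stays pinned until the timer fires.
		}
	}()
}

func heapMiB() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc >> 20
}

func main() {
	const connections = 200

	// Short timeout: the goroutines unblock after 10ms, the buffers become
	// garbage, and the heap drops back down almost immediately.
	for i := 0; i < connections; i++ {
		simulateBrokenPipe(10 * time.Millisecond)
	}
	time.Sleep(100 * time.Millisecond)
	runtime.GC()
	fmt.Printf("heap after short timeout expired: ~%d MiB\n", heapMiB())

	// Long timeout: the same ~200 MiB stays reachable until the 10 minutes
	// elapse, which is the buildup described above.
	for i := 0; i < connections; i++ {
		simulateBrokenPipe(10 * time.Minute)
	}
	runtime.GC()
	fmt.Printf("heap while long timeout is pending: ~%d MiB\n", heapMiB())
}
```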