Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the purpose of the 10 minute wait? #92

Closed
ryshoooo opened this issue Jul 6, 2023 · 8 comments · Fixed by #101
Closed

What's the purpose of the 10 minute wait? #92

ryshoooo opened this issue Jul 6, 2023 · 8 comments · Fixed by #101

Comments

@ryshoooo
Copy link

ryshoooo commented Jul 6, 2023

In the microservice I'm building, I've experienced a memory leak using the Kubernetes client-go package, which I traced back to https://github.com/moby/spdystream/blob/master/connection.go#L733.

The memory leak is really well documented in kubernetes/kubernetes#105830.

Essentially what happens is that running the stream executors with in-cluster kubeconfig context leads to many broken pipes, which leads to the waiting time of 10 minutes before the shutdown goroutine ends. This waiting time keeps the data in memory alive. Not a big deal if it's a small number of broken pipes, but in my case the number is rising very quickly and easily can reach 2-3 GB of allocated memory within the 10-minute hold, thus considering it a memory leak of high significance (essentially a system-service pod is taking memory that the users of the cluster can't use).

The fix for this is that I replaced the waiting time from 10 minutes to 10 milliseconds instead, and the memory leak is "gone"(not truly gone, just very low and GCed quickly). However, I wonder what are the repercussions of changing https://github.com/moby/spdystream/blob/master/connection.go#L733 to milliseconds? Also, what is the point of waiting here in the first place?

@134130
Copy link

134130 commented Apr 23, 2024

Getting same problem.
@ryshoooo How did you solved this problem? did you forked this and customized?

@ryshoooo
Copy link
Author

Yeah I forked it and removed the wait :) which helped a lot... here's my fork https://github.com/ryshoooo/spdystream

and I just added this line into go.mod replace github.com/moby/spdystream => github.com/ryshoooo/spdystream v0.0.0-20230706002604-5d891ea12436

@liggitt
Copy link
Contributor

liggitt commented Jun 27, 2024

I opened a partial fix for this in #99

If the error is not handled / consumed from the shutdownChan, there's still a 10 minute timeout, but if the error is consumed (as expected), there is no background goroutine spawned / leaking

@aojea
Copy link

aojea commented Jun 28, 2024

I think this can be closed now

@dims
Copy link
Collaborator

dims commented Jun 28, 2024

thanks @aojea @liggitt

@dims dims closed this as completed Jun 28, 2024
@liggitt
Copy link
Contributor

liggitt commented Jun 28, 2024

#99 only avoided the 10 minute wait when the close error is handled / consumed... I think there's more to do to improve the case where it is not explicitly handled / consumed ... 10 minutes is a long time to leave the shutdown call hung, I think

@134130
Copy link

134130 commented Jun 28, 2024

yes, I've thought that, but on k8s, we are trying to change the protocol based on http2, that's the reason that I've thought it is not important.

@liggitt
Copy link
Contributor

liggitt commented Jun 28, 2024

opened a follow-up in #101 to shorten the timeout for the unhandled error case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants