What's the purpose of the 10 minute wait? #92
Comments
Getting the same problem.
Yeah, I forked it and removed the wait :) which helped a lot. Here's my fork: https://github.com/ryshoooo/spdystream and I just added this line into go.mod
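The exact line isn't quoted above, but the usual way to point a build at a fork like this is a `replace` directive in go.mod. A rough template (the version on the right is a placeholder, use a real tag or pseudo-version of the fork):

```
replace github.com/moby/spdystream => github.com/ryshoooo/spdystream <version-or-pseudo-version>
```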
I opened a partial fix for this in #99. If the error is not handled/consumed from the shutdownChan, there's still a 10 minute timeout, but if the error is consumed (as expected), there is no background goroutine spawned or leaking.
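For readers following along, here is a minimal, self-contained Go sketch of the general pattern being discussed (not spdystream's actual code; the `shutdownChan` name, the error, and the timeout are illustrative): an error sent on an unbuffered channel is either consumed immediately, or the sending goroutine, and anything it references, stays alive until the timeout fires.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// closeWithNotify mimics the shape of the shutdown logic under discussion:
// the close error is delivered on an unbuffered channel, and if nobody reads
// it, the sending goroutine blocks until the timeout elapses.
func closeWithNotify(timeout time.Duration) <-chan error {
	shutdownChan := make(chan error)
	go func() {
		err := errors.New("broken pipe")
		select {
		case shutdownChan <- err:
			// Consumer read the error: the goroutine exits right away.
		case <-time.After(timeout):
			// Nobody consumed it: the goroutine (and whatever it
			// captures) lives for the full timeout.
		}
	}()
	return shutdownChan
}

func main() {
	// Consumed case: returns immediately, no lingering goroutine.
	ch := closeWithNotify(10 * time.Minute)
	fmt.Println("close error:", <-ch)

	// Unconsumed case: the goroutine lingers for the whole timeout,
	// which is the behaviour this issue is about.
	_ = closeWithNotify(10 * time.Minute)
}
```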
I think this can be closed now.
#99 only avoided the 10 minute wait when the close error is handled/consumed. I think there's more to do to improve the case where it is not explicitly handled/consumed... 10 minutes is a long time to leave the shutdown call hung, I think.
Yes, I've thought about that, but on k8s we are trying to move to a protocol based on HTTP/2, which is why I thought it was not that important.
Opened a follow-up in #101 to shorten the timeout for the unhandled error case.
In the microservice I'm building, I've experienced a memory leak using the Kubernetes client-go package, which I traced back to https://github.com/moby/spdystream/blob/master/connection.go#L733.
The memory leak is really well documented in kubernetes/kubernetes#105830.
Essentially what happens is that running the stream executors with an in-cluster kubeconfig context leads to many broken pipes, and each one triggers the 10-minute wait before the shutdown goroutine ends. That wait keeps the associated data alive in memory. Not a big deal if it's a small number of broken pipes, but in my case the number rises very quickly and can easily reach 2-3 GB of allocated memory within the 10-minute hold, which is why I consider it a significant memory leak (essentially a system-service pod is taking memory that the users of the cluster can't use).
My fix was to change the waiting time from 10 minutes to 10 milliseconds, and the memory leak is "gone" (not truly gone, just very small and GCed quickly). However, I wonder what the repercussions are of changing https://github.com/moby/spdystream/blob/master/connection.go#L733 to milliseconds? Also, what is the point of waiting here in the first place?
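To make the scale concrete, here is a small self-contained Go sketch of the effect being described (purely illustrative; the ~1 MiB per-connection buffer, the connection count, and the channel/timeout shape are assumptions, not spdystream's actual data structures): goroutines blocked waiting for an unread close notification pin their captured state for the full timeout, while a short timeout lets the GC reclaim it almost immediately.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// simulateBrokenPipe mimics a connection whose shutdown goroutine keeps the
// connection's state reachable while waiting (up to timeout) for a reader
// that never comes.
func simulateBrokenPipe(timeout time.Duration) {
	state := make([]byte, 1<<20) // ~1 MiB of per-connection state
	ch := make(chan []byte)
	go func() {
		select {
		case ch <- state:
			// Consumed: goroutine exits and the state can be collected.
		case <-time.After(timeout):
			// Not consumed: the state stays pinned until the timer fires.
		}
	}()
}

func heapMiB() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc >> 20
}

func main() {
	const connections = 200

	// Short timeout: the goroutines unblock after 10ms, the buffers become
	// garbage, and the heap drops back down almost immediately.
	for i := 0; i < connections; i++ {
		simulateBrokenPipe(10 * time.Millisecond)
	}
	time.Sleep(100 * time.Millisecond)
	runtime.GC()
	fmt.Printf("heap after short timeout expired: ~%d MiB\n", heapMiB())

	// Long timeout: the same ~200 MiB stays reachable until the 10 minutes
	// elapse, which is the buildup described above.
	for i := 0; i < connections; i++ {
		simulateBrokenPipe(10 * time.Minute)
	}
	runtime.GC()
	fmt.Printf("heap while long timeout is pending: ~%d MiB\n", heapMiB())
}
```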