pubsub: identified memory leak in receiver #10094
Comments
Can you clarify if this is an issue you encountered in production, or how you came across it? I haven't fully verified the reason why this happens yet.
Hi! Yup, we noticed because it started showing up in production. The memory leak there is only really measurable over the scale of hours or days. However, if I run something pathological like this, it's very apparent:
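Roughly along these lines (a sketch, not the exact snippet; the project and subscription IDs are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()
	client, err := pubsub.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		panic(err)
	}
	defer client.Close()
	sub := client.Subscription("my-sub") // placeholder subscription

	for {
		// Pathological on purpose: cancel the Receive context straight away,
		// so the receive loop bails out almost immediately.
		rctx, cancel := context.WithCancel(ctx)
		cancel()
		_ = sub.Receive(rctx, func(_ context.Context, m *pubsub.Message) {
			m.Ack()
		})
		// The goroutine count climbs as leaked StreamingPull streams pile up.
		fmt.Println("goroutines:", runtime.NumGoroutine())
		time.Sleep(100 * time.Millisecond)
	}
}
```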
But even with something more reasonable (i.e. not cancelling the context straight away, but after receiving some messages and/or some sensible timeout), it's very clear that the program accumulates gRPC stream goroutines. Can you reproduce the above? What tests were you running?
I was doing something like this:
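(A sketch of that test; exact details are assumed, and the client/subscription setup is as in the snippet above.)

```go
// Receive for 15s at a time in a loop, while watching memory in pprof.
func run(sub *pubsub.Subscription) {
	for {
		rctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
		err := sub.Receive(rctx, func(_ context.Context, m *pubsub.Message) {
			m.Ack()
		})
		cancel()
		if err != nil {
			log.Printf("receive: %v", err)
		}
	}
}
```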
I was not actively receiving messages on the stream when I cancelled it, though. I've been running this for about 24 hours but have not noticed any memory leak in pprof (total memory is about 2MB). I'll try with your code to see if I can reproduce this issue.
Hmm, I tried your test and dug into it a bit more: it's indeed somewhat subtle. The reason why you don't see the leak is that there's basically a race when stopping the Receive. I haven't fully figured out the ins and outs, but in your example the 15s receive time seems to make it so that the `Recv` always 'wins' the race, and thus the stream shuts down. But in other cases, e.g. a shorter receive window, or cancelling the context from within the message handler, it's the other way around: bailing out before calling `Recv` enough times may consistently 'win', leaving the gRPC streams hanging around. Just recalling the relevant docs for reference:
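(quoting grpc-go's `ClientConn.NewStream` documentation)

> To ensure resources are not leaked due to the stream returned, one of the following actions must be performed:
>
> 1. Call Close on the ClientConn.
> 2. Cancel the context provided.
> 3. Call RecvMsg until a non-nil error is returned. A protobuf-generated client-streaming RPC, for instance, might use the helper function CloseAndRecv (note that CloseSend does not Recv, therefore is not guaranteed to release all resources).
> 4. Receive a non-nil, non-io.EOF error from Header or SendMsg.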
(So basically, in the current code, option 3 may happen and clear the resources, but it's not guaranteed, and in some cases it seems guaranteed not to happen. My 'fix' suggested above is simply to ensure option 2 happens.)
Yeah, so after using your change I was able to see a slow growth of memory usage, from 3MB to 22MB, over the course of 2 hours. Given that the fix is fairly small and shouldn't have any adverse effects, I'd be happy to merge a PR for this if you want to create one. Otherwise, I can create a PR next week and credit you.
Ah, sure, can do! I'll push that change up as it is, then, but feel free to solve the issue some other way if you prefer. N.B. if you cancel the context before the call to `CloseSend`, the cancellation error can surface to the caller sooner than expected.
Side note: the docs you linked are very relevant, and probably what contributed to this issue in the first place. I think one issue is that `CloseSend` sounds like it fully shuts the stream down, when the docs note it "does not Recv, therefore is not guaranteed to release all resources".
I confirmed that calling cancel on the stream's context cleans up the leaked streams.
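For reference, one illustrative way to check is to dump goroutine stacks and look for lingering StreamingPull frames:

```go
package main

import (
	"os"
	"runtime/pprof"
)

// dumpGoroutines prints all goroutine stacks; before the fix, this shows an
// ever-growing pile of gRPC stream goroutines after repeated Receive calls.
func dumpGoroutines() {
	_ = pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
}
```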
Client

PubSub

Environment

Any

Go Environment

go version 1.21 (likely irrelevant)
Expected behavior

A long-running process making repeated calls to `Subscription.Receive()` does not leak resources.

Actual behavior

gRPC `StreamingPull` streams remain open in the background and accumulate over time.

Root cause

`pullStream.cancel()` never gets called, and `CloseSend()` is not enough to actually terminate the underlying stream.

Code
A minimal fix would be something like the following sketch (internal names in the `pullStream` wrapper are approximations, not the exact upstream code):
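```go
// Sketch only: the idea is to make option 2 from the gRPC docs quoted above
// happen, i.e. cancel the per-stream context when the stream is shut down,
// instead of relying on CloseSend alone.
func (s *pullStream) CloseSend() error {
	err := s.call(func(spc pb.Subscriber_StreamingPullClient) error {
		return spc.CloseSend()
	})
	s.cancel() // actually tear down the underlying gRPC stream
	return err
}
```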
Or this could be done either further up or down the call chain. For instance, `Subscription.Receive` could do it after everything else is done, so that the cancellation error does not appear sooner (which, however, may lead to an unnecessary wait?).

Additional context
This could well be the same issue as #5888 and #5265; however, those were closed, and I thought it would make more sense to open a new one with the full diagnostics.