PubSub Subscriber client memory leak #273
@capt2101akash Thanks for the report. As a start, could you please share the library versions that your application uses, i.e. the output of `pip freeze`. The subscriber client has an internal waiting queue of all received messages and only dispatches a subset of them to the user-provided callback (i.e. flow control), while automatically extending acknowledge deadlines for these messages and so on. If the Python instances of these messages are kept alive for some reason, that could explain the ever-growing memory usage as new messages are streamed from the server - but this is just an educated guess for now. Any additional info would be appreciated, looking forward to it!
Hey plamut, I was in the process of doing that :). Thanks for starting this thread. I have updated my comment with sample code and the charts for memory consumption and CPU usage. I will add the pip freeze output here. In addition to that, I have tried changing the flow control settings to see if that was the cause, but it was a dead end. Anyway, the current setting is 30. The number of messages in consideration will be 200-300K in a day, but these are real-time messages, so they come in all throughout the day.
Bear with the large list of packages. Hopefully, this info will help you get to know my setup a bit. If you need anything else from my side, do let me know.
Hey @plamut, I hope you got some time to look at the details that I have provided. Any inputs will be really helpful. Looking forward to it.
@capt2101akash I did a quick check of the sample code and it seems like a straightforward streaming pull usage. The messages being accumulated in the global `message_list` could account for at least some of the memory growth, though. I hope I can take a closer look some time next week, but until then, if time allows, you can perhaps experiment with the following:
@plamut thanks for your suggestions. However, I have tried all of the above that you mentioned, but to no avail. I will still experiment a bit more with the multiprocessing part, though, and will let you know my findings. In the meantime, I will be looking forward to getting more insights from you sometime next week.
The multiprocessing trick should almost certainly work, but of course if there's a leak, we should find and fix it. Could be tricky, though. :) Edit: Oh, you tried that already? That's somewhat surprising. Could there be anything else outside of the provided sample that is consuming memory, or is the sample fully self-contained?
The sample is self-contained, and even if we just run a plain subscriber client the memory keeps increasing. The multiprocessing way is somewhat tricky, though, as we need to maintain a global list. I am looking into a workaround for that. In any case, we should really see what's causing this gradual increase so that we can fix it. Can you try reproducing it and see whether it behaves the same way for you too or not?
@capt2101akash Thanks for the research, although the linked issue seems a bit different - the user-provided callbacks are, by default, invoked using an internal thread pool executor, while the linked issue is about using a process pool executor for dispatching the messages (which doesn't seem to work, due to the underlying machinery not being picklable).

What I meant was running the entire streaming pull and message processing in a child process (including the client instantiation), so that any memory used (or leaked) would be automatically reclaimed by the OS upon the child process termination. Something like the following (note: untested):

```python
from multiprocessing import Process

...

def pull_messages():
    flow_control = pubsub_v1.types.FlowControl(max_messages=30)
    subscriber = pubsub_v1.SubscriberClient()
    future = subscriber.subscribe(subscription_name, message_ack, flow_control=flow_control)
    try:
        future.result(timeout=300)
    except TimeoutError:
        future.cancel()


if __name__ == '__main__':
    while True:
        p = Process(target=pull_messages)
        p.start()
        print(f"Started a new child process {p.pid}")
        p.join()
```

It would require some adjustments to get any message processing results back to the main process, if needed, but I hope you get the idea.

BTW, a random observation - I see that
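As a hedged illustration of the "results back to the main process" part (untested; the subscription path and helper names are assumptions rather than anything from the original sample), the child could push only the picklable message.data bytes onto a multiprocessing.Queue, which the parent drains while the child runs:

```python
from multiprocessing import Process, Queue
from queue import Empty
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

# Assumed placeholder - replace with the real subscription path.
subscription_name = "projects/my-project/subscriptions/my-subscription"


def pull_messages(result_queue):
    """Run the streaming pull in the child process and forward raw message data."""
    flow_control = pubsub_v1.types.FlowControl(max_messages=30)
    subscriber = pubsub_v1.SubscriberClient()

    def message_ack(message):
        # Only the raw bytes go back to the parent - Message objects are not picklable.
        result_queue.put(message.data)
        message.ack()

    future = subscriber.subscribe(subscription_name, message_ack, flow_control=flow_control)
    try:
        future.result(timeout=300)
    except TimeoutError:
        future.cancel()
    subscriber.close()


if __name__ == "__main__":
    result_queue = Queue()
    while True:
        p = Process(target=pull_messages, args=(result_queue,))
        p.start()
        # Drain results while the child runs, so its queue feeder thread never blocks on join.
        while p.is_alive() or not result_queue.empty():
            try:
                data = result_queue.get(timeout=1)
                # ... downstream transformation of `data` goes here ...
            except Empty:
                pass
        p.join()
```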
@plamut Yeah, I got what you are saying. The same piece of code that you sent - I tried that out yesterday. The only concern here is, as you said, getting the message processing results back to the main process. And this is where I felt the referenced issue was coming into play. Sorry if I didn't clearly specify it earlier, but here is what I meant.
NOTE - As you can see, I have changed the flow control to get only 5 messages, and this also does not result in less memory consumption. The memory grows gradually. I hope this clears up the misunderstanding. If you have any inputs on how we can mitigate the error related to can't pickle thread.lock objects, that would be helpful. I have added the line where we get that error.
@capt2101akash Thanks for the clarification, it indeed seems we're on the same page. Off the top of my head - the messages dispatched to the callback are Message instances that wrap the actual messages received from the server. These instances contain, among other things, a reference to a queue used for asynchronously dispatching requests such as ACK, NACK, and so on, but the queue itself cannot be pickled:

```python
>>> import queue
>>> import pickle
>>> pickle.dumps(queue.Queue())
TypeError                                 Traceback (most recent call last)
<ipython-input-43-96dc0f23ad59> in <module>
----> 1 pickle.dumps(queue.Queue())

TypeError: cannot pickle '_thread.lock' object
```

Since all the dispatched Message instances hold a reference to that queue, they cannot be pickled and passed between processes. If only the raw payload is needed, the callback can append just that:

```python
def message_ack(message):
    message_list.append(message.data)
    message.ack()
```

Would that be sufficient for the post-processing of the received message data accumulated in the list?
@plamut I missed that part - using message.data does indeed work with multiprocessing. However, when I try to do my processing inside the callback (other than just message.ack()) it fails with a segmentation fault. In addition to that, it behaves very sporadically and fails with a segmentation fault quite frequently even if we just do message.ack() in the callback function. I wonder if it is related to the issue I mentioned, but I am again feeling at a dead end, as nothing seems to be working out, unless we find out specifically why the subscriber thread is not releasing the memory. If you have encountered segmentation faults in multiprocessing, do let me know what you did to mitigate them. Here is what the grpc folks have said about using multiprocessing with grpc (the backend of the subscriber client). Hence, I am wondering how we can utilize this approach using the API only? To address your point about whether that would be sufficient for post-processing: yes. But if something goes bad in the post-processing, like the issue mentioned above, the messages get lost and we won't be able to get them back. So ideally, we should do all the processing first and then acknowledge the messages, if that makes sense. Looking forward to your reply.
That makes perfect sense. The backend will not try to re-send a message after it has received an ACK (within the message's acknowledge deadline). Applications should thus send an ACK only after they have processed a message. If processing means more than merely appending a message to a list (and then perhaps storing it somewhere), the acknowledgement should only be sent after that processing has completed.

In principle, one could append a message and its ACK ID to the list and then manually send acknowledge requests for the processed messages, but that kind of defeats the purpose of a client that relieves the programmer of tasks such as retries, automatic lease management, etc., thus I wouldn't go down that route.

I can't comment on the segfaults in the callback without the code, but if they frequently happen even when just calling message.ack(), that is surprising.

Random thought - is there a specific reason for running the streaming pull for a fixed amount of time, cancelling it, and then starting it again? Was that just for the purpose of profiling? Would your use case allow for moving this loop out of the subscriber script itself? By that I mean running it for a limited amount of time, processing the message stream, and then letting it terminate. After that (or even in parallel) another subscriber is fired up to continue the processing. Kind of the same idea as before, but without a child process. (I do realize this is not ideal, but I am mostly trying to come up with a reasonable workaround for the time being.)
@plamut per your point regarding running the streaming pull for a fixed amount of time and cancelling it - it has two uses:
To your suggestion - are you suggesting that we should have a cronjob kind of thing? (I was also thinking that.) If that is the case, then yes, it will be our last resort (not ideal for real-time cases).
If nothing else works, it might indeed be the most reasonable workaround for the time being, yes. And the restart period can be "long" (hours or more), depending on how slowly/quickly the memory leaks under the real load. Furthermore, there doesn't have to be any downtime if another independent subscriber is launched, say, a minute or so before the currently running one is about to terminate. I'll post more if I get a successful segfault reproduction, although that might not even be necessary, considering that multiprocessing appears to be a rabbit hole...
After testing a good number of combinations of Python and library versions, here is what I found. I used the initial code sample as a base and ran the streaming pull in a child process.

Creating the subscriber inside the child process did not cause any problems for me. I was also not able to reproduce any segfaults despite doing non-trivial work in the message callback (I queried a simple database), although absence of evidence is not evidence of absence, of course. But if I understand correctly, that should be easily reproducible, @capt2101akash? It might be worth noting that I created, ran, and closed the subscriber instance inside the child process.

Update: Managed to reproduce a segfault! It happens if the subscriber instance is created before the child process is started, i.e. in the main process.

@capt2101akash As a sanity check, can you confirm these findings? I noticed that your earlier multiprocessing example created the subscriber in the main process, but did you also try instantiating it in a subprocess? It would be worth giving the multiprocessing workaround another shot, if it works for you, too.
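To make the distinction concrete, here is a minimal sketch of the pattern that worked in these tests (the subscription path, timeout, and ACK-only callback are assumptions for illustration) - the client is constructed inside the child process target, not in the parent:

```python
from multiprocessing import Process
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

# Assumed placeholder - replace with the real subscription path.
subscription_name = "projects/my-project/subscriptions/my-subscription"


def pull_messages():
    # The client is instantiated *inside* the child process. Creating it in the
    # parent process before starting the child is the combination that appeared
    # to trigger the segfault in the tests described above.
    subscriber = pubsub_v1.SubscriberClient()
    future = subscriber.subscribe(subscription_name, callback=lambda message: message.ack())
    try:
        future.result(timeout=300)
    except TimeoutError:
        future.cancel()
    subscriber.close()


if __name__ == "__main__":
    p = Process(target=pull_messages)
    p.start()
    p.join()
```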
@plamut I can try creating the subscriber in the subprocess and get back to you with my findings. However, I think we can 100% agree on the point that there is a memory leak that has to be addressed - running the streaming pull in a subprocess is just a workaround we are using to overcome the issue. I would be really glad if we could pinpoint the module/functionality that is causing this leak and fix it.
@capt2101akash The leak should definitely be fixed, indeed, but it's always good to have a reasonable workaround at hand - fingers crossed that it will work for you! 🤞
Just to give you some good news: it did work 😊. I somehow never tried this approach 😅 my bad.
I would like to give an update on the most recent findings. I tried running the streaming pull in a single process and for longer periods of time (15+ minutes as opposed to only a few minutes). I tried both with plain callback logic, i.e. only calling message.ack(), and with appending the message data to a list first. In all runs I made sure that the subscription always had enough messages at hand to keep the subscriber as busy as possible (so that flow control had to kick in).

15 x 60 seconds, no append: Initially, the memory consumption slowly increases, but eventually flattens out. This hints that any internal streaming pull operations probably do not leak memory.

15 x 60 seconds, appending to a list: Since received messages are temporarily stored in a global list, the memory consumption increases faster. However, the growth is not linear - there are points when some of the memory is released back to the OS. It appears that after the initial increase, the memory consumption actually remains bounded. If we only looked at the first few minutes, it might indeed seem that there is a memory leak, but monitoring the process over a longer time span does not support that claim.

30 x 60 seconds, appending to a list: Since the second 15-minute graph was a bit inconclusive and somewhat differed from the other two, I did yet another run for a full 30 minutes. The graph confirmed that the memory consumption is, in fact, bounded.

The most likely reason for seemingly "leaking" memory at the beginning is CPython's own memory management. When an object such as a list of messages goes out of scope, the memory is not immediately released back to the OS. Instead, the interpreter keeps it for future use, as it assumes that more objects will be created, and re-using memory from an already allocated memory pool is faster than constantly freeing memory to the OS and then allocating it back.

@capt2101akash I tried to find any zombie Message (or other) instances lurking in the dark and forgotten corners of memory, but couldn't really find any - they seem to be garbage collected just fine. I also checked the streaming pull code for any queues, lists, etc. that could grow indefinitely. Initially, one possible candidate was the histogram that tracks how long it takes to acknowledge messages on average, but the histogram size is bounded. The add() method maps all ACK times onto a bounded interval, so the histogram cannot grow without limit.
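As a small aside, the effect described above can be observed directly. The following hedged sketch (assuming psutil is installed; exact numbers vary by platform and allocator) shows RSS staying elevated after a large buffer of fake messages is dropped and collected:

```python
import gc
import os

import psutil  # assumed available: pip install psutil


def rss_mb():
    """Resident set size of the current process, in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024


print(f"baseline:        {rss_mb():6.1f} MiB")

# Simulate a burst of received messages being buffered in a list.
messages = [b"x" * 1024 for _ in range(200_000)]
print(f"after buffering: {rss_mb():6.1f} MiB")

# Drop the list and force a collection - the objects are gone...
del messages
gc.collect()
print(f"after del + gc:  {rss_mb():6.1f} MiB")
# ...but RSS typically does not fall all the way back to the baseline, because
# CPython and the allocator keep freed pools around for re-use. On a short-term
# graph this looks like a leak even though the usage is bounded.
```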
@plamut I have also observed this when the frequency of messages coming into the topic is not real-time or, to be precise, not that frequent. However, if you look at my previous comments where I posted the snippet, I also tried just acknowledging the messages without any append, with a continuous stream of messages flowing in at a rate of 100 messages/sec, and the graph shows a gradual increase. As you mentioned, it does release some memory in between, but overall it increases over a period of a day or two.

Just a quick update on where I currently stand with my application, so that others facing similar issues can relate: I ran my subscription client in a multiprocessing environment, which after completion releases its memory back to the OS, and I get what I required. One thing to keep in mind, though, is that you might have to adjust your processing logic to utilise this solution effectively.
@capt2101akash Thanks for the reply, and I'm glad that the multiprocessing workaround is working well.
FWIW, in the test that I conducted the subscriber running on my laptop received and processed (i.e. appended to a list) somewhere around 6500 messages / minute, which is comparable to the load in your test (the message stream was constantly high). I did not run it for the entire day, though.
Yeah, that's what it looked like, although that test only ran for 5 minutes or so. Running the same test for longer later revealed that the memory consumption actually stabilizes eventually. Based on your reports, it would probably still make sense to set up a small cloud app with a busy publisher and a simple ACK-only subscriber, and let it run for at least 24 hours. I'll try to do that when I get back to this.
Hi @plamut, it might help to generate a memory profile using memory_profiler, to see what is living in Python space. If we observe a large number of instances of some class, it indicates the leak happens in Python space. Otherwise, it might be caused by C/C++ objects leaking. Here is one memory leak fix in gRPC: grpc/grpc#22123 (comment).
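For anyone who wants to try this, a minimal sketch of pointing memory_profiler at the callback; the subscription path and the message_list/message_ack names are illustrative assumptions, not taken from the original sample:

```python
from memory_profiler import profile

from google.cloud import pubsub_v1

# Assumed placeholder - replace with the real subscription path.
subscription_name = "projects/my-project/subscriptions/my-subscription"
message_list = []


@profile  # prints a line-by-line memory report each time the callback returns
def message_ack(message):
    # Note: decorating a hot callback slows it down considerably; use with light traffic.
    message_list.append(message.data)
    message.ack()


def main():
    subscriber = pubsub_v1.SubscriberClient()
    future = subscriber.subscribe(subscription_name, message_ack)
    try:
        future.result(timeout=300)
    finally:
        future.cancel()
        subscriber.close()


if __name__ == "__main__":
    main()
```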
FWIW, the reason why this was marked as blocked is that the feedback time can be long. Even running and profiling a sample app locally for half an hour might not always be enough to definitively conclude whether a leak is present or whether it is just short-term noise (some of the memory usage graphs produced by the profiler turned out to be misleading). (The resource consumption graphs from GCP are more telling; it might make sense to profile there and then inspect the object graph inside the container.)
Just want to ping this again. I'm struggling to hunt down what might be the cause of this issue, and we're literally having to restart our Pub/Sub microservices multiple times a week as the memory consumption just climbs and climbs.
@synackSA Sorry for not responding earlier, I was away. As you have figured out by yourself in the meantime, this one is tricky. When testing locally, it was hard to distinguish whether the memory consumption graphs actually displayed a leak (or a fix for it), or just short-term noise. Here, "short-term" could be 30 minutes or even more, meaning that the feedback is very slow, and the graphs from multiple runs were often inconsistent. Running the streaming pull from a busy subscription for several hours did show a probable leak, but again the memory increase was relatively small, although noticeable. I also tried hunting it down with tracemalloc, but nothing much popped out. It is thus believed that the leak does not occur at the Python level, but instead somewhere in the C layer underneath.

Anyhow, can I just ask if you have already tried the workaround that reportedly works? The trick is to run the streaming pull in a subprocess for a fixed amount of time. 30 minutes could be a good choice, as the backend already automatically terminates the stream every half an hour, in which case the client automatically re-connects. By running the stream in a subprocess, any memory leaked will be automatically returned to the OS when the subprocess terminates.
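For completeness, a hedged sketch of the kind of tracemalloc comparison mentioned above (snapshot before and after a fixed-length ACK-only run, then diff by traceback); the subscription path and helper function are assumptions:

```python
import tracemalloc
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

# Assumed placeholder - replace with the real subscription path.
subscription_name = "projects/my-project/subscriptions/my-subscription"


def run_pull_for(seconds):
    """Run a plain ACK-only streaming pull for a fixed amount of time."""
    subscriber = pubsub_v1.SubscriberClient()
    future = subscriber.subscribe(subscription_name, callback=lambda m: m.ack())
    try:
        future.result(timeout=seconds)
    except TimeoutError:
        future.cancel()
    subscriber.close()


if __name__ == "__main__":
    tracemalloc.start(25)  # keep 25 frames per allocation for readable tracebacks
    baseline = tracemalloc.take_snapshot()

    run_pull_for(300)

    snapshot = tracemalloc.take_snapshot()
    # Top allocation sites that grew relative to the baseline. If nothing here
    # accounts for the RSS growth, the leak is likely below the Python level.
    for stat in snapshot.compare_to(baseline, "traceback")[:10]:
        print(stat)
```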
Just want to follow up and see if there's been any movement on this. We've implemented the design where we spawn another process, which dies after a certain amount of time, gets cleaned up, and then we spawn a new one. However, I'd like to keep tabs on this issue to see what's happening.
Thanks very much @gberth
This problem is hitting me as well. I have a long-running (10 sec) callback potentially using some amount of memory, but that's not the problem - the service is capable of processing several messages (I configured the consumer to go one by one because I don't need performance and want to maximize stability), but I still see the memory leak pattern ending in a service crash. I tried the multiprocessing approach and it didn't work for me. I'm in a containerized context and I have been unable to force the OS to free that memory. Any other suggestion, or some deadline for a fix?
BTW, I don't get it... this is a memory leak in an official Google library that has been alive for more than a year (?!)
Without any evidence, it is my opinion/experience that it also occurs with the use of Bigtable from Python. So I suspect it has something to do with lower-level code rather than pubsub itself. I agree that the absence of priority and development is frustrating. It consumes a lot of time that could/should have been used on business development.
Sadly, this is still in the backlog.
That is strange. We are processing approximately 1500-2000 messages/sec from three subscriptions. This is a cached-state app, and the temporary solution was to do a controlled shutdown of the app 3 times every 24 hours. But we do have a good amount of memory :-)
Well, that's not strange. What we noticed is that the memory taken by the process is never fully released. In our case, we had 1024 MB of RAM. Every request took about 200 MB of RAM. Even when messages were sent sequentially (waiting 10 seconds between them), after 5-6 messages the container was crashing with an OOM error. Maybe your operation is not memory intensive. Sadly, restarting every 5 or 6 messages was not acceptable for us.
Anything at all going on with this?
A bit frustrating that this never gets priority. And why is it blocked? One observation: it seems that memory usage increases faster in proportion to how often (or for how long) the max_messages flow control limit is reached. rgds
Could this be related to grpc/grpc#28747?
Following up with the grpc folks to see when this fix will be in a grpc release (currently the change is checked in, but not included in the latest release). As soon as it is in a release, I will update this issue with the fixed grpc version.
According to the grpc folks, this change will be included in grpc release v1.51.0, which will be out in approximately 1-2 weeks.
The pre-release v1.51.0-pre1 contains the fix.
The release fix is out: https://github.com/grpc/grpc/releases/tag/v1.51.0
Environment details
python --version: Python 3.7

Steps to reproduce
Hi guys, I have created, as part of a project, a simple subscriber client which gets IoT data on a real-time basis. My subscriber client runs indefinitely, and in the callback function I do some transformation on the messages that come in. I have deployed this client as a microservice in GKE; however, to utilise GKE more efficiently I was looking into using the HPA. I observed that the subscriber client's memory always increases - it never goes down, even when the queue is empty. I have tried cancelling the thread and closing the subscriber client by putting it in a while loop to release socket resources, but in the next iteration I see the memory has not been freed and it stacks up. I have used memory profiler to detect the memory leak. I know this issue has been discussed in the past (#5001), but I am using the latest libraries and still facing this. Any help would be really appreciated.
Providing a sub-sample of the code where the majority of the work happens (see the sketch below). I am putting all the messages in a global list so that I can send them for further transformation downstream (if I do the transformation of the messages in the callback function, the memory consumption increases even more rapidly). Adding the GKE dashboard screenshots too, for memory consumption and CPU usage. My expectation, and the general workflow, should ideally be: whenever there are spikes in CPU, i.e. whenever messages are received from the queue, the memory consumption should increase, and after that the memory should be released.
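The code sample itself did not survive the page extraction; the following is only an approximate, hedged reconstruction assembled from details mentioned later in the thread (flow control of 30, a global message_list, an ACK-only callback, a 300-second run inside a while loop) - the subscription path and exact structure are assumptions:

```python
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

# Assumed placeholder - replace with the real subscription path.
subscription_name = "projects/my-project/subscriptions/my-subscription"
message_list = []  # messages are collected here and handed off downstream


def message_ack(message):
    # Keep only the raw payload and acknowledge immediately.
    message_list.append(message.data)
    message.ack()


subscriber = pubsub_v1.SubscriberClient()
flow_control = pubsub_v1.types.FlowControl(max_messages=30)

while True:
    future = subscriber.subscribe(subscription_name, message_ack, flow_control=flow_control)
    try:
        future.result(timeout=300)
    except TimeoutError:
        future.cancel()
    # ... hand message_list off for downstream transformation, then clear it ...
    message_list.clear()
```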
Thanks!