Subscriber future hangs with large number of messages #242
Comments
The fact that `cancel()` is hanging probably indicates it's a threading issue: perhaps your work item is holding up a thread indefinitely and `cancel()` can't finish waiting for it? The subscriber runs callbacks on a set of threads, and perhaps they are all being used up. You can pass in a scheduler here; the default thread pool uses a maximum of just 10 worker threads. Try raising this limit and see if it helps. If the number of long-running tasks is greater than the number of threads, you might still run into starvation issues. Perhaps add a metric to see how long each work item is taking, and also to check whether some never finish? That might be the root cause.
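A minimal sketch of passing a larger thread pool to the subscriber, assuming the google-cloud-pubsub 1.x API where `subscribe()` accepts a `scheduler` argument; the project, subscription, worker count, and `my_callback` are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

from google.cloud import pubsub_v1
from google.cloud.pubsub_v1.subscriber.scheduler import ThreadScheduler

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# The default scheduler uses a ThreadPoolExecutor with only 10 workers;
# long-running callbacks can exhaust them all and stall the client.
executor = ThreadPoolExecutor(max_workers=50)
scheduler = ThreadScheduler(executor=executor)

future = subscriber.subscribe(
    subscription_path,
    callback=my_callback,  # hypothetical callback defined elsewhere
    scheduler=scheduler,
)
```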
@Spawn As @pradn mentioned, the hangs observed are most likely caused by some of the currently executing message callbacks blocking indefinitely. BTW, have you identified any potentially problematic spots in the code in the meantime?
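One way to confirm this, following the metric suggestion above, is to wrap the callback with simple timing and logging; this is an illustrative sketch, and all names and log formats here are assumptions:

```python
import functools
import logging
import time

def timed(callback):
    """Log how long each message callback takes, to spot ones that never finish."""
    @functools.wraps(callback)
    def wrapper(message):
        start = time.monotonic()
        logging.info("start processing %s", message.message_id)
        try:
            callback(message)
        finally:
            elapsed = time.monotonic() - start
            logging.info("done processing %s in %.1fs", message.message_id, elapsed)
    return wrapper

# Usage: subscriber.subscribe(subscription_path, callback=timed(my_callback))
```

A "start" line with no matching "done" line points at the callback invocation that is blocking.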
Hey @plamut, thank you for the update.
Yes, I did some investigation into this. As far as I can tell, the only place that could block execution is the database.
Noted, thanks.
@Spawn are you still experiencing this issue with the latest version of google-cloud-pubsub? |
Closing due to no response; please re-open if you are still experiencing this issue.
Environment details
google-cloud-pubsub: 1.7.0
Problem
We have a high-load worker in Kubernetes that listens to a Pub/Sub queue. The problem is that when the Pub/Sub queue receives several thousand messages, the worker stops processing without any errors (it just freezes). Other workers whose processing time is short keep working normally, but the problem worker can sometimes spend several minutes on a single message (HTTP requests, database requests), and after processing several hundred messages the logs stop printing anything.
So to reproduce the issue, you need a Pub/Sub subscription with several thousand messages and a Python subscriber that takes up to a minute, and sometimes longer, to process one message.
Regarding the long processing time: I specified `timeout=60` for the future, so if processing takes too long it should raise `TimeoutError`, at which point I call `future.cancel()`. And here is the problem: `cancel()` waits for a graceful shutdown of the message-consuming thread, and when I did some debugging I found that sometimes `cancel()` never returns. For instance, on one server this kind of freeze lasted 10 days without any logs, errors, etc. I'm wondering if there is a way to force the future to cancel?
Subscription config
Code example
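The original snippet is not included in this thread; below is a minimal sketch of the pattern described above, with placeholder names (the project, the subscription, and the slow work inside `callback`) standing in for the real code:

```python
import time
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def callback(message):
    # Placeholder for the real work: HTTP requests, database queries, etc.,
    # which can take up to a minute or longer per message.
    time.sleep(60)
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)

try:
    # Block on the streaming pull future; raises TimeoutError after 60 seconds.
    future.result(timeout=60)
except TimeoutError:
    # This is where the reported hang occurs: cancel() waits for the
    # consumer thread to shut down gracefully and sometimes never returns.
    future.cancel()
```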
Thank you for your time!