Eventhub stops getting data when under load #273
Comments
I just changed that line to:
and it works now. Should I make a PR, or can we discuss it if you have some other idea?
The issue with this is that it will force a recovery every 'n' seconds (in your case 30) whenever there is no activity. So really we need to fix the core bug here, which appears to be that the Receiver is no longer "live" and so is not responding to messages. There are a few reasons this could happen. Can you give me a better idea of how you reproduce this? How long is "after some time"? Are we talking 3 days, 4 days, that kind of thing? Also, do you see this only after longer idle periods, or does it happen even while there is activity?
I totally agree that the core issue should be fixed, although I'm still concerned about resilience. I can reproduce this by reading from the start of an event hub with prefetch set to 2000. After a while (a few minutes at most) the reading stops. We have logic that, if I get no data for 30 seconds, checks whether I am at the end of the partition (over the same connection). If I'm not, we let the microservice crash and get restarted from a checkpoint. This happens only when there is a lot of activity. It could also be a server issue.
When we run the Event Hubs Go listener, after some time it gets stuck at https://github.com/Azure/azure-event-hubs-go/blob/master/receiver.go#L291 . It seems that the session gets broken. However, I know that the connection is OK: I watched it in Wireshark, and when I call GetPartitionInfo (which uses the same connection, so the socket is not dead) I can see that I am not at the end of the partition.
Shouldn't there be some kind of timeout for the case where the session gets broken? I saw such code in the C# variant of this library, but I don't see anything like it here. Maybe a call to Recover if there is no data after some time?