Not all partitions are assigned to the consumer group if partitions % instances != 0 #13546
Comments
Thanks for reaching out, josago, and for the level of detail you provided. While we investigate this, I'm tempted to suggest trying the 5.2.0b1 release (--pre --upgrade when pip installing), as it contained some changes to the load balancing algorithm that may intersect with this beneficially.
Hello, are there any updates expected on this bug? It seems to be a major issue.
Hey @saadansarithefirst, thanks for your patience and for any info that you have.
Following up on this (thanks all for your patience): the good news is that our assumptions appear to have been on point, and the addition of the new load balancing may have addressed this. See the tests here. I would mention that the tests assume use of the greedy checkpoint acquisition strategy to most effectively deal with the problem you're describing (quickly ensuring all partitions are claimed). I am closing this for now under the hope and assumption that this approach addresses the aforementioned repro as tested, but if I've misunderstood or this issue still recurs, do not hesitate to loop back and give us a shout.
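For anyone else landing on this issue, the sketch below shows what opting into that greedy behavior presumably looks like on azure-eventhub 5.2.0 or later (installable at the time via pip install --pre --upgrade azure-eventhub, per the earlier comment). The load_balancing_strategy keyword and LoadBalancingStrategy enum are taken from the 5.2.0 release rather than confirmed in this thread, and every connection string and name below is a placeholder.

```python
# Hedged sketch: assumes azure-eventhub >= 5.2.0 and azure-eventhub-checkpointstoreblob;
# all names and connection strings are placeholders.
from azure.eventhub import EventHubConsumerClient, LoadBalancingStrategy
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore


def on_event(partition_context, event):
    # Just observe which partition delivered the event; checkpointing is optional.
    print(f"partition {partition_context.partition_id}: {event}")


if __name__ == "__main__":
    checkpoint_store = BlobCheckpointStore.from_connection_string(
        "<storage-connection-string>", "<container-name>"
    )
    client = EventHubConsumerClient.from_connection_string(
        "<eventhub-connection-string>",
        consumer_group="$Default",
        eventhub_name="<eventhub-name>",
        checkpoint_store=checkpoint_store,
        # Claim every unowned partition this instance is entitled to in a single
        # load-balancing cycle, instead of one per cycle (the BALANCED default).
        load_balancing_strategy=LoadBalancingStrategy.GREEDY,
    )
    with client:
        client.receive(on_event=on_event, starting_position="-1")
```

As I understand it, the default balanced strategy claims at most one additional partition per cycle, so greedy mainly changes how quickly all partitions end up owned rather than the final distribution.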
Referenced commit (Azure#15786): "…te issue Azure#13546 having been fixed. (it seems to have been.)" Co-authored-by: Yunhao Ling <adam_ling@outlook.com>
azure-eventhub==5.1.0
azure-eventhub-checkpointstoreblob==1.1.0
Describe the bug
Having 6 instances of a process consuming records from an Event Hub with 32 partitions (all using the same consumer group) results in every instance being assigned 5 partitions, with 32 - 6 * 5 = 2 partitions remaining unassigned to any consumer even hours after the processes were launched. No partitions remain unassigned if either 4 or 8 instances are used instead (we hypothesize this is because 32 % 4 == 0 and 32 % 8 == 0, while 32 % 6 != 0).
To Reproduce
Steps to reproduce the behavior:
The source code of the program follows:
EventHubMonitor.txt
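The attachment holds the actual repro; as a rough stand-in for readers, a minimal consumer of the kind described might look like the sketch below (assuming the package versions listed above, with placeholder connection strings and names; the real EventHubMonitor.txt may differ).

```python
# Hypothetical minimal repro (the real source is in the attached EventHubMonitor.txt);
# assumes azure-eventhub==5.1.0 and azure-eventhub-checkpointstoreblob==1.1.0.
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore


def on_event(partition_context, event):
    # Log which partition delivered the event; offsets are deliberately never
    # checkpointed (update_checkpoint is not called), per the additional context below.
    print(f"event from partition {partition_context.partition_id}")


if __name__ == "__main__":
    checkpoint_store = BlobCheckpointStore.from_connection_string(
        "<storage-connection-string>", "<container-name>"
    )
    client = EventHubConsumerClient.from_connection_string(
        "<eventhub-connection-string>",
        consumer_group="$Default",
        eventhub_name="<32-partition-eventhub>",
        checkpoint_store=checkpoint_store,
    )
    # Run 6 copies of this process: each ends up owning 5 partitions,
    # leaving 2 of the 32 partitions unclaimed.
    with client:
        client.receive(on_event=on_event, starting_position="-1")
```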
Expected behavior
The expected behavior would be that 4 instances are assigned 5 partitions each and 2 instances are assigned 6 partitions each, for a total of 4 * 5 + 2 * 6 = 32 assigned partitions.
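A quick arithmetic sketch of that expected split (plain Python, no SDK needed):

```python
# Expected balanced assignment for 32 partitions across 6 instances:
# every instance owns either floor(32 / 6) = 5 or ceil(32 / 6) = 6 partitions,
# and every partition is owned by exactly one instance.
partitions, instances = 32, 6
base, remainder = divmod(partitions, instances)  # 5 per instance, 2 left over
expected = [base + 1 if i < remainder else base for i in range(instances)]
print(expected)       # [6, 6, 5, 5, 5, 5]
print(sum(expected))  # 32 -> no partition left unassigned
```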
Screenshots
I include screenshots showing the output of each of the 6 instances 5 minutes after they were launched.
Additional context
While we use the Azure Blob check-pointing mechanism to balance the consumers, we do not use it to commit reading offsets. In practice, to run several instances of this process in parallel, we use a ReplicationController within a Kubernetes cluster.