Upgrade to Confluent.Kafka 1.6.0 #193
Comments
Hi @amotl
Dear Tsuyoshi,

thanks for asking. I have to apologize that I can't really provide any more detailed insights: we have neither a reproducer nor any logs available and just upgraded to the most recent version of librdkafka in a trial-and-error manner. What I can tell you is about the symptoms we experienced and more details about our environment.

We are using confluent-kafka-python to connect to Azure Event Hubs from a) a self-provisioned Kubernetes cluster on Azure and b) (only recently) a managed AKS cluster. We use the maximum of 32 partitions available on Azure Event Hubs and consume from them by sharding the load across 32 consumer instances, each running on a different pod.

Occasionally, we saw specific partitions stall from consumption. It looks like it has always been the same ones, like 5 and 21, so at least there have been some deterministic aspects to this issue ;]. We detected the issue by a) seeing the affected consumer instances "hanging" (i.e. no more log output) and b) monitoring the partition offset time metric for each partition.

With kind regards,

P.S.: You will recognize that our environment neither used Confluent.Kafka nor connected to a vanilla Kafka broker. Instead, we used the Python client (also based on librdkafka) and have been connecting to Azure Event Hubs. However, we are about to use azure-functions-kafka-extension within a rewrite of our infrastructure and thought it would be a good idea to give you a heads-up so you can prepare everything accordingly and we won't trip into the same issues. Hence, I came here to share our observations with you and to suggest to a) bump to the most recent version of librdkafka and b) unlock the configuration options.

P.P.S.: "Monitoring the partition offset time metric for each partition" - that is: Putting
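For illustration only: below is a rough sketch of the kind of setup and monitoring described above. It is not our production code; the namespace, topic, group name and the `lag_per_partition` helper are made up, and the exact settings are an assumption on my part.

```python
# Minimal sketch (not our production setup): one confluent-kafka-python
# consumer per pod, all joining the same consumer group so that the 32
# Event Hubs partitions are spread across 32 instances.
from confluent_kafka import Consumer, TopicPartition

conf = {
    # Event Hubs exposes a Kafka endpoint on port 9093 and authenticates
    # via SASL PLAIN, using the connection string as the password.
    "bootstrap.servers": "mynamespace.servicebus.windows.net:9093",  # assumed namespace
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": "<event-hubs-connection-string>",
    "group.id": "my-consumer-group",   # assumed group name
    "auto.offset.reset": "earliest",
}

consumer = Consumer(conf)
consumer.subscribe(["my-event-hub"])   # assumed topic name


def lag_per_partition(consumer, topic, num_partitions=32, timeout=10.0):
    """Rough per-partition lag: high watermark minus last committed offset.

    If the committed offset of a partition stops moving while its high
    watermark keeps growing, that partition has effectively stalled.
    """
    partitions = [TopicPartition(topic, p) for p in range(num_partitions)]
    committed = consumer.committed(partitions, timeout=timeout)
    lags = {}
    for tp in committed:
        low, high = consumer.get_watermark_offsets(tp, timeout=timeout)
        # A negative offset means nothing has been committed for this partition yet.
        offset = tp.offset if tp.offset >= 0 else low
        lags[tp.partition] = high - offset
    return lags
```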
Thank you for sharing! It helps!
Done with the upgrade: https://github.com/Azure/azure-functions-kafka-extension/releases/tag/3.3.1
Dear @TsuyoshiUshio, @fbeltrao and @ryancrawcour,
thanks again for upgrading to Confluent.Kafka 1.5.2 (#185). Nevertheless, we still saw some partitions occasionally stalling on the consumer side, even after applying all of the mitigations outlined at confluentinc/librdkafka#3109.
However, after moving on to v1.6.0-PRE3, things appear to be really smooth now; see also confluentinc/librdkafka#3109 (comment).
I just wanted to give you a heads up on this and will notify you as soon as 1.6.0 GA is out.
With kind regards,
Andreas.
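P.S.: In case it helps with verifying the bump on your side, here is a small, purely illustrative snippet (Python client, so only an approximation of what the .NET side would look like) that we use to confirm which librdkafka version a running consumer actually links against:

```python
# Print the client version and the version of the linked librdkafka,
# e.g. to confirm that a deployment really picked up the 1.6.x native library.
import confluent_kafka

print("confluent-kafka-python:", confluent_kafka.version()[0])
print("librdkafka:", confluent_kafka.libversion()[0])
```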