Upgrade to Confluent.Kafka 1.6.0 #193

Closed
amotl opened this issue Nov 16, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@amotl

amotl commented Nov 16, 2020

Dear @TsuyoshiUshio, @fbeltrao and @ryancrawcour,

thanks again for upgrading to Confluent.Kafka 1.5.2 (#185). Nevertheless, we still saw some partitions occasionally stalling on the consumer side, even after applying all of the mitigations outlined at confluentinc/librdkafka#3109.

However, after moving on to v1.6.0-PRE3, things appear to be running really smoothly now; see also confluentinc/librdkafka#3109 (comment).

I just wanted to give you a heads up on this and will notify you as soon as 1.6.0 GA is out.

With kind regards,
Andreas.

@TsuyoshiUshio
Contributor

Hi @amotl
Thank you. Once it is released, I'll update it.

@TsuyoshiUshio TsuyoshiUshio added enhancement New feature or request and removed Needs: triage (functions) labels Nov 17, 2020
@TsuyoshiUshio
Contributor

Hi @amotl
May I ask a couple of questions?

  1. How can we reproduce the issue? Is it caused by Event Hubs?
  2. What exactly would we see in the librdkafka log?
    Some users seem to have hit this issue, but I can't be certain.

@amotl
Author

amotl commented Nov 20, 2020

Dear Tsuyoshi,

thanks for asking. I apologize that I can't provide more detailed insights. We have neither a repro nor any logs available; we just upgraded to the most recent version of librdkafka in a trial-and-error manner. What I can share are the symptoms we experienced and more details about our environment.

We are using confluent-kafka-python to connect to Azure Event Hubs from a) a self-provisioned Kubernetes cluster on Azure and b) (only recently) from a managed AKS cluster.

We are using the maximum of 32 partitions available on Azure Event Hubs and consume from them by sharding the load to 32 consumer instances, each one running on a different pod.

Occasionally, we saw specific partitions stall, i.e. they were no longer being consumed. It appears to have always been the same ones, such as 5 and 21, so at least there have been some deterministic aspects around this issue ;].

We detected the issue by a) seeing the specific consumer instances "hanging" (i.e. no more log output) and b) monitoring the partition offset time metric for each partition.

With kind regards,
Andreas.

P.S.: You will notice that our environment neither used Confluent.Kafka nor connected to a vanilla Kafka broker. Instead, we used the Python client (also based on librdkafka) and connected to Azure Event Hubs. However, we are about to use azure-functions-kafka-extension within a rewrite of our infrastructure and thought it would be a good idea to give you a heads up, so that everything can be prepared accordingly and we won't run into the same issues. Hence, I came here to share our observations with you and to suggest a) bumping to the most recent version of librdkafka and b) unlocking the configuration options socket.keepalive.enable and metadata.max.age.ms (#187), so that they can be adjusted to the settings Microsoft recommends when connecting to Event Hubs.
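
For readers following along, here is a minimal sketch of how the two options named above might be set for a librdkafka-based Python client. The helper name `event_hubs_consumer_config`, the namespace, the consumer group and the concrete values (`True` and `180000`) are assumptions for illustration, not settings taken from this thread; please check Microsoft's current recommendations before relying on them.

```python
def event_hubs_consumer_config(namespace, connection_string, group_id):
    """Build a librdkafka-style config dict for an Event Hubs Kafka endpoint.

    Hypothetical helper; the values below are illustrative assumptions.
    """
    return {
        # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Event Hubs expects this literal user name; the password is the
        # connection string of the namespace.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
        "group.id": group_id,
        # The two options discussed in this issue (values are assumptions).
        "socket.keepalive.enable": True,
        "metadata.max.age.ms": 180000,
    }


# Placeholder arguments; replace them with real values.
config = event_hubs_consumer_config(
    "my-namespace", "<connection string>", "my-consumer-group"
)

# With confluent-kafka-python installed, the dict could be passed on like so:
#   from confluent_kafka import Consumer
#   consumer = Consumer(config)
```

Keeping the settings in a plain dict makes it easy to log or diff the effective configuration when chasing stalls like the ones described here.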

P.P.S.: By "monitoring the partition offset time metric for each partition" I mean putting event.enqueued_time and partition_context.last_enqueued_event_properties["offset"], taken from def on_event(self, partition_context: PartitionContext, event: EventData) of azure-eventhub, into Prometheus and displaying them in Grafana.
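
The monitoring idea described above can be sketched without any external dependency. The class `PartitionStallDetector`, its method names and the five-minute threshold are hypothetical and only illustrate the principle: remember the last enqueued time and offset per partition, then flag partitions that have not advanced for too long. In a real setup, `record()` would presumably be called from the `on_event` callback, and the values would be exported to Prometheus instead of being held in memory.

```python
from datetime import datetime, timedelta, timezone


class PartitionStallDetector:
    """Track the last seen event per partition and report idle partitions."""

    def __init__(self, max_idle):
        self.max_idle = max_idle      # timedelta after which a partition counts as stalled
        self.last_enqueued = {}       # partition id -> enqueued time of the last event
        self.last_offset = {}         # partition id -> offset of the last event

    def record(self, partition_id, enqueued_time, offset):
        """Remember the most recent event seen on a partition."""
        self.last_enqueued[partition_id] = enqueued_time
        self.last_offset[partition_id] = offset

    def stalled(self, now):
        """Return the sorted ids of partitions idle for longer than max_idle."""
        return sorted(
            pid
            for pid, seen in self.last_enqueued.items()
            if now - seen > self.max_idle
        )


# Usage with made-up timestamps and offsets.
now = datetime(2020, 11, 20, 12, 0, tzinfo=timezone.utc)
detector = PartitionStallDetector(max_idle=timedelta(minutes=5))
detector.record("5", now - timedelta(minutes=10), offset="1000")
detector.record("6", now - timedelta(seconds=5), offset="2000")
stalled_partitions = detector.stalled(now)
```

Note that this check alone cannot tell a stalled consumer from a partition that simply receives no new events, so it is best combined with a producer-side signal.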

@TsuyoshiUshio
Contributor

Thank you for sharing! It helps!

@TsuyoshiUshio
Contributor
