Upgrade to Confluent.Kafka 1.6.0 #193

amotl · 2020-11-16T18:54:09Z

Dear @TsuyoshiUshio, @fbeltrao and @ryancrawcour,

thanks again for upgrading to Confluent.Kafka 1.5.2 (#185). Nevertheless, we still saw some partitions occasionally stalling on the consumer side, even after applying all of the mitigations outlined at confluentinc/librdkafka#3109.

However, after just moving on to v1.6.0-PRE3, things appear to be really smooth now, see also confluentinc/librdkafka#3109 (comment).

I just wanted to give you a heads up on this and will notify you as soon as 1.6.0 GA is out.

With kind regards,
Andreas.

TsuyoshiUshio · 2020-11-17T19:38:10Z

Hi @amotl
Thank you, once it is released, I'll update it.

TsuyoshiUshio · 2020-11-19T23:56:07Z

Hi @amotl
Can I ask you a question?

How can we reproduce the issue? EventHubs will cause the issue?
How exactly the log from libkafka we are going to see?
Some of users look hit this issue, however, I can't have a confidence.

amotl · 2020-11-20T08:27:11Z

Dear Tsuyoshi,

thanks for asking. I want to apologize that I can't really provide any more detailed insights. We have neither a repro nor any logs available and just upgraded to the most recent version of librdkafka in a trial-and-error manner. What I can tell you is about the symptoms we experienced and more details about our environment.

We are using confluent-kafka-python to connect to Azure Event Hubs from a) a self-provisioned Kubernetes cluster on Azure and b) (only recently) from a managed AKS cluster.

We are using the maximum of 32 partitions available on Azure Event Hubs and consume from them by sharding the load to 32 consumer instances, each one running on a different pod.

Occasionally, we saw specific partitions to be stalled from consumption. It looks like it has always been the same, like 5 and 21 - so at least there have been some deterministic aspects around this issue ;].

We determined the issue by a) seeing the specific consumer instances "hanging" (i.e. no more log output) and b) by monitoring the partition offset time metric for each partition.

With kind regards,
Andreas.

P.S.: You will recognize that our environment didn't either use Confluent.Kafka nor was connecting to a vanilla Kafka broker. Instead, we used the Python client (also based on librdkafka) and have been connecting to Azure Event Hubs. However, we are about to use azure-functions-kafka-extension within a rewrite of our infrastructure and thought it would be a good idea to give you a heads up on this in order to prepare everything accordingly that we won't trip into the same issues. Hence, I came here to share our observations with you and suggest to a) bump to the most recent versions of librdkafka and b) unlock the configuration options socket.keepalive.enable and metadata.max.age.ms (#187) in order to be able to adjust to the recommended settings by Microsoft when connecting to Event Hubs.

P.P.S.: monitoring the partition offset time metric for each partition - that is: Putting event.enqueued_time and partition_context.last_enqueued_event_properties["offset"] from def on_event(self, partition_context: PartitionContext, event: EventData) coming from azure-eventhub into Prometheus and displaying it within Grafana.

TsuyoshiUshio · 2020-11-20T16:33:06Z

Thank you for sharing! It helps!

TsuyoshiUshio · 2021-03-27T04:49:36Z

Done the upgrading. https://github.com/Azure/azure-functions-kafka-extension/releases/tag/3.3.1

ghost added the Needs: triage (functions) label Nov 16, 2020

TsuyoshiUshio added enhancement New feature or request and removed Needs: triage (functions) labels Nov 17, 2020

TsuyoshiUshio mentioned this issue Nov 20, 2020

GroupCoordinator: *.*.*.*:9092: 1 request(s) timed out: disconnect with Azure Backed Service #197

Open

TsuyoshiUshio closed this as completed Mar 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgrade to Confluent.Kafka 1.6.0 #193

Upgrade to Confluent.Kafka 1.6.0 #193

amotl commented Nov 16, 2020

TsuyoshiUshio commented Nov 17, 2020

TsuyoshiUshio commented Nov 19, 2020

amotl commented Nov 20, 2020

TsuyoshiUshio commented Nov 20, 2020

TsuyoshiUshio commented Mar 27, 2021

Upgrade to Confluent.Kafka 1.6.0 #193

Upgrade to Confluent.Kafka 1.6.0 #193

Comments

amotl commented Nov 16, 2020

TsuyoshiUshio commented Nov 17, 2020

TsuyoshiUshio commented Nov 19, 2020

amotl commented Nov 20, 2020

TsuyoshiUshio commented Nov 20, 2020

TsuyoshiUshio commented Mar 27, 2021