use group assignment to fetch consumer offset instead of fetch all partitions #220


iamgd67 (Contributor) commented Apr 28, 2021

Use the group's partition assignment to fetch consumer offsets instead of fetching offsets for all partitions.
This reduces the request size and the server's coordinator log volume (by default, the server logs the details of every partition offset fetch request).
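A minimal sketch of how the request scope changes. It uses a simplified in-memory model of a consumer group: member id → topic → assigned partition ids. It also assumes that "all partitions" means every partition of every topic in the cluster. The shapes and function names are hypothetical illustrations, not the exporter's code or any Kafka client's API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified shapes (not a real Kafka client API):
# a group's members map member id -> {topic: [assigned partition ids]}.
TopicPartition = Tuple[str, int]
MemberAssignments = Dict[str, Dict[str, List[int]]]


def all_partitions(cluster_topics: Dict[str, int]) -> Set[TopicPartition]:
    """Every partition of every topic (topic -> partition count): the old request scope."""
    return {(topic, p) for topic, count in cluster_topics.items() for p in range(count)}


def assigned_partitions(members: MemberAssignments) -> Set[TopicPartition]:
    """Only the partitions currently assigned to some member of the group."""
    owned: Set[TopicPartition] = set()
    for assignment in members.values():
        for topic, partitions in assignment.items():
            owned.update((topic, p) for p in partitions)
    return owned


cluster = {"orders": 12, "payments": 6, "audit": 48}
group = {
    "consumer-1": {"orders": [0, 1, 2, 3, 4, 5]},
    "consumer-2": {"orders": [6, 7, 8, 9, 10, 11]},
}
print(len(all_partitions(cluster)))     # 66
print(len(assigned_partitions(group)))  # 12
print(len(assigned_partitions({})))     # 0: a group with no live members
```

The last line shows the side effect raised later in the thread. A group whose consumers have all disconnected has no members and therefore no assignment, so nothing is fetched for it.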

danielqsj merged commit 9514777 into danielqsj:master on May 18, 2021
danielqsj (Owner) commented:

Thanks! @iamgd67

scholzj commented Jun 17, 2021

@iamgd67 @danielqsj It looks to me like this change means the Kafka Exporter now shows lag only for connected consumer groups. When, for example, your client (all instances) dies and stays disconnected for some time, the exporter drops the group and no longer shows its lag. I assume the change does reduce the request size as described, but it also reduces functionality and makes the lag metrics less useful. Was this change of behaviour intentional?

iamgd67 (Contributor, Author) commented Jun 18, 2021

@scholzj Maybe it would be better to make fetching all partitions vs. only assigned partitions an option.
On the other hand, these metrics are normally scraped and stored by a time-series database such as Prometheus, where historical data can be accessed.
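A minimal sketch of how such an option might look. It reuses a simplified in-memory model of a group (member id → topic → assigned partition ids). The `fetch_all` flag and the fallback for groups with no members are hypothetical illustrations. They are not what the exporter or #255 actually implements.

```python
from typing import Dict, List, Set, Tuple

TopicPartition = Tuple[str, int]


def partitions_to_fetch(
    members: Dict[str, Dict[str, List[int]]],
    cluster_topics: Dict[str, int],
    fetch_all: bool = False,
) -> Set[TopicPartition]:
    """Pick the partitions to include in one group's offset fetch.

    fetch_all=True  -> every partition of every topic (larger request, but
                       also covers groups whose consumers are gone).
    fetch_all=False -> only partitions assigned to live members, falling
                       back to every partition when the group has no
                       members (hypothetical fallback so dead groups still
                       report offsets).
    """
    every = {(t, p) for t, count in cluster_topics.items() for p in range(count)}
    if fetch_all or not members:
        return every
    owned: Set[TopicPartition] = set()
    for assignment in members.values():
        for topic, partitions in assignment.items():
            owned.update((topic, p) for p in partitions)
    return owned


cluster = {"orders": 2, "audit": 1}
live = {"consumer-1": {"orders": [0]}}
print(sorted(partitions_to_fetch(live, cluster)))  # [('orders', 0)]
print(sorted(partitions_to_fetch({}, cluster)))    # [('audit', 0), ('orders', 0), ('orders', 1)]
```

With the fallback, the fetch also covers partitions the group never committed to. The broker returns an offset of -1 for those, so a caller would typically skip them rather than report lag.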

scholzj commented Jun 18, 2021

> on the other way, normally these metrics will be grabbed and saved by time serial data base like Prometheus, and history data may access there.

That is true, the last value will normally be stored. But when your app stops working, it quickly becomes outdated.

Kaali09 commented Aug 10, 2021

I agree with @scholzj. When a consumer is stuck on one partition and the lag keeps increasing, we will not find out.
Also, when the consumer dies because of some error, the lag keeps increasing, which impacts the pipeline.

Hence it is important to monitor per-partition lag as well.

Please add a flag to enable partition-level metrics and to report consumer group lag when the consumer is dead or inactive.
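Per-partition lag is the partition's latest (high-watermark) offset minus the group's committed offset. Committed offsets are stored by the group coordinator, not by the consumers. They therefore stay readable after every consumer has died, until they expire under the broker's `offsets.retention.minutes` setting. As a result, an inactive group's lag can still be computed if its partitions are included in the fetch. The sketch below is minimal. It assumes both maps have already been fetched and are keyed by (topic, partition). The function names are hypothetical.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]

# Kafka reports -1 when a group has no committed offset for a partition.
NO_COMMITTED_OFFSET = -1


def partition_lag(
    high_watermarks: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Per-partition lag = high watermark - committed offset.

    Partitions with no committed offset, or with no known high watermark,
    are skipped. Lag is clamped at 0 in case the two values were read at
    slightly different moments.
    """
    lag: Dict[TopicPartition, int] = {}
    for tp, offset in committed.items():
        if offset == NO_COMMITTED_OFFSET or tp not in high_watermarks:
            continue
        lag[tp] = max(high_watermarks[tp] - offset, 0)
    return lag


def group_lag(per_partition: Dict[TopicPartition, int]) -> int:
    """Total lag for the group: the sum over its partitions."""
    return sum(per_partition.values())


high_watermarks = {("orders", 0): 120, ("orders", 1): 95, ("orders", 2): 40}
committed = {("orders", 0): 100, ("orders", 1): 95, ("orders", 2): -1}
lag = partition_lag(high_watermarks, committed)
print(lag)             # {('orders', 0): 20, ('orders', 1): 0}
print(group_lag(lag))  # 20
```

Keeping the per-partition map, rather than only the sum, is what exposes a single stuck partition. A group total of 20 could hide one partition falling behind while the others are fully caught up.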

iamgd67 (Contributor, Author) commented Sep 2, 2021

@scholzj @Kaali09 Thanks for reporting that dead groups' offsets are not shown. I saw other people submit issues about this too, so I created a PR (#255) to better handle offsets for dead and self-managed groups.
