-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kafka: Consumer group stale static member properties on rejoin #23693
kafka: Consumer group stale static member properties on rejoin #23693
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
r.client_id, | ||
r.client_host, | ||
r.data.session_timeout_ms, | ||
r.data.rebalance_timeout_ms); | ||
auto old_protocols = _members.at(new_member_id)->protocols().copy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an understanding of why old_protocols
are a backup of the new protocols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any reason for it to make sense to store the "old" protocols after they have been updated. I believe it is a bug. However, as I couldn't test it, I preferred to not touch it as part of this change and see what we can do about it as a separate thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a bug. It's present in AK as well, although this one seems much trickier to observe: https://github.com/apache/kafka/blame/bb6ebd83f9a0f9b2a37704f1f37b78a48cf480e1/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L1361
Up to now, group::update_member function would only update the protocols of the group_member. The fields updated now include the client id and host fields and the session and rebalance timeout durations.
5ade915
to
6defd55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, nice first PR
r.client_id, | ||
r.client_host, | ||
r.data.session_timeout_ms, | ||
r.data.rebalance_timeout_ms); | ||
auto old_protocols = _members.at(new_member_id)->protocols().copy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a bug. It's present in AK as well, although this one seems much trickier to observe: https://github.com/apache/kafka/blame/bb6ebd83f9a0f9b2a37704f1f37b78a48cf480e1/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L1361
6defd55
to
72365bb
Compare
the below tests from https://buildkite.com/redpanda/redpanda/builds/56148#01927240-2408-45bb-b566-8b87d7d87867 have failed and will be retried
|
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56148#01927298-45f6-4ef0-b290-f69f4709cfd8:
|
Retry command for Build#56148please wait until all jobs are finished before running the slash command
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56148#0192726e-c5ba-47de-89f8-10170738a0be |
CI Failure: |
/backport v24.2.x |
/backport v24.1.x |
/backport v23.3.x |
Failed to create a backport PR to v24.1.x branch. I tried:
|
Failed to create a backport PR to v23.3.x branch. I tried:
|
Failed to create a backport PR to v24.2.x branch. I tried:
|
Fixes: CORE-7749
When a static group member rejoins, only its protocols were updated based on the rejoin request.
The relevant updateMember functions have been extended to also include the client-id, client-host, session-timeout and rebalance-timeout properties.
Backports Required
Release Notes
Bug Fixes