-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance #7620
KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance #7620
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix lgtm. Just one minor comment.
@@ -692,6 +692,8 @@ protected void onJoinPrepare(int generation, String memberId) { | |||
|
|||
@Override | |||
public void onLeavePrepare() { | |||
log.debug("Executing onLeavePrepare with generation {} and memberId {}", generation(), memberId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since generation may change by the hb in between this line and line 702 below, this entry may not have the exact number, maybe just read out from generation()
and use that object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're referring to memberId()
right? It seems like generation().memberId
returns the same thing as memberId()
-- or do you mean I should first check generation().hasMemberId()
and then, if that is empty, do what -- just log nothing, or a separate message for when memberId is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I'm referring to the generation()
itself: we call this function twice within the function and in between the generation object may have been changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh right ok, I'll fix that
@@ -692,6 +692,8 @@ protected void onJoinPrepare(int generation, String memberId) { | |||
|
|||
@Override | |||
public void onLeavePrepare() { | |||
log.debug("Executing onLeavePrepare with generation {} and memberId {}", generation(), memberId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since generation may change by the hb in between this line and line 702 below, this entry may not have the exact number, maybe just read out from generation()
and use that object?
41a707e
to
821c4b5
Compare
@guozhangwang @ableegoldman Is this ready to megre? |
Will this fix any known flaky tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also unit test this change?
@@ -692,6 +692,8 @@ protected void onJoinPrepare(int generation, String memberId) { | |||
|
|||
@Override | |||
public void onLeavePrepare() { | |||
log.debug("Executing onLeavePrepare with generation {} and memberId {}", generation(), memberId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Kicked off system test to confirm this fix: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3359/ will rebase + address comments so we can merge if (when) it's green |
@@ -1109,7 +1109,6 @@ public void onAssignment(final Assignment assignment, final ConsumerGroupMetadat | |||
// Check if this was a version probing rebalance and check the error code to trigger another rebalance if so | |||
if (maybeUpdateSubscriptionVersion(receivedAssignmentMetadataVersion, latestCommonlySupportedVersion)) { | |||
setAssignmentErrorCode(AssignorError.VERSION_PROBING.code()); | |||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual fix: it needs to be applied to both the real assignor and the VP system test's assignor to get a green. We should consolidate these so we don't keep forgetting to add a fix to the test's custom assignor, and thinking it's snot actually fixed
try { | ||
if (streamThread.setState(State.PARTITIONS_ASSIGNED) == null) { | ||
log.debug( | ||
"Skipping task creation in rebalance because we are already in {} state.", | ||
streamThread.state() | ||
); | ||
} else if (streamThread.getAssignmentErrorCode() != AssignorError.NONE.code()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjsax I don't believe this is necessary to do the fix, but I think it's the right thing to do -- WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we would need to pause()
all partitions on version probing, too, to avoid that poll()
returns any data to the StreamThread
-- we don't check the VP flag directly after poll()
and want runOnce()
to just be a no-op for the VP case. \cc @guozhangwang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We synced on this and confirmed this is not a problem, as we do pause the partitions in taskManager#createTasks
-- however we're also pulling this part out into a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
System test to confirm this piece is not necessary for the PR: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3364/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to go back and forth, but actually I believe this part is absolutely necessary to do along with the other fix in this PR -- also we can just take out the check entirely, as we already test for the only other AssignorError
at the top of this callback
be291f2
to
71abf9d
Compare
This reverts commit 1234fe6.
Unit tests passed locally. |
…after rebalance (#7620) Currently when we identify version probing we return early from onAssignment and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up in onPartitionsRevoked which gets invoked when we call onLeavePrepare as part of triggering the follow-up rebalance. Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered. Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
LGTM. Merged to trunk and cherry-picked to 2.4. |
Fixed a minor conflict in `.gitignore` and fix compiler errors in KafkaUtilities due to `PartitionReplicaAssignment` rename to `ReplicaAssignment`. * apache-github/trunk: (34 commits) HOTFIX: Try to complete Send even if no bytes were written (apache#7622) KAFKA-9080: Revert the check added to validate non-compressed record batch does have continuous incremental offsets KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance (apache#7620) MINOR: Fix Kafka Streams JavaDocs with regard to new StreamJoined class (apache#7627) MINOR: Fix sensor retrieval in stand0by task's constructor (apache#7632) MINOR: Replace some Java 7 style code with Java 8 style (apache#7623) KAFKA-8868: Generate SubscriptionInfo protocol message (apache#7248) MINOR: Correctly mark offset expiry in GroupMetadataManager's OffsetExpired metric KAFKA-8972 (2.4 blocker): bug fix for restoring task (apache#7617) KAFKA-9093: NullPointerException in KafkaConsumer with group.instance.id (apache#7590) KAFKA-8980: Refactor state-store-level streams metrics (apache#7584) MINOR: Fix documentation for updateCurrentReassignment (apache#7611) MINOR: Preserve backwards-compatibility by renaming the AlterPartitionReassignment metric to PartitionReassignment KAFKA-8972 (2.4 blocker): clear all state for zombie task on TaskMigratedException (apache#7608) KAFKA-9077: Fix reading of metrics of Streams' SimpleBenchmark (apache#7610) KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (apache#7441) MINOR: improve logging of tasks on shutdown (apache#7597) KAFKA-9048 Pt1: Remove Unnecessary lookup in Fetch Building (apache#7576) MINOR: Fix command examples in kafka-reassign-partitions.sh docs (apache#7583) KAFKA-9102; Increase default zk session timeout and replica max lag [KIP-537] (apache#7596) ...
…ter rebalance (apache#7620) Currently when we identify version probing we return early from onAssignment and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up in onPartitionsRevoked which gets invoked when we call onLeavePrepare as part of triggering the follow-up rebalance. Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered. Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
…updated after rebalance (apache#7620)" This reverts commit c46dded.
Currently when we identify version probing we return early from
onAssignment
and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up inonPartitionsRevoked
which gets invoked when we callonLeavePrepare
as part of triggering the follow-up rebalance.Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow
onAssignment
to proceed as usual so that theTaskManager
is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered.Should be cherry-picked to 2.4