KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance #7620

ableegoldman · 2019-10-31T03:30:39Z

Currently, when we detect version probing, we return early from onAssignment and never update the TaskManager and general state with the new assignment. Since we do give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions, including cleaning them up in onPartitionsRevoked, which is invoked when we call onLeavePrepare as part of triggering the follow-up rebalance.

Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a version probing (VP) assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state and knows all of its tasks/partitions when the first rebalance completes and the next one is triggered.
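To make the before/after behavior concrete, here is a minimal sketch, assuming toy stand-ins rather than the real classes. `ToyAssignor`, `ToyTaskManager`, the `returnEarlyOnVersionProbing` flag, and the numeric error codes are all hypothetical names invented for illustration; only the shape of the early return mirrors the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not the real Kafka Streams classes.
class VersionProbingSketch {

    // Arbitrary codes chosen for this sketch.
    static final int NONE = 0;
    static final int VERSION_PROBING = 2;

    // Toy TaskManager: it only remembers the partitions it was last told about.
    static class ToyTaskManager {
        private final List<String> assignedPartitions = new ArrayList<>();

        void setAssignment(List<String> partitions) {
            assignedPartitions.clear();
            assignedPartitions.addAll(partitions);
        }

        List<String> assignedPartitions() {
            return assignedPartitions;
        }
    }

    static class ToyAssignor {
        final ToyTaskManager taskManager = new ToyTaskManager();
        int errorCode = NONE;
        private final boolean returnEarlyOnVersionProbing;

        ToyAssignor(boolean returnEarlyOnVersionProbing) {
            this.returnEarlyOnVersionProbing = returnEarlyOnVersionProbing;
        }

        void onAssignment(List<String> partitions, boolean versionProbingDetected) {
            if (versionProbingDetected) {
                // Flag the follow-up rebalance in both variants.
                errorCode = VERSION_PROBING;
                if (returnEarlyOnVersionProbing) {
                    // Old behavior: the TaskManager never learns about the assignment.
                    return;
                }
            }
            // New behavior: always record the assignment, even during version probing.
            taskManager.setAssignment(partitions);
        }
    }

    public static void main(String[] args) {
        List<String> assignment = Arrays.asList("topic-0", "topic-1");

        ToyAssignor before = new ToyAssignor(true);
        before.onAssignment(assignment, true);

        ToyAssignor after = new ToyAssignor(false);
        after.onAssignment(assignment, true);

        System.out.println("with early return:    " + before.taskManager.assignedPartitions());
        System.out.println("without early return: " + after.taskManager.assignedPartitions());
    }
}
```

In both variants the error code is still set, so the follow-up rebalance would be triggered either way; the only difference is whether the toy TaskManager knows which partitions it has to clean up when that rebalance starts.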

Should be cherry-picked to 2.4

ableegoldman · 2019-10-31T03:36:35Z

@mjsax @abbccdda @guozhangwang @omkreddy

guozhangwang · 2019-10-31T04:16:58Z

https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3343/

guozhangwang

The fix lgtm. Just one minor comment.

guozhangwang · 2019-10-31T16:37:09Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java

@@ -692,6 +692,8 @@ protected void onJoinPrepare(int generation, String memberId) {

    @Override
    public void onLeavePrepare() {
+        log.debug("Executing onLeavePrepare with generation {} and memberId {}", generation(), memberId());


Since the generation may be changed by the heartbeat (hb) thread between this line and line 702 below, this entry may not log the exact number. Maybe just read generation() once and use that object?

You're referring to memberId() right? It seems like generation().memberId returns the same thing as memberId() -- or do you mean I should first check generation().hasMemberId() and then, if that is empty, do what -- just log nothing, or a separate message for when memberId is empty?

No, I'm referring to generation() itself: we call it twice within this method, and the generation object may have changed in between.

Ohh right ok, I'll fix that
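For readers following this thread, a minimal sketch of the concern, assuming a toy coordinator. `GenerationSnapshotSketch`, its `Generation` holder, and `resetGeneration()` are hypothetical names; the `Runnable` argument stands in for whatever the heartbeat thread might do between the two reads.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the coordinator; not the real ConsumerCoordinator.
class GenerationSnapshotSketch {

    // Immutable pair of generation id and member id.
    static final class Generation {
        final int generationId;
        final String memberId;

        Generation(int generationId, String memberId) {
            this.generationId = generationId;
            this.memberId = memberId;
        }
    }

    private final AtomicReference<Generation> current =
        new AtomicReference<>(new Generation(5, "member-a"));

    Generation generation() {
        return current.get();
    }

    // Simulates the heartbeat thread replacing the generation object.
    void resetGeneration() {
        current.set(new Generation(-1, ""));
    }

    // Two separate reads: the logged id and the id used later can disagree.
    int[] leaveWithTwoReads(Runnable interleavedChange) {
        int logged = generation().generationId;
        interleavedChange.run();
        int used = generation().generationId;
        return new int[] {logged, used};
    }

    // One read into a local: the log line and the later use see the same object.
    int[] leaveWithSnapshot(Runnable interleavedChange) {
        Generation snapshot = generation();
        int logged = snapshot.generationId;
        interleavedChange.run();
        int used = snapshot.generationId;
        return new int[] {logged, used};
    }

    public static void main(String[] args) {
        GenerationSnapshotSketch a = new GenerationSnapshotSketch();
        int[] twoReads = a.leaveWithTwoReads(a::resetGeneration);

        GenerationSnapshotSketch b = new GenerationSnapshotSketch();
        int[] snapshot = b.leaveWithSnapshot(b::resetGeneration);

        System.out.println("two reads: logged=" + twoReads[0] + ", used=" + twoReads[1]);
        System.out.println("snapshot:  logged=" + snapshot[0] + ", used=" + snapshot[1]);
    }
}
```

The snapshot variant does not stop the generation from changing; it only guarantees that the debug line and the subsequent call describe the same generation object.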

omkreddy · 2019-11-01T15:47:47Z

@guozhangwang @ableegoldman Is this ready to merge?

abbccdda · 2019-11-01T16:25:29Z

Will this fix any known flaky tests?

abbccdda

Could we also unit test this change?

abbccdda · 2019-11-01T16:25:55Z

clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java

@@ -692,6 +692,8 @@ protected void onJoinPrepare(int generation, String memberId) {

    @Override
    public void onLeavePrepare() {
+        log.debug("Executing onLeavePrepare with generation {} and memberId {}", generation(), memberId());


ableegoldman · 2019-11-01T18:02:40Z

Kicked off system test to confirm this fix: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3359/

will rebase + address comments so we can merge if (when) it's green

ableegoldman · 2019-11-01T18:03:58Z

...ams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignor.java

@@ -1109,7 +1109,6 @@ public void onAssignment(final Assignment assignment, final ConsumerGroupMetadat
        // Check if this was a version probing rebalance and check the error code to trigger another rebalance if so
        if (maybeUpdateSubscriptionVersion(receivedAssignmentMetadataVersion, latestCommonlySupportedVersion)) {
            setAssignmentErrorCode(AssignorError.VERSION_PROBING.code());
-            return;


This is the actual fix: it needs to be applied to both the real assignor and the VP system test's assignor to get a green build. We should consolidate these so we don't keep forgetting to apply a fix to the test's custom assignor and then thinking it's not actually fixed.

ableegoldman · 2019-11-01T18:04:36Z

...ams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsRebalanceListener.java

        try {
            if (streamThread.setState(State.PARTITIONS_ASSIGNED) == null) {
                log.debug(
                    "Skipping task creation in rebalance because we are already in {} state.",
                    streamThread.state()
                );
-            } else if (streamThread.getAssignmentErrorCode() != AssignorError.NONE.code()) {


@mjsax I don't believe this is necessary to do the fix, but I think it's the right thing to do -- WDYT?

I actually think we would need to pause() all partitions on version probing, too, so that poll() does not return any data to the StreamThread -- we don't check the VP flag directly after poll() and want runOnce() to just be a no-op for the VP case. cc @guozhangwang

We synced on this and confirmed this is not a problem, as we do pause the partitions in taskManager#createTasks -- however we're also pulling this part out into a separate PR

System test to confirm this piece is not necessary for the PR: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3364/

Sorry to go back and forth, but actually I believe this part is absolutely necessary to do along with the other fix in this PR -- also we can just take out the check entirely, as we already test for the only other AssignorError at the top of this callback
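A rough sketch of where this thread ends up, under stated assumptions: `RebalanceListenerSketch`, `ToyAssignorError`, and `OTHER_ERROR` are hypothetical names, and the toy `createTasks` only models the two facts mentioned above (tasks are created and their partitions are paused). It is not the real StreamsRebalanceListener.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the callback discussed above; names are invented for illustration.
class RebalanceListenerSketch {

    // OTHER_ERROR stands in for "the only other AssignorError" mentioned in the thread.
    enum ToyAssignorError { NONE, VERSION_PROBING, OTHER_ERROR }

    final List<String> activeTasks = new ArrayList<>();
    final List<String> pausedPartitions = new ArrayList<>();
    boolean shutdownRequested = false;

    // Toy createTasks: registers the tasks and pauses their partitions,
    // so a later poll() would not hand records to the thread.
    void createTasks(List<String> partitions) {
        activeTasks.addAll(partitions);
        pausedPartitions.addAll(partitions);
    }

    void onPartitionsAssigned(ToyAssignorError error, List<String> partitions) {
        // The one non-VP error is handled at the top of the callback.
        if (error == ToyAssignorError.OTHER_ERROR) {
            shutdownRequested = true;
            return;
        }
        // No further error-code check: NONE and VERSION_PROBING both create tasks.
        createTasks(partitions);
    }

    public static void main(String[] args) {
        RebalanceListenerSketch listener = new RebalanceListenerSketch();
        listener.onPartitionsAssigned(ToyAssignorError.VERSION_PROBING, Arrays.asList("topic-0"));
        System.out.println("tasks after VP rebalance: " + listener.activeTasks);
        System.out.println("paused partitions:        " + listener.pausedPartitions);
    }
}
```

Once the other error is ruled out at the top, the only remaining values are "no error" and "version probing", which is why a second check further down would be redundant in this model.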

This reverts commit 1234fe6.

guozhangwang · 2019-11-01T23:09:43Z

Unit tests passed locally.

…after rebalance (#7620)

Currently when we identify version probing we return early from onAssignment and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up in onPartitionsRevoked which gets invoked when we call onLeavePrepare as part of triggering the follow-up rebalance.

Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered.

Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>

guozhangwang · 2019-11-01T23:18:56Z

LGTM. Merged to trunk and cherry-picked to 2.4.

Fixed a minor conflict in `.gitignore` and fix compiler errors in KafkaUtilities due to `PartitionReplicaAssignment` rename to `ReplicaAssignment`.

* apache-github/trunk: (34 commits)
  HOTFIX: Try to complete Send even if no bytes were written (apache#7622)
  KAFKA-9080: Revert the check added to validate non-compressed record batch does have continuous incremental offsets
  KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance (apache#7620)
  MINOR: Fix Kafka Streams JavaDocs with regard to new StreamJoined class (apache#7627)
  MINOR: Fix sensor retrieval in stand0by task's constructor (apache#7632)
  MINOR: Replace some Java 7 style code with Java 8 style (apache#7623)
  KAFKA-8868: Generate SubscriptionInfo protocol message (apache#7248)
  MINOR: Correctly mark offset expiry in GroupMetadataManager's OffsetExpired metric
  KAFKA-8972 (2.4 blocker): bug fix for restoring task (apache#7617)
  KAFKA-9093: NullPointerException in KafkaConsumer with group.instance.id (apache#7590)
  KAFKA-8980: Refactor state-store-level streams metrics (apache#7584)
  MINOR: Fix documentation for updateCurrentReassignment (apache#7611)
  MINOR: Preserve backwards-compatibility by renaming the AlterPartitionReassignment metric to PartitionReassignment
  KAFKA-8972 (2.4 blocker): clear all state for zombie task on TaskMigratedException (apache#7608)
  KAFKA-9077: Fix reading of metrics of Streams' SimpleBenchmark (apache#7610)
  KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (apache#7441)
  MINOR: improve logging of tasks on shutdown (apache#7597)
  KAFKA-9048 Pt1: Remove Unnecessary lookup in Fetch Building (apache#7576)
  MINOR: Fix command examples in kafka-reassign-partitions.sh docs (apache#7583)
  KAFKA-9102; Increase default zk session timeout and replica max lag [KIP-537] (apache#7596)
  ...

…updated after rebalance (apache#7620)" This reverts commit c46dded.

guozhangwang reviewed Oct 31, 2019

ableegoldman force-pushed the 8972-dont-abort-onAssignment branch from 41a707e to 821c4b5 Compare November 1, 2019 00:34

abbccdda reviewed Nov 1, 2019

ableegoldman commented Nov 1, 2019

ableegoldman added 9 commits November 1, 2019 14:05

just dont return on VP spotted

e714b11

symmetrize logging

aec2eb4

log

1a2b370

fix typo in logging

d5cde5f

logging moved to other PR

6680829

actual logginf fix

b81f174

make VP normal rebalance

1234fe6

logging improvement

7b1865a

apply fix to upgrade test assignor

71abf9d

ableegoldman force-pushed the 8972-dont-abort-onAssignment branch from be291f2 to 71abf9d Compare November 1, 2019 21:06

ableegoldman added 3 commits November 1, 2019 14:07

Revert "make VP normal rebalance"

0e2ccce

This reverts commit 1234fe6.

can just remove completely

24e33b1

generation fix

15a6923

guozhangwang merged commit d61b0c1 into apache:trunk Nov 1, 2019

mjsax added the streams label Nov 1, 2019

ableegoldman added a commit to ableegoldman/kafka that referenced this pull request Nov 13, 2019

Revert "KAFKA-8972 (2.4 blocker): TaskManager state should always be …

8d4e9a8

…updated after rebalance (apache#7620)" This reverts commit c46dded.
