KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance #7620

Merged 12 commits on Nov 1, 2019

@ableegoldman ableegoldman commented Oct 31, 2019

Currently, when we identify version probing, we return early from onAssignment and never update the TaskManager and general state with the new assignment. Since we do give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions, including cleaning them up in onPartitionsRevoked, which is invoked when we call onLeavePrepare as part of triggering the follow-up rebalance.

Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a version probing (VP) assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered.
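
To illustrate the problem described above, here is a minimal, self-contained model. It is a sketch under stated assumptions, not the Kafka Streams code: `ToyTaskManager`, `ToyAssignor`, the `returnEarlyOnVersionProbing` flag, and the numeric error codes are all hypothetical stand-ins. It shows that with the early return, the task manager has nothing to clean up on revocation even though partitions were handed out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the task manager: it only tracks owned partitions.
class ToyTaskManager {
    private final Set<String> ownedPartitions = new HashSet<>();

    void setAssignment(List<String> partitions) {
        ownedPartitions.clear();
        ownedPartitions.addAll(partitions);
    }

    // Returns the partitions that were cleaned up on revocation.
    Set<String> revokeAll() {
        Set<String> revoked = new HashSet<>(ownedPartitions);
        ownedPartitions.clear();
        return revoked;
    }
}

// Hypothetical stand-in for the assignor callback.
class ToyAssignor {
    static final int NONE = 0;
    static final int VERSION_PROBING = 2; // illustrative value only

    final ToyTaskManager taskManager = new ToyTaskManager();
    int errorCode = NONE;
    private final boolean returnEarlyOnVersionProbing;

    ToyAssignor(boolean returnEarlyOnVersionProbing) {
        this.returnEarlyOnVersionProbing = returnEarlyOnVersionProbing;
    }

    void onAssignment(List<String> partitions, boolean versionProbing) {
        if (versionProbing) {
            errorCode = VERSION_PROBING;
            if (returnEarlyOnVersionProbing) {
                return; // modelled old behaviour: state is never updated
            }
        }
        // Modelled new behaviour: always record the assignment.
        taskManager.setAssignment(partitions);
    }
}

class VersionProbingDemo {
    public static void main(String[] args) {
        List<String> assigned = Arrays.asList("topic-0", "topic-1");

        ToyAssignor earlyReturn = new ToyAssignor(true);
        earlyReturn.onAssignment(assigned, true);
        System.out.println("early return, revoked: " + earlyReturn.taskManager.revokeAll());

        ToyAssignor proceed = new ToyAssignor(false);
        proceed.onAssignment(assigned, true);
        System.out.println("proceed, revoked: " + proceed.taskManager.revokeAll());
    }
}
```

In this model the error code is still set in both variants, so a follow-up rebalance can still be triggered; the only difference is whether the recorded state matches the assignment that was actually given out.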

Should be cherry-picked to 2.4

@guozhangwang guozhangwang left a comment

The fix lgtm. Just one minor comment.

@@ -692,6 +692,8 @@ protected void onJoinPrepare(int generation, String memberId) {

    @Override
    public void onLeavePrepare() {
        log.debug("Executing onLeavePrepare with generation {} and memberId {}", generation(), memberId());
Contributor

Since the generation may be changed by the heartbeat thread (hb) between this line and line 702 below, this log entry may not show the exact generation number. Maybe just read generation() once and use that object?

Contributor

+1

Contributor Author

You're referring to memberId() right? It seems like generation().memberId returns the same thing as memberId() -- or do you mean I should first check generation().hasMemberId() and then, if that is empty, do what -- just log nothing, or a separate message for when memberId is empty?

Contributor

No, I'm referring to generation() itself: we call it twice within this method, and the generation object may have changed in between.

Contributor Author

Ohh right ok, I'll fix that
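
The point settled in this thread can be sketched in isolation. The following is a hedged illustration, not the coordinator's actual code: `ToyGeneration`, `ToyCoordinator`, and the `between` hook (which stands in for a concurrent update from another thread) are hypothetical names introduced here. It contrasts calling `generation()` twice with reading it once and reusing the snapshot.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of a member's generation.
final class ToyGeneration {
    final int generationId;
    final String memberId;

    ToyGeneration(int generationId, String memberId) {
        this.generationId = generationId;
        this.memberId = memberId;
    }
}

// Hypothetical coordinator whose generation can be swapped by another thread.
class ToyCoordinator {
    private final AtomicReference<ToyGeneration> current =
        new AtomicReference<>(new ToyGeneration(1, "member-a"));

    ToyGeneration generation() {
        return current.get();
    }

    void resetGeneration(ToyGeneration next) {
        current.set(next);
    }

    // Calls generation() twice; `between` stands in for a concurrent update.
    // Returns {id that was logged, id that was acted on}.
    int[] readTwice(Runnable between) {
        int logged = generation().generationId;
        between.run();
        int used = generation().generationId;
        return new int[] {logged, used};
    }

    // Reads generation() once and reuses the snapshot for both purposes.
    int[] readOnce(Runnable between) {
        final ToyGeneration snapshot = generation();
        int logged = snapshot.generationId;
        between.run();
        int used = snapshot.generationId;
        return new int[] {logged, used};
    }
}
```

When an update is injected between the two reads, `readTwice` returns two different ids, while `readOnce` returns the same id for both the logged and the used value.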

@ableegoldman ableegoldman force-pushed the 8972-dont-abort-onAssignment branch from 41a707e to 821c4b5 Compare November 1, 2019 00:34
omkreddy commented Nov 1, 2019

@guozhangwang @ableegoldman Is this ready to merge?

abbccdda commented Nov 1, 2019

Will this fix any known flaky tests?

@abbccdda abbccdda left a comment

Could we also unit test this change?

@ableegoldman

Kicked off system test to confirm this fix: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3359/

Will rebase and address comments so we can merge if (when) it's green.

@@ -1109,7 +1109,6 @@ public void onAssignment(final Assignment assignment, final ConsumerGroupMetadat
        // Check if this was a version probing rebalance and check the error code to trigger another rebalance if so
        if (maybeUpdateSubscriptionVersion(receivedAssignmentMetadataVersion, latestCommonlySupportedVersion)) {
            setAssignmentErrorCode(AssignorError.VERSION_PROBING.code());
            return;
Contributor Author

This is the actual fix: it needs to be applied to both the real assignor and the VP system test's assignor to get a green build. We should consolidate these so we don't keep forgetting to add a fix to the test's custom assignor and then thinking it's not actually fixed.

try {
    if (streamThread.setState(State.PARTITIONS_ASSIGNED) == null) {
        log.debug(
            "Skipping task creation in rebalance because we are already in {} state.",
            streamThread.state()
        );
    } else if (streamThread.getAssignmentErrorCode() != AssignorError.NONE.code()) {
Contributor Author

@mjsax I don't believe this is necessary to do the fix, but I think it's the right thing to do -- WDYT?

Member

I actually think we would need to pause() all partitions on version probing, too, to prevent poll() from returning any data to the StreamThread -- we don't check the VP flag directly after poll() and want runOnce() to just be a no-op for the VP case. cc @guozhangwang

Contributor Author

We synced on this and confirmed it is not a problem, as we do pause the partitions in taskManager#createTasks. However, we're also pulling this part out into a separate PR.

Contributor Author

System test to confirm this piece is not necessary for the PR: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/3364/

Contributor Author

Sorry to go back and forth, but I now believe this part is absolutely necessary alongside the other fix in this PR. We can also take out the check entirely, as we already test for the only other AssignorError at the top of this callback.
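
The pause() concern raised earlier in this thread rests on the consumer behaviour that poll() returns no records from paused partitions. A toy model of that behaviour, assuming a hypothetical `ToyConsumer` class that only buffers strings per partition (it is not the Kafka consumer and does not use its API), might look like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, much-simplified consumer: buffers records per partition and
// skips paused partitions in poll().
class ToyConsumer {
    private final Map<String, List<String>> buffered = new HashMap<>();
    private final Set<String> paused = new HashSet<>();

    // Simulates a record arriving for a partition.
    void deliver(String partition, String record) {
        buffered.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
    }

    void pause(Set<String> partitions) {
        paused.addAll(partitions);
    }

    void resume(Set<String> partitions) {
        paused.removeAll(partitions);
    }

    // Returns and drains buffered records for partitions that are not paused.
    List<String> poll() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : buffered.entrySet()) {
            if (!paused.contains(entry.getKey())) {
                out.addAll(entry.getValue());
                entry.getValue().clear();
            }
        }
        return out;
    }
}
```

In this model, records delivered to a paused partition stay buffered and are only returned by poll() after resume(), which is the property that makes a processing loop a no-op while all assigned partitions are paused.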

@ableegoldman ableegoldman force-pushed the 8972-dont-abort-onAssignment branch from be291f2 to 71abf9d Compare November 1, 2019 21:06
@guozhangwang

Unit tests passed locally.

@guozhangwang guozhangwang merged commit d61b0c1 into apache:trunk Nov 1, 2019
@mjsax mjsax added the streams label Nov 1, 2019
guozhangwang pushed a commit that referenced this pull request Nov 1, 2019
…after rebalance (#7620)

Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>
@guozhangwang

LGTM. Merged to trunk and cherry-picked to 2.4.

ijuma added a commit to confluentinc/kafka that referenced this pull request Nov 3, 2019
Fixed a minor conflict in `.gitignore` and fixed compiler
errors in KafkaUtilities due to `PartitionReplicaAssignment`
rename to `ReplicaAssignment`.

* apache-github/trunk: (34 commits)
  HOTFIX: Try to complete Send even if no bytes were written (apache#7622)
  KAFKA-9080: Revert the check added to validate non-compressed record batch does have continuous incremental offsets
  KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance (apache#7620)
  MINOR: Fix Kafka Streams JavaDocs with regard to new StreamJoined class (apache#7627)
  MINOR: Fix sensor retrieval in stand0by task's constructor (apache#7632)
  MINOR: Replace some Java 7 style code with Java 8 style (apache#7623)
  KAFKA-8868: Generate SubscriptionInfo protocol message (apache#7248)
  MINOR: Correctly mark offset expiry in GroupMetadataManager's OffsetExpired metric
  KAFKA-8972 (2.4 blocker): bug fix for restoring task (apache#7617)
  KAFKA-9093: NullPointerException in KafkaConsumer with group.instance.id (apache#7590)
  KAFKA-8980: Refactor state-store-level streams metrics (apache#7584)
  MINOR: Fix documentation for updateCurrentReassignment (apache#7611)
  MINOR: Preserve backwards-compatibility by renaming the AlterPartitionReassignment metric to PartitionReassignment
  KAFKA-8972 (2.4 blocker): clear all state for zombie task on TaskMigratedException (apache#7608)
  KAFKA-9077: Fix reading of metrics of Streams' SimpleBenchmark (apache#7610)
  KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (apache#7441)
  MINOR: improve logging of tasks on shutdown (apache#7597)
  KAFKA-9048 Pt1: Remove Unnecessary lookup in Fetch Building (apache#7576)
  MINOR: Fix command examples in kafka-reassign-partitions.sh docs (apache#7583)
  KAFKA-9102; Increase default zk session timeout and replica max lag [KIP-537] (apache#7596)
  ...
KetkiT pushed a commit to KetkiT/kafka that referenced this pull request Nov 5, 2019
…ter rebalance (apache#7620)

ableegoldman added a commit to ableegoldman/kafka that referenced this pull request Nov 13, 2019