Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-3911: Data inconsistency caused by DIFF sync uncommitted log #1445

Closed
wants to merge 1 commit into from

Conversation

hanm
Copy link
Contributor

@hanm hanm commented Sep 2, 2020

Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you work creating this reproducer.

I see there are a lot of "Thread.sleep", this could lead to flakes, btw I can't suggest a better way.

@lvfangmin PTAL

@hanm
Copy link
Contributor Author

hanm commented Sep 3, 2020

I see there are a lot of "Thread.sleep", this could lead to flakes

I think it's fine here as the execution does not depends on timing of the sleep. The functionality these code block is basically the same with the waitForAll/One helpers.

@hanm
Copy link
Contributor Author

hanm commented Sep 3, 2020

One more observation - I think we have the same issue even before ZOOKEEPER-2678 (where it avoids taking snapshot before sending NEWLEADER ACK back during DIFF sync), because even when the snapshot was taken, the data tree at that time still does not have the proposals DIFF sync from leader applied (the commit of all such proposals happen after follower receive UPTODATE).

Specifically, in zab paper, it said:
"Phase 2: Sync with followers:
f The follower syncs with the leader, but doesn't modify its state until it receives the NEWLEADER(e) packet. Once it receives NEWLEADER(e) it atomically applies the new state and sets f.currentEpoch = e. It then sends ACK(e << 32)."
But in our implementation, I don't see we actually "atomically applies the new state" before sends ACK.

@breed might have more context around this.

@hanm
Copy link
Contributor Author

hanm commented Sep 3, 2020

jenkins pre-merge reports one failed test: FileTxnSnapLogMetricsTest#testFileTxnSnapLogMetrics
however, i can't reproduce it locally.

@hanm
Copy link
Contributor Author

hanm commented Sep 13, 2020

retest maven build

@hanm hanm closed this Sep 13, 2020
@hanm hanm reopened this Sep 13, 2020
@hanm hanm closed this Sep 15, 2020
@hanm hanm reopened this Sep 15, 2020
@hanm
Copy link
Contributor Author

hanm commented Sep 20, 2020

requesting more reviews; @lvfangmin @breed

@hanm
Copy link
Contributor Author

hanm commented Oct 27, 2020

Committers - can i get more reviews of this? I'd like to get this merged if no other issues. I hope the test case itself is straightforward to illustrate the problem and the fix itself is fairly simple (test case pass with the fix and fail w/o the fix). More discussions can be found at the ticket https://issues.apache.org/jira/browse/ZOOKEEPER-3911.

cc @eolivelli @symat @anmolnar @nkalmar @arshadmohammad

sock.setSoTimeout(self.tickTime * self.syncLimit);
self.setSyncMode(QuorumPeer.SyncMode.NONE);
zk.startup();
zk.startServing();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question:

this "startServing" works if you executed 'zk.startupWithoutServing();' in your new code.
but in other code paths we are not calling "startupWithoutServing"
so when we come to this point zk.startServing() will only change the state to RUNNING, and we had not called 'startupWithServerState'

can you please explain ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in other code paths we are not calling "startupWithoutServing"

The invariant here is that a call to zk.startServing will always follow a call to zk.startupWithoutServing. This is because the ZAB protocol implementation requires this order of invocation (in syncWithLeader). In other words, there is no other code paths that involve both calls. To finish the sync, the leader and follower must follow these events in order.

  • First, learner will receive NEWLEADER from leader. This is where we will call zk.startupWithoutServing.
  • Then, learner will receive UPTODATE from leader. This is where we breaks the outer sync loop, afterwards we call zk.startServing.

Hope this make sense. A more detailed description can be found on phase 2 "Sync with followers" on https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other code path the server state is already in running, so this has no effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now thinking about this... it's a bit surprising we never needed this control before, to have a running server "active" or not.

Copy link
Contributor

@nkalmar nkalmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, PTAL

@nkalmar
Copy link
Contributor

nkalmar commented Nov 9, 2020

Since this has 2 committer's +1 I'm keen on merging this. I'd say to master, 3.6 and even 3.5, hopefully cherry picks clean.
I will wait another day if any reactions comes, and then I'll merge (or ofc others can merge before that).

@asfgit asfgit closed this in b978dfb Nov 12, 2020
asfgit pushed a commit that referenced this pull request Nov 12, 2020
Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Author: Michael Han <hanm@apache.org>

Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org>

Closes #1445 from hanm/ZOOKEEPER-3911

(cherry picked from commit b978dfb)
Signed-off-by: Norbert Kalmar <nkalmar@apache.org>
asfgit pushed a commit that referenced this pull request Nov 12, 2020
Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Author: Michael Han <hanm@apache.org>

Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org>

Closes #1445 from hanm/ZOOKEEPER-3911

(cherry picked from commit b978dfb)
Signed-off-by: Norbert Kalmar <nkalmar@apache.org>
@nkalmar
Copy link
Contributor

nkalmar commented Nov 12, 2020

Merged to master, branch-3.6 and branch-3.5. Thanks @hanm
3.5 did not pick clean, I fixed up manually, I will do some testing.

@hanm
Copy link
Contributor Author

hanm commented Nov 14, 2020

Thanks @nkalmar for taking care of merging this. I'll check out what's the problem with the patch on branch-3.5.

@ztzg
Copy link
Contributor

ztzg commented Nov 14, 2020

I'm a bit late in reviewing this, but FWIW: LGTM.

I started looking into the issue with 3.5, but haven't had a chance to figure out the inconsistency. In case it helps, here is my current tree:

https://github.com/ztzg/zookeeper/tree/ZOOKEEPER-3911-diff-sync-3.5

@li4wang
Copy link
Contributor

li4wang commented Nov 19, 2020

@hanm @nkalmar I am getting the following compilation error in branch-3.6. The error started to occur after this commit. I wonder if you ran into the same issue. Can you please take a look at it?

====================
[INFO] Finished at: 2020-11-19T15:07:24-08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:testCompile (default-testCompile) on project zookeeper: Compilation failure: Compilation failure:
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[22,36] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[22,1] static import only from classes and interfaces
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[23,36] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[23,1] static import only from classes and interfaces
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[24,36] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[24,1] static import only from classes and interfaces
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[25,36] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[25,1] static import only from classes and interfaces
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[42,29] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[43,29] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[44,29] package org.junit.jupiter.api does not exist
[ERROR] /Users/liwang/apache/zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java:[51,6] cannot find symbol
[ERROR] symbol: class Test

@hanm
Copy link
Contributor Author

hanm commented Nov 20, 2020

@li4wang thanks for reporting. I think this is because 3.6 had a different junit verison / api usage across unit tests and we didn't backup that commit to 3.6 when upgrades the tests. It should be easier to fix.

Part of the reason this was not caught during commit was because this was a cherry pick so it didn't go through the CI pipeline - also I suspect we lose our apache builds for 3.6 branches (at least my inbox was empty - otherwise the build fail notification would raise an alarm).

@huizhilu
Copy link
Contributor

@hanm @li4wang I also noticed this build failure. Are you guys fixing it? If not, I think I could submit a PR for a quick fix by back porting the junit upgrade from the master branch to branch-3.6.

@li4wang
Copy link
Contributor

li4wang commented Nov 20, 2020

@pkuwm yes, I am fixing it. I am planning to back porting the following commits from master branch to branch-3.6 via cherry-picking.

8da9c72 ZOOKEEPER-3850: Update jUnit to 5.6 in pom.xml files

@tamaashu
Copy link
Contributor

@li4wang ZOOKEEPER-3850 won't solve the issue, since it only replaces jUnit4 with jUnit5-vintage (which is just a jUnit 4.13 with a rename). I would advise to fix the testcases in branch 3.6 with the use of jUnit4.
ZooKeeper 3.7 is coming with fully supported jUnit5 testing, there won't be too many backports to 3.6 afterwards.

@li4wang
Copy link
Contributor

li4wang commented Nov 20, 2020

@tamaashu yeah, I noticed ZOOKEEPER-3850 by itself is not sufficient. I fixed the compilation errors by applying ZOOKEEPER-3850 and some addition changes on pom.xml, but the test case failed too.

@hanm can you please fix the test failure along with the compilation error? I do agree with @tamaashu, it would be better just fix the testcases in branch 3.6 by not using JUnit5 API.

test case error

[INFO] Running org.apache.zookeeper.server.quorum.DIFFSyncConsistencyTest
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 30.442 s <<< FAILURE! - in org.apache.zookeeper.server.quorum.DIFFSyncConsistencyTest
[ERROR] testInconsistentDueToUncommittedLog Time elapsed: 30.326 s <<< FAILURE!
org.opentest4j.AssertionFailedError: waiting for server 0 being up ==> expected: but was:
at org.apache.zookeeper.server.quorum.DIFFSyncConsistencyTest.testInconsistentDueToUncommittedLog(DIFFSyncConsistencyTest.java:78)

@huizhilu
Copy link
Contributor

huizhilu commented Nov 20, 2020

@hanm @li4wang @tamaashu Yep, I also tried but since there were commits during those jUnit 5 upgrade, there are conflicts if we cherry-pick those upgrade commits. It's easier to fix it directly if we don't have more backports to branch 3.6.
I have the working code locally. @hanm if you don't have time to fix this minor issue, I could submit a PR :)
A ticket is created here: https://issues.apache.org/jira/browse/ZOOKEEPER-4011

asfgit pushed a commit that referenced this pull request Nov 22, 2020
…sistencyTest

### Description
Resolves [ZOOKEEPER-4011](https://issues.apache.org/jira/browse/ZOOKEEPER-4011)

maven build fails on branch-3.6 because a back ported commit from the branch master uses jUnit 5.

### Solution
As discussed in #1445, since cherry-picking jUnit 5 upgrades has conflicts, and there won't be too many back ports to 3.6 afterwards, we can just fix the issue by using jUnit 4 in the test.

Author: Huizhi Lu <hulu@linkedin.com>

Reviewers: hanm, tamaashu, maoling

Closes #1545 from pkuwm/fix-build-branch-3.6
@fanyang89
Copy link
Contributor

I started looking into the issue with 3.5, but haven't had a chance to figure out the inconsistency. In case it helps, here is my current tree:

@ztzg ZOOKEEPER-3911 requires ZOOKEEPER-3240 to pass the unit test. I've created a new PR to backport ZOOKEEPER-3911 to branch-3.5.9.

#1655

@anmolnar
Copy link
Contributor

@fanyang89 Thanks for taking care of this. Please open PR against branch-3.5.

@ztzg
Copy link
Contributor

ztzg commented Apr 21, 2021

Hi @fanyang89,

I started looking into the issue with 3.5, but haven't had a chance to figure out the inconsistency. In case it helps, here is my current tree:

@ztzg ZOOKEEPER-3911 requires ZOOKEEPER-3240 to pass the unit test.

Sorry for the lag, and thank you very much for investigating/clarifying this!

I've created a new PR to backport ZOOKEEPER-3911 to branch-3.5.9.

#1655

Thank you—but as pointed out by @anmolnar in #1655 (comment), branch-3.5.9 is a "release branch," and 3.5.9 has already been released. Could you please submit the same PR against branch-3.5 (the "series" branch)?

(I will hopefully have a bit more time to look into these topics in the coming weeks.)

@fanyang89
Copy link
Contributor

Hi @ztzg, I've created a new PR against branch-3.5 #1657, but I forget to update the link at this thread. Sorry for that.

@ztzg
Copy link
Contributor

ztzg commented Apr 21, 2021

Hi @ztzg, I've created a new PR against branch-3.5 #1657, but I forget to update the link at this thread. Sorry for that.

No problem. Now merged in branch branch-3.5. Thank you!

RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Aug 31, 2022
Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Author: Michael Han <hanm@apache.org>

Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org>

Closes apache#1445 from hanm/ZOOKEEPER-3911
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Aug 31, 2022
Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Author: Michael Han <hanm@apache.org>

Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org>

Closes apache#1445 from hanm/ZOOKEEPER-3911
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Aug 31, 2022
Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Author: Michael Han <hanm@apache.org>

Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org>

Closes apache#1445 from hanm/ZOOKEEPER-3911
RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
Problem:

During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.

Solution:

Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.

Author: Michael Han <hanm@apache.org>

Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org>

Closes apache#1445 from hanm/ZOOKEEPER-3911
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants