-
Notifications
You must be signed in to change notification settings - Fork 7.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZOOKEEPER-3911: Data inconsistency caused by DIFF sync uncommitted log #1445
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you work creating this reproducer.
I see there are a lot of "Thread.sleep", this could lead to flakes, btw I can't suggest a better way.
@lvfangmin PTAL
zookeeper-server/src/test/java/org/apache/zookeeper/server/quorum/DIFFSyncConsistencyTest.java
Outdated
Show resolved
Hide resolved
I think it's fine here as the execution does not depends on timing of the sleep. The functionality these code block is basically the same with the waitForAll/One helpers. |
One more observation - I think we have the same issue even before ZOOKEEPER-2678 (where it avoids taking snapshot before sending NEWLEADER ACK back during DIFF sync), because even when the snapshot was taken, the data tree at that time still does not have the proposals DIFF sync from leader applied (the commit of all such proposals happen after follower receive UPTODATE). Specifically, in zab paper, it said: @breed might have more context around this. |
jenkins pre-merge reports one failed test: FileTxnSnapLogMetricsTest#testFileTxnSnapLogMetrics |
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java
Outdated
Show resolved
Hide resolved
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java
Show resolved
Hide resolved
retest maven build |
requesting more reviews; @lvfangmin @breed |
Committers - can i get more reviews of this? I'd like to get this merged if no other issues. I hope the test case itself is straightforward to illustrate the problem and the fix itself is fairly simple (test case pass with the fix and fail w/o the fix). More discussions can be found at the ticket https://issues.apache.org/jira/browse/ZOOKEEPER-3911. |
sock.setSoTimeout(self.tickTime * self.syncLimit); | ||
self.setSyncMode(QuorumPeer.SyncMode.NONE); | ||
zk.startup(); | ||
zk.startServing(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question:
this "startServing" works if you executed 'zk.startupWithoutServing();' in your new code.
but in other code paths we are not calling "startupWithoutServing"
so when we come to this point zk.startServing() will only change the state to RUNNING, and we had not called 'startupWithServerState'
can you please explain ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but in other code paths we are not calling "startupWithoutServing"
The invariant here is that a call to zk.startServing
will always follow a call to zk.startupWithoutServing
. This is because the ZAB protocol implementation requires this order of invocation (in syncWithLeader
). In other words, there is no other code paths that involve both calls. To finish the sync, the leader and follower must follow these events in order.
- First, learner will receive
NEWLEADER
from leader. This is where we will callzk.startupWithoutServing
. - Then, learner will receive
UPTODATE
from leader. This is where we breaks the outer sync loop, afterwards we callzk.startServing
.
Hope this make sense. A more detailed description can be found on phase 2 "Sync with followers" on https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other code path the server state is already in running, so this has no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now thinking about this... it's a bit surprising we never needed this control before, to have a running server "active" or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, PTAL
Since this has 2 committer's +1 I'm keen on merging this. I'd say to master, 3.6 and even 3.5, hopefully cherry picks clean. |
Problem: During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers. Solution: Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery. Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org> Closes #1445 from hanm/ZOOKEEPER-3911 (cherry picked from commit b978dfb) Signed-off-by: Norbert Kalmar <nkalmar@apache.org>
Problem: During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers. Solution: Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery. Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org> Closes #1445 from hanm/ZOOKEEPER-3911 (cherry picked from commit b978dfb) Signed-off-by: Norbert Kalmar <nkalmar@apache.org>
Merged to master, branch-3.6 and branch-3.5. Thanks @hanm |
Thanks @nkalmar for taking care of merging this. I'll check out what's the problem with the patch on branch-3.5. |
I'm a bit late in reviewing this, but FWIW: LGTM. I started looking into the issue with 3.5, but haven't had a chance to figure out the inconsistency. In case it helps, here is my current tree: https://github.com/ztzg/zookeeper/tree/ZOOKEEPER-3911-diff-sync-3.5 |
@hanm @nkalmar I am getting the following compilation error in branch-3.6. The error started to occur after this commit. I wonder if you ran into the same issue. Can you please take a look at it? ==================== |
@li4wang thanks for reporting. I think this is because 3.6 had a different junit verison / api usage across unit tests and we didn't backup that commit to 3.6 when upgrades the tests. It should be easier to fix. Part of the reason this was not caught during commit was because this was a cherry pick so it didn't go through the CI pipeline - also I suspect we lose our apache builds for 3.6 branches (at least my inbox was empty - otherwise the build fail notification would raise an alarm). |
@pkuwm yes, I am fixing it. I am planning to back porting the following commits from master branch to branch-3.6 via cherry-picking. 8da9c72 ZOOKEEPER-3850: Update jUnit to 5.6 in pom.xml files |
@li4wang ZOOKEEPER-3850 won't solve the issue, since it only replaces jUnit4 with jUnit5-vintage (which is just a jUnit 4.13 with a rename). I would advise to fix the testcases in branch 3.6 with the use of jUnit4. |
@tamaashu yeah, I noticed ZOOKEEPER-3850 by itself is not sufficient. I fixed the compilation errors by applying ZOOKEEPER-3850 and some addition changes on pom.xml, but the test case failed too. @hanm can you please fix the test failure along with the compilation error? I do agree with @tamaashu, it would be better just fix the testcases in branch 3.6 by not using JUnit5 API. test case error[INFO] Running org.apache.zookeeper.server.quorum.DIFFSyncConsistencyTest |
@hanm @li4wang @tamaashu Yep, I also tried but since there were commits during those jUnit 5 upgrade, there are conflicts if we cherry-pick those upgrade commits. It's easier to fix it directly if we don't have more backports to branch 3.6. |
…sistencyTest ### Description Resolves [ZOOKEEPER-4011](https://issues.apache.org/jira/browse/ZOOKEEPER-4011) maven build fails on branch-3.6 because a back ported commit from the branch master uses jUnit 5. ### Solution As discussed in #1445, since cherry-picking jUnit 5 upgrades has conflicts, and there won't be too many back ports to 3.6 afterwards, we can just fix the issue by using jUnit 4 in the test. Author: Huizhi Lu <hulu@linkedin.com> Reviewers: hanm, tamaashu, maoling Closes #1545 from pkuwm/fix-build-branch-3.6
@ztzg ZOOKEEPER-3911 requires ZOOKEEPER-3240 to pass the unit test. I've created a new PR to backport ZOOKEEPER-3911 to branch-3.5.9. |
@fanyang89 Thanks for taking care of this. Please open PR against |
Hi @fanyang89,
Sorry for the lag, and thank you very much for investigating/clarifying this!
Thank you—but as pointed out by @anmolnar in #1655 (comment), (I will hopefully have a bit more time to look into these topics in the coming weeks.) |
Problem: During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers. Solution: Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery. Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org> Closes apache#1445 from hanm/ZOOKEEPER-3911
Problem: During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers. Solution: Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery. Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org> Closes apache#1445 from hanm/ZOOKEEPER-3911
Problem: During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers. Solution: Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery. Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org> Closes apache#1445 from hanm/ZOOKEEPER-3911
Problem: During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers. Solution: Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery. Author: Michael Han <hanm@apache.org> Reviewers: Enrico Olivelli <eolivelli@apache.org>, Norbert Kalmar <nkalmar@apache.org> Closes apache#1445 from hanm/ZOOKEEPER-3911
Problem:
During DIFF sync, followers will persist the proposals from leader only after receiving the UPTODATE message. Leader will start serve client requests once receive a quorum of NEWLEADER ACK from followers, and NEWLEADER ACKs are sent from followers before receiving UPTODATE. So it's possible that when leader starting serving, followers might not persistent all of the DIFF sync proposals from leader. If at a certain point in time (e.g. once receiving UPTODATE from leader) all quorum servers crash, then these proposals, where leader they are quorum committed, will lost if the old leader can't come back and the followers form a new quorum. The end result is inconsistent view of the data from client before / after quorum recovers.
Solution:
Make sure follower to persist all DIFF proposals before they ACK NEWLEADER. When leader receives a quorum of NEWLEADER ACK, it's guaranteed that all DIFF sync proposals are quorum persistent and can survive after the recovery.