ZOOKEEPER-4785: Txn loss due to race condition in Learner.syncWithLeader() during DIFF sync #2111

Merged
merged 1 commit into from
Jan 26, 2024


li4wang (Contributor) commented Jan 23, 2024

Provides a fix for the txn loss issue caused by the race condition in DIFF sync.

Author: Li Wang liwang@apple.com

…der() during DIFF sync

Author: Li Wang <liwang@apple.com>
eolivelli (Contributor) left a comment


LGTM

nice catch!

@li4wang li4wang merged commit 315abde into apache:master Jan 26, 2024
12 of 13 checks passed
li4wang added four commits to li4wang/zookeeper that referenced this pull request Feb 12, 2024, each with the message:
…der() during DIFF sync (apache#2111)

Author: Li Wang <liwang@apple.com>

Co-authored-by: liwang <liwang@apple.com>
li4wang added a commit that referenced this pull request Feb 12, 2024
…der() during DIFF sync (#2111) (#2133)

Author: Li Wang <liwang@apple.com>

Co-authored-by: liwang <liwang@apple.com>
li4wang added a commit that referenced this pull request Feb 13, 2024
…der() during DIFF sync (#2111) (#2132)

Author: Li Wang <liwang@apple.com>

Co-authored-by: liwang <liwang@apple.com>
AlphaCanisMajoris pushed a commit to AlphaCanisMajoris/zookeeper that referenced this pull request Mar 28, 2024
…der() during DIFF sync (apache#2111)

Author: Li Wang <liwang@apple.com>

Co-authored-by: liwang <liwang@apple.com>
ChengYilong89 commented Aug 3, 2024

Hello. I hit an issue in production: an ephemeral (temporary) node was deleted on the leader, but it was not deleted on the other followers, so the data became inconsistent between the leader and the followers. The ZK version is 3.8.1.
Then I found ZOOKEEPER-4785, and my scenario is similar: follower A synced with the leader, the old leader restarted, and follower A became the new leader.
Can ZOOKEEPER-4785 cause this kind of data inconsistency? In your description, what does "5. The follower was bounced before writing all the uncommitted txns to disk" mean? @li4wang

li4wang (Contributor, Author) commented Aug 5, 2024

> Can ZOOKEEPER-4785 cause this kind of data inconsistency?

By data inconsistency, do you mean the digest mismatch count is greater than 0, or that clients see different data from the leader and the follower?

> What does "5. The follower was bounced before writing all the uncommitted txns to disk" mean?

It means the follower did not persist the txns it received from the leader during the DIFF sync to disk before it shut down, so the follower lost those txns.
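The failure in step 5 can be illustrated with a toy model. This is a minimal sketch in Python, not ZooKeeper code: `ToyFollower`, `receive_diff`, `flush`, `crash_and_restart`, and `sync_then_bounce` are hypothetical names, and the model assumes the follower acknowledges NEWLEADER while the DIFF-synced txns are still queued for an asynchronous disk write. It contrasts that ordering with writing the txns to disk before the ACK.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFollower:
    """Hypothetical follower: only `disk` survives a restart."""

    disk: list = field(default_factory=list)     # txns durably written
    pending: list = field(default_factory=list)  # txns queued for an async write

    def receive_diff(self, txns):
        # Txns from the DIFF sync are accepted but not yet on disk.
        self.pending.extend(txns)

    def flush(self):
        # Write every queued txn to disk.
        self.disk.extend(self.pending)
        self.pending.clear()

    def crash_and_restart(self):
        # A bounce discards anything that was only queued in memory.
        self.pending.clear()


def sync_then_bounce(persist_before_ack: bool):
    """Return the txns that survive a bounce right after the NEWLEADER ACK."""
    f = ToyFollower()
    f.receive_diff(["txn-1", "txn-2", "txn-3"])
    if persist_before_ack:
        f.flush()  # txns are durable before the leader counts this follower as synced
    # The ACK is sent here; the leader now assumes the follower has these txns.
    f.crash_and_restart()  # step 5: bounced before the async write ran
    return f.disk
```

With `persist_before_ack=False` the follower restarts with none of the DIFF txns even though it already acknowledged them; with `persist_before_ack=True` all three survive. The sketch only shows why the ordering of the disk write and the ACK matters; it does not reproduce the actual change in this PR.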

ChengYilong89 commented Aug 5, 2024

> By data inconsistency, do you mean the digest mismatch count is greater than 0, or that clients see different data from the leader and the follower?

I mean clients see different data from the leader and the follower. For example, the leader contains nodes A, B, and C, but the follower can only see B and C. Node A is ephemeral, so it should have been deleted and the deletion synced to the follower, but in fact it was not deleted on the follower.

> It means the follower did not persist the txns it received from the leader during the DIFF sync to disk before it shut down, so the follower lost those txns.

Can this result in data inconsistency between the leader and the follower?

@li4wang
