storage: Replica gets GC'ed immediately after preemptive snapshot gets applied #17198
Is this repeatable? It's a known issue that this is possible, but it should be rare and we're supposed to recover by sending a raft-initiated snapshot.
I've seen it twice while messing around with indigo this afternoon, so it's probably pretty repeatable. I haven't yet tried to experiment with it again, though. In at least the second case, the range got stuck afterwards, so there's something that needs to be fixed here.
The ranges affected by this issue were:
I was able to reproduce this by running a cluster built from a commit in July, running KV and a cron job that changed zone configs every 10 minutes to make sure rebalancing was constantly happening. It wasn't reproducible on master. Then I tested a commit before and after #19353, and it looks like that PR fixed this issue as well.
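For reference, a long-running loop along the lines of the sketch below can stand in for the zone-config cron job. It is only a rough sketch of the reproduction setup, not the actual job: the table name `kv.kv`, the connection string, the `dc` locality tier, and the `CONFIGURE ZONE` SQL syntax are all assumptions (the original job may well have used the older `cockroach zone set` CLI instead).

```go
// rebalance_churn.go: a hypothetical stand-in for the "cron job that changed
// zone configs every 10 minutes" described above. Alternating the zone
// constraints keeps the allocator moving replicas, so preemptive snapshots
// are sent continuously.
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	// Connection URL and table name are assumptions, not from the issue.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/kv?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Two alternating constraint sets; flipping between them forces
	// rebalancing every cycle.
	configs := []string{
		`ALTER TABLE kv.kv CONFIGURE ZONE USING constraints = '[+dc=dc1]'`,
		`ALTER TABLE kv.kv CONFIGURE ZONE USING constraints = '[+dc=dc2]'`,
	}

	for i := 0; ; i++ {
		stmt := configs[i%len(configs)]
		if _, err := db.Exec(stmt); err != nil {
			log.Printf("zone config change failed: %v", err)
		} else {
			log.Printf("applied: %s", stmt)
		}
		time.Sleep(10 * time.Minute)
	}
}
```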
On node 4 (the leaseholder):
On node 3 (where the preemptive snapshot was being sent):
This is shortly after starting up kv on indigo, where the round-trip latency between data centers is roughly 30-40ms. It's concerning both that the replica was immediately GC'ed and that, once the snapshot has been successfully applied, the leaseholder apparently can't figure out that it needs to be resent if the remote replica gets GC'ed.