Bug Report: xtrabackup on replica + promotion to primary causes outage #12754
Comments
Thanks for assigning this quickly - I think the root cause might be this line: the assumption is that the
I'm not entirely sure what the "correct" logic should be here. Perhaps we should store the previous tablet type only for non-serving backup engines (which I think change the tablet type to BACKUP for the duration of the backup), so we need to keep the logic for that case. For online backup types, where the tablet remains serving, we should not touch the tablet type at all. Other thoughts:
Final note: it looks like this PR (this issue) is also about to change the same bit of code, though I think the logic, from the POV of this bug, remains the same.
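To make the proposal above concrete, here is a minimal sketch of "only record and restore the tablet type for engines that take the tablet out of serving". Everything in it is hypothetical: `TabletType`, `Tablet`, `runBackup` and the `drainsTablet` flag are simplified stand-ins, not the actual Vitess types or backup engine interface.

```go
package main

import "fmt"

// TabletType is a hypothetical stand-in for the Vitess tablet types.
type TabletType string

const (
	Primary TabletType = "PRIMARY"
	Replica TabletType = "REPLICA"
	Backup  TabletType = "BACKUP"
)

// Tablet is a hypothetical, simplified view of a tablet's topology record.
type Tablet struct {
	Type TabletType
}

// runBackup sketches the proposed logic. Engines that stop serving
// (drainsTablet == true) switch the tablet to BACKUP and restore the type
// recorded at the start. Online engines (e.g. xtrabackup) never touch the
// type, so a promotion that happens mid-backup is not undone afterwards.
func runBackup(tab *Tablet, drainsTablet bool, backup func()) {
	if !drainsTablet {
		// Online backup: the tablet keeps serving and keeps its type.
		backup()
		return
	}
	previous := tab.Type // remember the type so it can be restored
	tab.Type = Backup
	backup()
	tab.Type = previous
}

func main() {
	// Online backup on a replica that gets promoted while the backup runs.
	online := &Tablet{Type: Replica}
	runBackup(online, false, func() { online.Type = Primary })
	fmt.Println(online.Type) // PRIMARY

	// Draining backup: BACKUP during the run, REPLICA again afterwards.
	offline := &Tablet{Type: Replica}
	runBackup(offline, true, func() { fmt.Println(offline.Type) }) // BACKUP
	fmt.Println(offline.Type)                                      // REPLICA
}
```

The point of the split is that the save/restore dance only makes sense when the backup itself changed the type; if it didn't, anything that happened to the type in the meantime (like a PRS) should win.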
After more investigation, it's not the line we thought: xtrabackup does not drain the tablet, so that code never sets the tablet back to its original type. It's this line instead: vitess/go/vt/vtctl/grpcvtctldserver/server.go Line 505 in c0463e3
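Assuming the problem is an unconditional reset to the type that was recorded when the backup started, one possible guard is to re-read the current type before resetting and leave a freshly promoted primary alone. This is only a sketch: `typeAfterBackup` is a hypothetical helper and the types are simplified stand-ins, not the code at that line.

```go
package main

import "fmt"

// TabletType is a hypothetical stand-in for the Vitess tablet types.
type TabletType string

const (
	Primary TabletType = "PRIMARY"
	Replica TabletType = "REPLICA"
)

// typeAfterBackup (hypothetical) decides which type a tablet should end up
// with once a backup finishes. original is the type recorded when the backup
// started; current is the type read back from the topology afterwards.
// If the tablet was promoted to PRIMARY while the backup ran, resetting it to
// original would demote the live primary, so the current type is kept.
func typeAfterBackup(original, current TabletType) TabletType {
	if current == Primary && original != Primary {
		return current
	}
	return original
}

func main() {
	// Replica promoted during the backup: keep it PRIMARY.
	fmt.Println(typeAfterBackup(Replica, Primary)) // PRIMARY
	// Nothing changed during the backup: restore REPLICA.
	fmt.Println(typeAfterBackup(Replica, Replica)) // REPLICA
}
```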
I patched this on our end by adding
I can upstream a fix.
Overview of the Issue
If an xtrabackup backup is started on a replica and that replica is then promoted to primary, the backup continues to run and eventually kills the primary:
The key bit seems to be the two lines I marked, where the tablet turns itself into a replica and then gets confused that the primary has type replica 🥲
A manual emergency reparent got us out of this situation; orchestrator did not intervene because the MySQL topology was technically fine. I'm not sure how this would behave with VTOrc (i.e. whether it also watches the Vitess topology); we're still a little way off from testing that.
Reproduction Steps
Start an xtrabackup backup on a replica, then promote that replica with vtctl plannedreparentshard while the backup is ongoing.
Binary Version
Version: 14.0.4-SNAPSHOT (Git revision 02da1bfa3d10cd0c279c0aaf641324609d8dfe0c branch 'HEAD') built on Thu Mar 9 19:20:30 UTC 2023 by root@buildkitsandbox using go1.18.7 linux/amd64 (this is our own build of v14 with some custom patches that are probably not relevant)
Operating System and Environment details
Log Fragments