In a cluster with multiple REPLICA/RDONLY tablets, it's possible to create a situation where vtctlclient -- Backup --incremental_from_pos=auto fails to take the backup.
The gist of the scenario: one of the tablets is restored from backup (which wipes out its binary logs, setting gtid_purged) and takes an incremental backup (which runs fine); then an attempt is made to take an incremental backup on the other tablet.
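A rough sketch of why a restore matters here, assuming standard MySQL GTID semantics: the transactions still present in a server's binary logs are those in gtid_executed minus those in gtid_purged. The transaction numbers below are made up for illustration, the function name is hypothetical, and sets of plain integers stand in for real GTID sets.

```python
def binlog_coverage(gtid_executed, gtid_purged):
    """Return the transaction numbers still available in the binary logs.

    Simplified model: one server UUID, transactions as plain integers.
    """
    return gtid_executed - gtid_purged


# Hypothetical tablet that was never restored: nothing purged.
fresh = binlog_coverage(set(range(1, 101)), set())

# Hypothetical tablet restored from a backup taken at transaction 60:
# everything up to 60 is recorded in gtid_purged, not in the binary logs.
restored = binlog_coverage(set(range(1, 101)), set(range(1, 61)))

print(min(fresh), max(fresh))
print(min(restored), max(restored))
```

In this model the restored tablet has executed the same transactions as the fresh one, but its binary logs start later, so any position derived from the binary logs alone has to be combined with gtid_purged.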
Reproduction Steps
Use examples/local. Assume:
PRIMARY tablet is zone1-0000000101
REPLICA is zone1-0000000100
RDONLY is zone1-0000000102
Run the following sequence. Note that the interleaved ApplySchema commands are there just to generate sufficient changelog in between the operations.
The last --incremental_from_pos=auto zone1-0000000100 command yields something similar to:
I0717 07:47:43.728526 2090851 main.go:96] I0717 07:47:43.728145 backup.go:110] I0717 07:47:43.727878 builtinbackupengine.go:202] Executing Backup at 2023-07-17 07:47:43.727768003 +0000 UTC m=+217.129511829 for keyspace/shard commerce/0 on tablet zone1-0000000100, concurrency: 4, compress: true, incrementalFromPos: auto
I0717 07:47:43.741621 2090851 main.go:96] I0717 07:47:43.741426 backup.go:110] I0717 07:47:43.741189 builtinbackupengine.go:260] auto evaluating incremental_from_pos
I0717 07:47:43.742018 2090851 main.go:96] I0717 07:47:43.741901 backup.go:110] I0717 07:47:43.741720 builtinbackupengine.go:279] auto evaluated incremental_from_pos: MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571
E0717 07:47:43.765510 2090851 main.go:96] E0717 07:47:43.765311 backup.go:110] E0717 07:47:43.765064 backup.go:163] backup is not usable, aborting it: [Code: FAILED_PRECONDITION
Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
cannot get binary logs to backup in incremental backup]
Backup Error: rpc error: code = Unknown desc = TabletManager.Backup on zone1-0000000100 error: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
E0717 07:47:43.790505 2090851 main.go:105] remote error: rpc error: code = Unknown desc = TabletManager.Backup on zone1-0000000100 error: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
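The "Neither fully contains the other" message can be read as a failed two-way containment check between GTID sets. The following is a minimal sketch of such a check, not the Vitess implementation: it handles a single server UUID only, assumes the intervals are already normalized, and the helper names parse_intervals and contains are hypothetical. The two positions are copied from the log above.

```python
def parse_intervals(gtid_set):
    """Parse 'uuid:1-5:7' (single UUID only) into sorted inclusive intervals."""
    _uuid, *parts = gtid_set.split(":")
    intervals = []
    for part in parts:
        start, _, end = part.partition("-")
        # A bare number such as '7' is the interval 7-7.
        intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def contains(outer, inner):
    """True if every interval of `inner` lies within some interval of `outer`.

    Assumes `outer` is normalized (no adjacent or overlapping intervals).
    """
    return all(
        any(o_start <= i_start and i_end <= o_end for o_start, o_end in outer)
        for i_start, i_end in inner
    )


requested = parse_intervals("b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571")
binlog = parse_intervals("b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269")

print(contains(binlog, requested))  # False
print(contains(requested, binlog))  # False
```

Both checks return False, which matches the mismatch the log reports: the ranges 563-571 and 1-269 do not overlap at all.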
shlomi-noach changed the title from "Bug Report: incremental backup restore: failure to take incremental backups in a multi tablet scenario" to "Bug Report: incremental backup & restore: failure to take incremental backups in a multi tablet scenario" on Jul 25, 2023
Addressed by #13555 with a series of endtoend tests that reproduce the error scenario (but of course now pass given the fix in the PR). Also a bunch of unit tests. In general the entire fix is one line.
The last successful incremental backup on 102 is:
The issue is that we do not calculate gtid_purged correctly.
Binary Version
Operating System and Environment details
Log Fragments
No response