Bug Report: incremental backup & restore: failure to take incremental backups in a multi tablet scenario #13517

Closed
shlomi-noach opened this issue Jul 17, 2023 · 1 comment · Fixed by #13555

shlomi-noach commented Jul 17, 2023

Overview of the Issue

In a cluster with multiple REPLICA/RDONLY tablets, it's possible to create a situation where vtctlclient -- Backup --incremental_from_pos=auto fails to take the backup.

The gist of the scenario: one of the tablets is restored from backup (which wipes out its binary logs and sets gtid_purged) and then takes an incremental backup, which runs fine. A subsequent attempt to take an incremental backup on the other tablet then fails.

Reproduction Steps

Use examples/local. Assume:

  • PRIMARY tablet is zone1-0000000101
  • REPLICA is zone1-0000000100
  • RDONLY is zone1-0000000102

Run the following sequence. Note that the interleaved ApplySchema commands are there just to generate sufficient changelog in between the operations.

vtctlclient -- Backup zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctlclient -- Backup --incremental_from_pos=auto zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctldclient RestoreFromBackup zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctlclient -- Backup --incremental_from_pos=auto zone1-0000000102
vtctldclient ApplySchema --ddl-strategy="vitess" --sql "alter table corder force" commerce && sleep 2
vtctlclient -- Backup --incremental_from_pos=auto zone1-0000000100

The last command, Backup --incremental_from_pos=auto zone1-0000000100, yields output similar to:

I0717 07:47:43.728526 2090851 main.go:96] I0717 07:47:43.728145 backup.go:110] I0717 07:47:43.727878 builtinbackupengine.go:202] Executing Backup at 2023-07-17 07:47:43.727768003 +0000 UTC m=+217.129511829 for keyspace/shard commerce/0 on tablet zone1-0000000100, concurrency: 4, compress: true, incrementalFromPos: auto
I0717 07:47:43.741621 2090851 main.go:96] I0717 07:47:43.741426 backup.go:110] I0717 07:47:43.741189 builtinbackupengine.go:260] auto evaluating incremental_from_pos
I0717 07:47:43.742018 2090851 main.go:96] I0717 07:47:43.741901 backup.go:110] I0717 07:47:43.741720 builtinbackupengine.go:279] auto evaluated incremental_from_pos: MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571
E0717 07:47:43.765510 2090851 main.go:96] E0717 07:47:43.765311 backup.go:110] E0717 07:47:43.765064 backup.go:163] backup is not usable, aborting it: [Code: FAILED_PRECONDITION
Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269

cannot get binary logs to backup in incremental backup]
Backup Error: rpc error: code = Unknown desc = TabletManager.Backup on zone1-0000000100 error: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
E0717 07:47:43.790505 2090851 main.go:105] remote error: rpc error: code = Unknown desc = TabletManager.Backup on zone1-0000000100 error: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269: cannot get binary logs to backup in incremental backup: Mismatching GTID entries. Requested backup pos has entries not found in the binary logs, and binary logs have entries not found in the requested backup pos. Neither fully contains the other. Requested pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571, binlog pos=b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269
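
To make the "Neither fully contains the other" message concrete, here is a minimal sketch that compares the two GTID sets quoted in the error. The helper `parse_gtid_set` is hypothetical and written only for illustration; it is not Vitess's implementation, and it expands ranges into individual entries, which is only practical for small ranges like these.

```python
def parse_gtid_set(text):
    """Expand a GTID set such as 'uuid:1-5:7' into a set of (uuid, n) pairs.

    Hypothetical helper for illustration only; not the Vitess parser.
    """
    text = text.split("/", 1)[-1]  # drop an optional flavor prefix like "MySQL56/"
    gtids = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            start, _, end = r.partition("-")
            for n in range(int(start), int(end or start) + 1):
                gtids.add((uuid, n))
    return gtids


# Values copied from the error message above.
requested = parse_gtid_set("b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571")
binlog = parse_gtid_set("b696e26a-2475-11ee-9d38-0a43f95f28a3:1-269")

# Neither set is a subset of the other, which matches the reported mismatch.
print(requested <= binlog)  # False
print(binlog <= requested)  # False
```

Each side holds entries the other lacks (563-571 on one side, 1-269 on the other), so a check that requires one set to contain the other cannot succeed.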

The last successful incremental backup on 102 is:

{
  "BackupMethod": "builtin",
  "Position": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571",
  "PurgedPosition": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:1-562",
  "FromPosition": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:1-562",
  "Incremental": true,
  "BackupTime": "2023-07-17T07:47:43Z",
  "FinishedTime": "2023-07-17T07:47:43Z",
  "ServerUUID": "34bb1d4c-2476-11ee-85a9-0a43f95f28a3",
  "TabletAlias": "zone1-0000000102",
  "Keyspace": "commerce",
  "Shard": "0",
  "MySQLVersion": "/home/shlomi/opt/mysql/8.0.23/bin/mysqld  Ver 8.0.23 for Linux on x86_64 (Source distribution)\n",
  "UpgradeSafe": false,
  "CompressionEngine": "pargzip",
  "FileEntries": [
    {
      "Base": "BinLog",
      "Name": "vt-0000000102-bin.000001",
      "Hash": "4925e8df",
      "ParentPath": ""
    }
  ],
  "SkipCompress": false,
  "ExternalDecompressor": ""
}
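
In the manifest above, Position covers only 563-571 while PurgedPosition covers 1-562. The following sketch just works through that range arithmetic on the three position fields. The helper `sequence_numbers` is hypothetical and assumes a single server UUID with a single range per field; it makes no claim about how #13555 changes the calculation.

```python
import json

# Only the three position fields from the manifest above are reproduced here.
manifest = json.loads("""
{
  "Position": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:563-571",
  "PurgedPosition": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:1-562",
  "FromPosition": "MySQL56/b696e26a-2475-11ee-9d38-0a43f95f28a3:1-562"
}
""")


def sequence_numbers(pos):
    """Return the transaction numbers in a position string.

    Hypothetical helper: assumes one UUID and one range, e.g. 'MySQL56/uuid:1-562'.
    """
    gtid = pos.split("/", 1)[-1]       # drop the "MySQL56/" flavor prefix
    _uuid, rng = gtid.rsplit(":", 1)   # split off the trailing range
    start, _, end = rng.partition("-")
    return set(range(int(start), int(end or start) + 1))


position = sequence_numbers(manifest["Position"])
purged = sequence_numbers(manifest["PurgedPosition"])
combined = position | purged

# The two ranges do not overlap, and together they form 1-571 with no gaps.
print(position.isdisjoint(purged))                  # True
print(min(combined), max(combined), len(combined))  # 1 571 571
```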

The issue is that we do not calculate gtid_purged correctly.

Binary Version

v17, v18

Operating System and Environment details

-

Log Fragments

No response

@shlomi-noach shlomi-noach self-assigned this Jul 17, 2023
@shlomi-noach shlomi-noach changed the title Bug Report: incremental backup restore: failure to take incremental backups in a multi tablet scenario Bug Report: incremental backup & restore: failure to take incremental backups in a multi tablet scenario Jul 25, 2023
@shlomi-noach

Addressed by #13555, which adds a series of end-to-end tests that reproduce the error scenario (and now pass, given the fix in the PR), along with a number of unit tests. The fix itself is essentially one line.
