[docdb] PITR: Tracking issue #7120
Labels: area/docdb, kind/enhancement, priority/medium, roadmap-tracking-issue
Jira Link: DB-678
Allow users to restore the state of some subset of user data back to a specified point in time.
The project tracking this work is PITR.
Prerequisites
✅ Design doc
MVP, only data rollback -- complete
This should allow users to try out the feature while rolling back data only. It explicitly does not support rolling back metadata changes, such as CREATE / ALTER / DROP TABLE operations. Moreover, users will have to carefully configure certain YB knobs, as well as their own table/cluster snapshot frequency.
✅ Custom restore time for snapshots (#7015)
✅ Extend yb-admin to be able to restore at a custom time (#7121)
✅ Docs on how to use PITR, history retention, snapshot intervals, etc (#7122)
Framework and API
Users should be able to set up a PITR schedule on some subset of items and perform basic CRUD operations on schedules. We also need a way to keep the history retention interval in sync with the snapshot frequency, so that no data can be lost. Users should also be able to restore by providing just a schedule and a point in time.
✅ Flow history retention interval settings from the master (#7125)
✅ Enhance restore API to automatically pick correct snapshot based on user provided time (#7128)
✅ GC for PITR automatic snapshots (#7127)
✅ Mechanism to automatically take snapshots at predefined interval (#7126)
✅ API for delete of snapshot schedules (#8417)
⬜️ API for edit of snapshot schedules (#8417)
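As a rough illustration of why history retention must keep pace with the snapshot interval, here is a minimal Python sketch. It assumes a simplified model (our assumption, not a description of YugabyteDB internals): each automatic snapshot keeps MVCC history for `history_retention_s` seconds before its snapshot time, and a restore to time T uses the earliest snapshot taken at or after T, read as of T. `Schedule`, `pick_snapshot` and `retention_is_safe` are hypothetical names, not part of any yb-admin or master API.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Schedule:
    """Hypothetical model of a PITR snapshot schedule (not a YugabyteDB API)."""
    interval_s: float            # how often an automatic snapshot is taken
    history_retention_s: float   # how much MVCC history each snapshot keeps
    snapshot_times: List[float] = field(default_factory=list)  # sorted ascending

    def retention_is_safe(self) -> bool:
        # If every snapshot keeps at least one interval of history, the history
        # windows of consecutive snapshots touch, so no time point falls in a gap.
        return self.history_retention_s >= self.interval_s

    def pick_snapshot(self, restore_time: float) -> Optional[float]:
        """Earliest snapshot at or after restore_time whose history reaches back to it."""
        i = bisect_left(self.snapshot_times, restore_time)
        if i == len(self.snapshot_times):
            return None  # in this simplified model, no snapshot covers restore_time yet
        candidate = self.snapshot_times[i]
        if candidate - self.history_retention_s > restore_time:
            return None  # restore_time is older than the candidate's retained history
        return candidate


schedule = Schedule(interval_s=60, history_retention_s=60, snapshot_times=[0, 60, 120])
print(schedule.retention_is_safe())  # True
print(schedule.pick_snapshot(90))    # 120
```

With `history_retention_s=30` and the same 60-second interval, a restore to time 70 returns `None`: the snapshot at 120 only retains history back to 90, which is exactly the kind of gap that keeping retention in sync with the schedule is meant to prevent.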
YCQL support -- v2.6
Support for YCQL is easier, as all the metadata is in our default sys_catalog format.
Generic metadata work
This requires generic work to also snapshot the master metadata and roll back a subset of it to a point in time.
✅ Rollback of master metadata to a specified time (#7123)
⬜️ Support for YCQL roles and permissions (#8453)
CREATE TABLE / CREATE INDEX
This requires filtering out items that exist now but did not exist at the restore time.
✅ Undo of CREATE TABLE (#7124)
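The filtering rule for undoing CREATE TABLE can be sketched in a few lines of Python. This is a minimal illustration under assumed, hypothetical names (`TableInfo`, `existed_at`, `tables_to_filter_out`) with plain numeric timestamps; it is not how the master actually stores or filters catalog entries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableInfo:
    """Hypothetical catalog entry; times are plain numbers for illustration."""
    name: str
    created_at: float
    dropped_at: Optional[float] = None  # None while the table still exists


def existed_at(t: TableInfo, when: float) -> bool:
    # Created at or before `when`, and not yet dropped at `when`.
    return t.created_at <= when and (t.dropped_at is None or t.dropped_at > when)


def tables_to_filter_out(tables: List[TableInfo], restore_time: float) -> List[str]:
    """Tables that exist now but did not exist at restore_time; a restore must hide them."""
    return [t.name for t in tables
            if t.dropped_at is None and not existed_at(t, restore_time)]


tables = [
    TableInfo("orders", created_at=10),
    TableInfo("audit", created_at=100),
    TableInfo("tmp", created_at=20, dropped_at=30),
]
print(tables_to_filter_out(tables, restore_time=50))  # ['audit']
```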
ALTER TABLE
Currently the table schema is stored both on the master (as table metadata + tablet schema version numbers) and on each tserver (as part of the tablet metadata + version number). We will need some way of reconciling the two, or of enforcing that they are kept in sync as part of the restore operation.
✅ Undo of ALTER TABLE (#7135)
TRUNCATE TABLE
Since TRUNCATE drops all current user data, we will need to take a snapshot on each tablet before processing the truncate, in order to be able to restore to any time between the last automatic snapshot and the TRUNCATE operation.
✅ Disallow TRUNCATE on PITR-enabled tables (#11777)
⬜️ Automatically take snapshots on TRUNCATE (#7129)
⬜️ Undo of TRUNCATE TABLE (#7130)
DROP TABLE / DROP INDEX
In our current system design, snapshot data is part of RocksDB and thus has the same lifetime as tablet data. However, a table drop currently deletes all tablet data. To be able to restore this data, we need it not to be deleted immediately, while still retaining the fault-tolerance properties we rely on Raft for, in case nodes go down in the meantime.
✅ Introduce new tablet state for PITR deleted tables (#7131)
✅ Add Raft support for data-only quiesced state (#7132)
✅ GC mechanism for PITR deleted tables (#7134)
✅ Undo of DROP TABLE (#7133)
✅ Load balancing for PITR deleted tables (#8267)
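To make the retention requirement for dropped tables concrete, here is a minimal Python sketch of when a hidden (PITR-deleted) tablet could become eligible for garbage collection. It assumes a single schedule and a simplified model in which the oldest retained snapshot can be read as of any time within its history retention window, but no earlier. It is not the actual GC mechanism from #7134; `HiddenTablet`, `earliest_restorable_time` and `tablets_to_delete` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HiddenTablet:
    """Hypothetical record for a tablet kept after DROP TABLE because of PITR."""
    tablet_id: str
    dropped_at: float


def earliest_restorable_time(oldest_snapshot_time: float, history_retention_s: float) -> float:
    # Assumed model: the oldest retained snapshot reaches back one retention window.
    return oldest_snapshot_time - history_retention_s


def tablets_to_delete(hidden: List[HiddenTablet], earliest_restorable: float) -> List[str]:
    """A hidden tablet may be deleted once no restorable time point predates its drop."""
    return [t.tablet_id for t in hidden if t.dropped_at <= earliest_restorable]


hidden = [HiddenTablet("a", 10), HiddenTablet("b", 50), HiddenTablet("c", 100)]
cutoff = earliest_restorable_time(oldest_snapshot_time=120, history_retention_s=60)
print(tablets_to_delete(hidden, cutoff))  # ['a', 'b']
```

The key design point the sketch tries to capture is that deletion is driven by the oldest point a restore could still target, not by the drop itself, so until that point moves past the drop time the tablet (and its Raft replicas) must be kept.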
YSQL support
The YSQL metadata lives in a separate set of colocated tables, in the master tablet. These require careful handling, to ensure we only roll back metadata for one YSQL database.
Phase 1 -- v2.8.x
This is on top of the already existing generic support for DDLs, which we can leverage directly from the initial YCQL work.
✅ Per-database restore for YSQL (#8452)
✅ Support for colocated tables (#8259)
✅ Support for other types of ALTER TABLE (#1124): most ALTER variants work
✅ Disallow TRUNCATE on PITR-enabled tables (#11777)
Phase 2 -- v2.14.x (stable)
This is primarily production hardening, plus gating off functionality that does not work yet, to prevent user errors.
✅ DDL event history (#8773)
✅ Interaction with tablet splitting (#8257, #8235)
✅ Transactionally consistent restore (#8419)
✅ Speedup YSQL restores (#9585)
✅ Throttle CreateSnapshot requests (#10482)
✅ Throttle RestoreSnapshot requests (#11847)
✅ Disallow restores to before a run of ysql_upgrade (#11846)
✅ Disallow restores if there were changes to sequences (#11875)
✅ Support for Sequences (#10249)
✅ Disable PITR create schedule on a cluster with any of its databases containing tablegroups (#12484)
✅ Disable tablegroup creation if PITR is enabled on any of the databases (#12487)
✅ Prevent tablespace deletion in case PITR is enabled (#12508)
✅ Backward compatibility of pg_yb_catalog_version (#9504)
✅ Triggers, Stored Procedures and other PG features (#10350)
Future work
⬜️ Support for restoring global objects (#9912)
⬜️ Support for Tablespaces (#10257)
⬜️ Support for Tablegroups (#11924)
⬜️ Support for CDC with PITR (#12773)
⬜️ PITR in conjunction with xCluster Replication (#10820)
⬜️ Turn consistent_restore flag on by default (#12853)
⬜️ Restoration races with Index Backfill (#12672)
⬜️ Allow restore to a point in time before an upgrade (#13158)
Further testing needed
Some advanced features might work out of the box, but more QA is necessary.
⬜️ Security features of Postgres (#10349)
⬜️ More robust tests (#9502)
Support for use with external backups
⬜️ Data only restore from external backups (#8846)
⬜️ Metadata restore from external backups (#8847)