[Platform] Old backups are not deleted by a schedule #7780
From the environment:
@SergeyPotachev - I need to know the best method to fabricate a UUID for the ~1200 entries where the
@CharlotteRose This is now assigned to me, but I have not had a chance to look at it yet. The best way is to set these entries to the nil UUID. Something like:
So just insert uuid_nil so you can tell these entries apart.
I can see that there is no constraint on schedule_uuid, so you can just insert the nil UUID and see if that helps. I have not looked at the code yet, so I cannot answer your other question.
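A minimal sketch of the nil-UUID idea, modelling rows of the backup table as Python dicts (an assumption for illustration; on the real yw database this would be an SQL update). The helper names backfill_with_nil and is_backfilled are hypothetical. The point is that the all-zero UUID is a recognizable placeholder, so backfilled rows can later be told apart from rows that carry a real schedule.

```python
import uuid

# The all-zero "nil" UUID, the same value PostgreSQL's uuid-ossp uuid_nil() returns.
NIL_UUID = uuid.UUID(int=0)


def backfill_with_nil(rows):
    """Hypothetical helper: fill missing schedule_uuid values with the nil UUID.

    Returns the number of rows changed so the backfill can be audited.
    """
    changed = 0
    for row in rows:
        if row.get("schedule_uuid") is None:
            row["schedule_uuid"] = NIL_UUID
            changed += 1
    return changed


def is_backfilled(row):
    """True for rows that received the placeholder rather than a real schedule."""
    return row.get("schedule_uuid") == NIL_UUID
```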
@CharlotteRose
This gives you the required
A small hint: if the schedule hasn't changed since March 16th, you can find the UUID of the required universe in the second column and take the schedule_uuid from the first column of the same row.
It seems that the open PR #7162 already has a fix that removes schedule_uuid from the query parameters of getExpiredBackups().
@hkandala
@SergeyPotachev, I agree we need to understand better why that line exists.
I just realized this needs a backport. In that case, I think we might need a different fix, as the manual backup retention option will not be available there.
@hkandala
@SergeyPotachev, I tried to reproduce this locally, but I see schedule_uuid present when a scheduled backup is created. I am not sure whether there is a specific flow where it stays null.
So it seems that expiry was added to 2.4 in
The reason for proposing this multistep approach is that #1 is a lot easier, and if removing DB bloat is the immediate concern, it achieves that. #2 is also easy to backport to 2.4, so I am keeping it separate.
These are the schedules that have such backups:
Okay, I figured out what's going on. BackupParams captures multi-table backups as well as single-table backups. So we either need to write a migration that lifts this scheduleUUID from the first entry in the backup list, or we can just say that it is expected (which is consistent with many other null values and dangling references in the backup schema ;)) and go on to delete these backups with steps 1 and 2 listed previously.
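A minimal sketch of the "lift from the first entry" migration idea, written in Python over the backupInfo JSON. The key names scheduleUUID and backupList, and the assumption that multi-table backups leave the top-level value null while recording it per entry, are inferred from the comment above and should be checked against the actual BackupParams schema; lift_schedule_uuid is a hypothetical name.

```python
import json
from typing import Optional


def lift_schedule_uuid(backup_info_json: str) -> Optional[str]:
    """Return the schedule UUID recorded in a backup's backupInfo JSON.

    Assumed shape (hypothetical): single-table backups carry "scheduleUUID"
    at the top level, while multi-table backups leave it null there and
    record it in the entries of "backupList".
    """
    info = json.loads(backup_info_json)
    if info.get("scheduleUUID"):
        return info["scheduleUUID"]
    backup_list = info.get("backupList") or []
    if backup_list:
        # Lift the value from the first entry, as proposed for the migration.
        return backup_list[0].get("scheduleUUID")
    return None
```

Taking the first entry relies on every entry of one multi-table backup sharing the same schedule, which is what "lift from first entry" implies but is not verified here.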
Summary: The bulk of this change is a subset of PR/7162 (thanks to Mahendra!), with a few modifications. Do not filter by scheduleUUID when generating the list of expired backups. This diff also filters out backups whose universeUUID is no longer valid (because the universe was deleted) for a given customer. This change can be backported to solve the backup deletion issue 7780.
Test Plan: Added a unit test to check that backups with an invalid universe do not show up.
Reviewers: spotachev, arnav
Reviewed By: arnav
Subscribers: hkandala, jenkins-bot, yugaware
Differential Revision: https://phabricator.dev.yugabyte.com/D11102
…chedule
Summary: Do not filter by scheduleUUID when generating the list of expired backups. This diff also filters out backups whose universeUUID is no longer valid (because the universe was deleted) for a given customer.
Original commit: D11102 / 5dbd189
Test Plan: Added a unit test to check that backups with an invalid universe do not show up.
Reviewers: arnav
Reviewed By: arnav
Subscribers: jenkins-bot, yugaware
Differential Revision: https://phabricator.dev.yugabyte.com/D11113
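A sketch of the selection logic these commit messages describe, with backups modelled as a Python dataclass. This is not the Platform's Java code: the field names, get_expired_backups, and the live_universes set are assumptions for illustration. It shows the two changes: no predicate on schedule_uuid, and backups of universes that no longer exist are skipped.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass
class Backup:
    backup_uuid: str
    customer_uuid: str
    universe_uuid: str
    schedule_uuid: Optional[str]  # may be null, e.g. for multi-table backups
    expiry: Optional[datetime]


def get_expired_backups(backups: Iterable[Backup], customer_uuid: str,
                        live_universes: Set[str], now: datetime) -> List[Backup]:
    """Expired backups of one customer, regardless of schedule_uuid.

    Backups whose universe is no longer in live_universes are left out.
    """
    return [
        b for b in backups
        if b.customer_uuid == customer_uuid
        and b.expiry is not None and b.expiry < now
        and b.universe_uuid in live_universes
    ]
```

Because schedule_uuid is never consulted, a null value can no longer hide an expired backup from the deletion pass.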
@sb-yb - I need further clarification on step 1. The customer wants all outdated backups removed. Deleting them manually from Platform removes them from both disk and the DB; however, I am unclear on the instruction about "running delete" on the database. Please advise.
Sorry, bad instructions. The idea was to manually update the yw database to fill in the missing scheduleUUID, which would lead to deletion from the database on the next scheduler run. FWIW, I already merged a fix yesterday, so we should not have to do anything manually. LMK if we must, and I can give you the exact SQL to update the DB manually.
@sb-yb - That is great to hear! I will let them know that it will be fixed in a future release. In the meantime, however, I need the steps for manual intervention, as the customer has said numerous times that they want a fix in the immediate time frame.
@CharlotteRose Here goes: you will be doing
The above will set the schedule of all backup rows that have expired and have a null schedule to the scheduleUUID buried deep in the backupInfo JSON. Use this page as a reference: https://www.postgresql.org/docs/9.5/functions-json.html
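A Python sketch of what such a one-time update does, over rows modelled as dicts; it is not a substitute for the SQL itself. The column names schedule_uuid, expiry, and backup_info, the JSON keys, and the helper backfill_expired are assumptions for illustration.

```python
import json
from datetime import datetime
from typing import Dict, List


def backfill_expired(rows: List[Dict], now: datetime) -> int:
    """One-time pass: for rows that have expired and have a null schedule_uuid,
    copy the scheduleUUID recorded in backup_info. Returns rows updated."""
    updated = 0
    for row in rows:
        if row["schedule_uuid"] is not None:
            continue  # already has a schedule
        if row["expiry"] is None or row["expiry"] >= now:
            continue  # not expired yet, so left untouched
        info = json.loads(row["backup_info"])
        # Multi-table backups keep per-table params in backupList (assumed key);
        # single-table backups are treated as a one-element list.
        entries = info.get("backupList") or [info]
        candidate = entries[0].get("scheduleUUID")
        if candidate:
            row["schedule_uuid"] = candidate
            updated += 1
    return updated
```

Running it a second time changes nothing, and backups that expire later keep their null schedule, which matches the point that this only clears the backups that are expired at the time it runs.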
@sb-yb, to clarify the above: does this need to be done on a recurring schedule? That is, does it permanently fix backups not getting deleted for some jobs (for this release), or does it just remove the backups that are expired today?
The latter. It is a one-time manual step that removes currently expired backups.
@sb-yb, if this is not a workaround that resolves the problem for current backup schedules, editing the DB seems much more hazardous than simply removing old backups from disk (as is currently being done).
It was never mentioned that old backups are already being removed directly from disk.
Steps to validate:
For deletion retry bug:
Backup records in the DB (table backup) have empty/null values for schedule_uuid. However, old backups are selected for deletion using the following finder:
So no backups can be selected if they don't have schedule_uuid specified. This also explains why the customer is able to delete old backups manually.
A possible reason is that the schedule in the task_params field has the value "scheduleUUID":null. If the task that creates backups takes schedule_uuid from these parameters, this could be the root cause.
Additional information is in ticket 661 (application.log and YW DB dump).
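The root cause can be reproduced in miniature. This sketch assumes the finder applies an equality predicate on schedule_uuid (the finder itself is not shown here) and uses an in-memory SQLite table as a stand-in for the yw backup table, with a hypothetical reduced set of columns. Under SQL's three-valued logic, schedule_uuid = ? is never true when the column is NULL, so such a row is never returned, whatever schedule UUID is passed.

```python
import sqlite3

# In-memory stand-in for the yw "backup" table (columns are a hypothetical minimum).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE backup (backup_uuid TEXT, schedule_uuid TEXT, expiry TEXT)")
conn.executemany(
    "INSERT INTO backup VALUES (?, ?, ?)",
    [
        ("b1", "s1", "2021-03-01"),  # expired scheduled backup with schedule_uuid set
        ("b2", None, "2021-03-01"),  # expired backup whose schedule_uuid is null
    ],
)


def expired_for_schedule(schedule_uuid, now):
    """Assumed shape of the original finder: an equality filter on schedule_uuid."""
    rows = conn.execute(
        "SELECT backup_uuid FROM backup "
        "WHERE schedule_uuid = ? AND expiry < ? ORDER BY backup_uuid",
        (schedule_uuid, now),
    )
    return [r[0] for r in rows]


print(expired_for_schedule("s1", "2021-03-20"))  # ['b1']
print(expired_for_schedule(None, "2021-03-20"))  # [] because NULL = NULL is not true
```

Row b2 stays invisible to every per-schedule query, which is consistent with it only being removable by a manual delete.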