
Snapshot Deletion Could Run more Concurrently to Snapshot Creation #82853

Closed
Tracked by #77466
original-brownbear opened this issue Jan 20, 2022 · 3 comments
Assignees
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@original-brownbear
Member

original-brownbear commented Jan 20, 2022

Currently, snapshot deletion completely locks the repository it operates on, forbidding both shard-level snapshot creates and snapshot finalisations for the entire duration of the delete.
This is a problem when executing very large deletes (i.e. a large product of snapshot count * indices_per_snapshot), because blob stores like S3 are often somewhat rate limited in how many blobs can be deleted per unit of time.
Without a change to the file structure in a repository I don't think we can speed up deletes much end-to-end beyond the current state. What we can do, however, is considerably reduce the impact of a long-running delete with the following change.

Currently a snapshot delete runs as follows:

  • create delete entry in the cluster state
  • wait for all other repository operations to halt
  • update all metadata in the repository to reflect the removal of the deleted snapshots
  • remove all files now unreferenced (this is the slow part in most real-world cases)
  • remove delete from cluster state to allow other operations to continue

With the way deletes are currently implemented, this level of locking out other operations is unnecessary.

Switching the order of the last two steps to:

  • remove delete from cluster state to allow other operations to continue
  • remove all files now unreferenced (now that we don't block other operations any more, slowness isn't so bad here)

Would drastically reduce the impact of a slow delete without introducing any safety issues.
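The effect of the reordering can be sketched with a toy simulation (hypothetical Python, not the actual Elasticsearch code): the repository "lock" is released before the slow blob deletion starts, so a concurrent operation such as a snapshot finalization no longer has to wait for it.

```python
import threading
import time

repo_lock = threading.Lock()  # stands in for the cluster-state delete entry
events = []

def delete_snapshot_reordered():
    with repo_lock:
        events.append("delete: metadata updated")  # fast work, under the lock
    # Lock released: other repository operations may now run concurrently.
    time.sleep(0.2)                                # simulate slow blob deletes
    events.append("delete: stale blobs removed")

def snapshot_finalization():
    time.sleep(0.05)
    with repo_lock:
        events.append("snapshot: finalized")       # no longer waits for blob deletes

t1 = threading.Thread(target=delete_snapshot_reordered)
t2 = threading.Thread(target=snapshot_finalization)
t1.start(); t2.start()
t1.join(); t2.join()
```

With the old ordering, "snapshot: finalized" could only appear after the slow blob deletion finished; here it lands in the middle of it.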

The reason for this being a safe change is as follows:

When we delete a snapshot, we do the following steps:

  1. Write new metadata to the shard folders in the repository (this introduces no potential for corruption because shard-level metadata only matters once it is referenced from the repository root)
  2. Collect, on heap, all blobs that can be deleted once that metadata is referenced from the root level, as well as all index folders that can then be deleted. (Due to UUID naming these paths will never be referenced again, so once the next step completes we can safely clean them up without worrying about races.)
  3. Write new root-level metadata
  4. Run deletes for everything tracked in step 2.
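A minimal sketch of those four steps, assuming the repository can be modeled as root-level metadata plus blobs tagged with the snapshots referencing them (all names here are hypothetical, not the real implementation):

```python
def delete_snapshot(repo, snapshot):
    # Step 1: write new shard-level metadata (modeled here as computing the
    # surviving snapshot set; harmless until referenced from the root).
    surviving = repo["root_refs"] - {snapshot}
    # Step 2: collect, on heap, the blobs that become unreferenced once the
    # new root metadata is written; UUID naming means these paths can never
    # be referenced again.
    to_delete = {b for b, refs in repo["blobs"].items() if refs <= {snapshot}}
    # Step 3: write new root-level metadata.
    repo["root_refs"] = surviving
    # Step 4: delete everything collected in step 2 (the slow part).
    for b in to_delete:
        del repo["blobs"][b]
    return to_delete

repo = {
    "root_refs": {"snap-1", "snap-2"},
    "blobs": {
        "blob-a": {"snap-1"},            # referenced only by snap-1
        "blob-b": {"snap-1", "snap-2"},  # shared, must survive
    },
}
deleted = delete_snapshot(repo, "snap-1")
```

The proposal only moves the cluster-state unlock between steps 3 and 4; nothing about which blobs get collected or deleted changes.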

This also does not introduce an added risk of leaking blobs in any practically relevant case, because it fundamentally does not change the cleanup properties of step 2 above. We always collect all unreferenced index folders and all unreferenced blobs in the shard folders touched by the delete in step 2. Hence, if a delete fails to fully complete, the next delete simply picks up where it left off. The mechanics of what we try to delete, and when, are untouched by this change. All we are changing is allowing other operations to continue while we delete known-never-to-be-referenced-again blobs concurrently with them.
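The "next delete picks up where the last one left off" property can be illustrated with the same kind of toy model (hypothetical names, just a sketch of the argument): because each delete recomputes the unreferenced set from the current metadata, an interrupted cleanup leaves behind work that the following delete finds again.

```python
def unreferenced_blobs(repo):
    # A blob is stale if no snapshot in the root metadata references it.
    return {b for b, refs in repo["blobs"].items()
            if not (refs & repo["root_refs"])}

def cleanup(repo, fail_after=None):
    stale = sorted(unreferenced_blobs(repo))  # recomputed on every delete
    for i, b in enumerate(stale):
        if fail_after is not None and i >= fail_after:
            return  # simulate the node dying mid-delete
        del repo["blobs"][b]

repo = {
    "root_refs": set(),  # root metadata already updated: nothing referenced
    "blobs": {"blob-a": {"snap-1"}, "blob-b": {"snap-1"}},
}
cleanup(repo, fail_after=1)  # first delete dies after removing one blob
cleanup(repo)                # the next delete removes the rest
```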

@original-brownbear original-brownbear added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jan 20, 2022
@original-brownbear original-brownbear self-assigned this Jan 20, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jan 20, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

Didn't we do this in #86514? If not, what's left of this idea?

@original-brownbear
Member Author

Right, we/I could've done better by avoiding duplicate blob deletes, but that's a lot of work and the current implementation mostly performs just fine now => Closing :)
