Reduce the number of objects allocated by SLM when listing the snapshots to retain #99953
Comments
Pinging @elastic/es-distributed (Team:Distributed)
I was wondering if we should instead introduce a new master-node transport action, to be called by SLM, which calls |
Another thing worth considering is whether we could refine |
That makes perfect sense, thanks David.
A small refactoring to make elastic#99953 a little simpler: combine the logic for retrieving the snapshot info and filtering out the ineligible ones into a single function so we can replace it with a call to a dedicated client action in a followup.
We only ever test each instance of this predicate once, immediately after creating it, so we may as well just convert it into a regular method that returns `boolean` instead. More preliminary work before fixing elastic#99953
There is no need to obtain `SnapshotInfo` for all snapshots in order to compute SLM retention. With this commit we move to computing it directly from the `RepositoryData` in most circumstances, and in rare situations where we must still retrieve `SnapshotInfo` blobs we make sure not to hold many in memory at once. Closes elastic#99953
…lastic#100053) We only ever test each instance of this predicate once, immediately after creating it, so we may as well just convert it into a regular method that returns `boolean` instead. More preliminary work before fixing elastic#99953
The SLM retention cleanup task `SnapshotRetentionTask` lists all snapshots in order to identify which snapshots to retain and which to delete. While doing so it retrieves the full snapshot information, including shard snapshot details and shard snapshot failures, even though it later uses only the snapshot metadata and the snapshot timestamp to select the snapshots to retain. When the snapshots contain thousands of shards, this amounts to a lot of objects being created unnecessarily, which puts a lot of pressure on the garbage collector.

We should improve the way SLM retrieves snapshots to avoid these huge object allocations. David made some interesting suggestions in the comments.
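To illustrate the idea, here is a minimal sketch of a retention decision that needs only a small per-snapshot summary (name, policy id, start time) rather than full shard-level details. Everything in it is hypothetical: `RetentionSketch`, `SnapshotSummary`, and `snapshotsToDelete` are not Elasticsearch classes or methods, and the "expire after a maximum age, but always keep a minimum count" rule is a simplified assumption rather than the actual SLM retention logic.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionSketch {

    /** Hypothetical lightweight summary: only the fields a retention decision needs. */
    public record SnapshotSummary(String name, String policyId, long startTimeMillis) {}

    /**
     * Returns the names of snapshots belonging to the given policy that are older
     * than maxAgeMillis, while always keeping the minCount newest ones.
     * Simplified assumption, not the real SLM rules.
     */
    public static List<String> snapshotsToDelete(
        List<SnapshotSummary> all,
        String policyId,
        long nowMillis,
        long maxAgeMillis,
        int minCount
    ) {
        // Keep only the snapshots created by this policy.
        List<SnapshotSummary> candidates = new ArrayList<>();
        for (SnapshotSummary s : all) {
            if (policyId.equals(s.policyId())) {
                candidates.add(s);
            }
        }
        // Sort newest first so the first minCount entries are the ones always kept.
        candidates.sort((a, b) -> Long.compare(b.startTimeMillis(), a.startTimeMillis()));

        List<String> toDelete = new ArrayList<>();
        for (int i = Math.max(0, minCount); i < candidates.size(); i++) {
            SnapshotSummary s = candidates.get(i);
            if (nowMillis - s.startTimeMillis() > maxAgeMillis) {
                toDelete.add(s.name());
            }
        }
        return toDelete;
    }
}
```

Because the decision touches only three small fields per snapshot, the number of objects allocated is proportional to the number of snapshots and does not depend on how many shards each snapshot contains.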