Optimise snapshot deletion to speed up snapshot deletion and creation #15568
Conversation
There are existing UTs and ITs that cover the changed code.
❌ Gradle check result for 33b0dd1: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Ashish Singh <ssashish@amazon.com>

Force-pushed 33b0dd1 to 05757e2
Flaky tests - #15600
The backport to 2.x failed. To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-15568-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 3fc0139ca68a1ff843ec1492c3cd52c2c4c67f02
# Push it to GitHub
git push --set-upstream origin backport/backport-15568-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-15568-to-2.x.
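If you prefer to stay on the command line, the same pull request can be opened with the GitHub CLI. A minimal sketch, assuming gh is installed and authenticated (the title and body text below are illustrative, not part of the bot's instructions):

# Open the backport PR against the 2.x base branch; the head branch was
# already pushed by the commands above
gh pr create --base 2.x --head backport/backport-15568-to-2.x \
  --title "[Backport 2.x] Optimise snapshot deletion" \
  --body "Backport of #15568"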
Description
Snapshot creation is distributed in nature: the snapshot of each shard is taken by the data node holding that primary shard, so the total snapshot creation work is shared amongst all the data nodes in the cluster. In contrast, snapshot deletion is handled solely by the active cluster manager. This can make snapshot deletion excessively slow when the cluster has a relatively high number of primary shards.
In this PR, we address this problem by introducing a dedicated thread pool responsible for performing snapshot deletion and cleaning up old shard generations during snapshot creation. The thread count is set to 4x the number of allocated processors, bounded between 64 and 256, so that there are enough threads to get the deletion done quickly but not so many that they start consuming connections needed by other remote store operations on the same cluster.
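To make the sizing rule concrete, here is a minimal shell sketch of the calculation described above (illustrative only; the variable names are assumptions, not the PR's actual code):

# 4x the allocated processors, clamped to the [64, 256] range
allocated_processors=$(nproc)
threads=$(( 4 * allocated_processors ))
(( threads < 64 )) && threads=64
(( threads > 256 )) && threads=256
echo "snapshot deletion thread count: $threads"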
Check List
[ ] API changes companion pull request created, if applicable.
[ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.