Parallelize stale blobs deletion during snapshot delete #3796
@sohami @AmiStrn @dreamer-89: Thanks for reviewing the earlier PR: #2159.
Gradle Check (Jenkins) Run Completed with:
|
@piyushdaftary: I re-opened PR #2159, which already has review comments and useful discussion. Please feel free to close this one and continue on PR 2159.
@dreamer-89: I was unable to update the old PR, so I created this new one.
🎉 nice! This is a great performance boost. Just have some suggestions around testing and a question around the newly introduced parameter (javadocs and user guidance); I think we can do a little better there?
int numberOfFiles = numberOfFiles(repositoryPath);

logger.info("--> adding some more documents to test index");
for (int j = 0; j < 10; ++j) {
Can we randomize this using randomIntBetween() to inject some entropy in the number of threads? Do you have a general feel for the number of documents per number of threads so we can use a reasonable range?
@nknize: The number of documents per thread varies a lot depending on the number of documents deleted (stale) or updated between the snapshot being deleted and the other remaining snapshots in the repository. I am planning to make the batch size randomIntBetween(1, 1000) and the number of documents range between 1 and 10000000. WDYT?
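As a rough illustration of the randomization proposed above, the sketch below draws a batch size and a document count from the ranges mentioned in the comment. The local `randomIntBetween` helper is a stand-in written for this example (assumed to be inclusive on both bounds, like the test framework's helper of the same name); the class name and the use of `java.util.Random` are assumptions, not code from this PR.

```java
import java.util.Random;

// Hypothetical sketch: randomizing the test parameters discussed in the review.
class RandomizedBatchParamsSketch {

    // Stand-in for the test framework's randomIntBetween(min, max);
    // assumed here to include both bounds.
    static int randomIntBetween(Random random, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        return min + random.nextInt(max - min + 1);
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Ranges taken from the comment above.
        int batchSize = randomIntBetween(random, 1, 1000);
        int numDocs = randomIntBetween(random, 1, 10_000_000);
        System.out.println("batchSize=" + batchSize + ", numDocs=" + numDocs);
    }
}
```

In a real test the framework's own seeded random source would be preferable to `new Random()`, so that failures can be reproduced from the reported seed.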
dd4d3f1 to 53cdf18
LGTM on my side
Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
4ce3769 to c9d84ae
Codecov Report
@@             Coverage Diff              @@
##               main    #3796      +/-   ##
============================================
+ Coverage     70.46%   70.58%   +0.11%
- Complexity    56600    56672      +72
============================================
  Files          4557     4563       +6
  Lines        272737   272781      +44
  Branches      40040    40043       +3
============================================
+ Hits         192188   192540     +352
+ Misses        64324    63925     -399
- Partials      16225    16316      +91
…delete_batch_size Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
Changes have been addressed
* Parallelize stale blobs deletion during snapshot delete
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Adding test which throws exception
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Adjusting indentation for spotlessJavaCheck
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Adding more description to MAX_SHARD_BLOB_DELETE_BATCH_SIZE
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Renaming max_shard_blob_delete_batch_size to max_snapshot_shard_blob_delete_batch_size
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
(cherry picked from commit 1c787e8)
Thanks all for pushing this forward. Sorry for the delay @piyushdaftary and thanks for addressing the questions. Adding my LGTM post merge for completeness!
* Parallelize stale blobs deletion during snapshot delete
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Adding test which throws exception
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Adjusting indentation for spotlessJavaCheck
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Adding more description to MAX_SHARD_BLOB_DELETE_BATCH_SIZE
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
* Renaming max_shard_blob_delete_batch_size to max_snapshot_shard_blob_delete_batch_size
  Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
(cherry picked from commit 1c787e8)
Co-authored-by: piyush <pdaftary@amazon.com>
Signed-off-by: Piyush Daftary pdaftary@amazon.com
Description
Currently, during snapshot delete, unlinked shard-level blobs are deleted by a single thread from the SNAPSHOT threadpool. If there is a large number of unlinked shard-level blob files, cleaning them up takes a considerable amount of time.
I therefore propose making the deletion of unlinked shard-level blobs multi-threaded, the same way stale indices are already cleaned up, to speed up the overall snapshot deletion process.
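The idea can be sketched roughly as follows: split the list of stale blob names into batches and hand each batch to a worker thread. This is a minimal illustration under stated assumptions, not the code in BlobStoreRepository. The class name, method names, and the `deleter` callback are hypothetical, and a plain fixed-size `ExecutorService` stands in for the SNAPSHOT threadpool. The `batchSize` argument plays the role of the `max_snapshot_shard_blob_delete_batch_size` setting named in the commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch of batched, multi-threaded stale blob deletion.
class StaleBlobDeleteSketch {

    // Splits blob names into consecutive batches of at most batchSize entries.
    static List<List<String>> partition(List<String> blobs, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < blobs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, blobs.size());
            batches.add(new ArrayList<>(blobs.subList(i, end)));
        }
        return batches;
    }

    // Submits one delete task per batch and waits for all of them.
    // Returns the number of batches that were processed.
    static int deleteInParallel(List<String> staleBlobs, int batchSize, int threads,
                                Consumer<List<String>> deleter) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<String> batch : partition(staleBlobs, batchSize)) {
                futures.add(pool.submit(() -> deleter.accept(batch)));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(); // surfaces any failure from the batch task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch delete failed", e.getCause());
                }
            }
            return futures.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> staleBlobs = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            staleBlobs.add("blob-" + i);
        }
        // Thread-safe set that records which blobs the fake deleter saw.
        Set<String> deleted = ConcurrentHashMap.newKeySet();
        int batches = deleteInParallel(staleBlobs, 10, 4, deleted::addAll);
        System.out.println(batches + " batches, " + deleted.size() + " blobs deleted");
    }
}
```

A smaller batch size spreads the work across more tasks, while a larger one reduces the number of delete calls per shard; which trade-off is appropriate depends on the repository backend.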
Issues Resolved
#2156
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.