This repository has been archived by the owner on Apr 4, 2023. It is now read-only.
Soft-deletion computation no longer depends on the mapsize #747
Merged
dureuill added labels on Dec 15, 2022:
indexing: Related to the documents/settings indexing algorithms.
no breaking: The related changes are not breaking (DB nor API)
performance: Related to the performance in terms of search/indexation speed or RAM/CPU/Disk consumption
Implemented solution 2.3 from meilisearch/meilisearch#3231 (comment)
dureuill force-pushed the soft-deleted_computation_doesnt_use_max_size branch from 09952bc to 916c23e on December 19, 2022 09:10
Updated the PR:
These changes allow us not to modify the contents of the tests. The PR description is up-to-date.
irevoire previously approved these changes on Dec 19, 2022
Nice thanks!
bors merge
Comment on lines +190 to +210
    let soft_deletion = match self.strategy {
        DeletionStrategy::Dynamic => {
            // decide to keep the soft deleted in the DB for now if they meet 2 criteria:
            // 1. There is less than a fixed rate of 50% of soft-deleted to actual documents, *and*
            // 2. Soft-deleted occupy an average of less than a fixed size on disk

            let size_used = self.index.used_size()?;
            let nb_documents = self.index.number_of_documents(self.wtxn)?;
            let nb_soft_deleted = soft_deleted_docids.len();

            (nb_soft_deleted < nb_documents) && {
                const SOFT_DELETED_SIZE_BYTE_THRESHOLD: u64 = 1_073_741_824; // 1GiB

                // nb_documents + nb_soft_deleted != 0 because if nb_documents is 0 we short-circuit earlier, and then we moved the documents to delete
                // from the documents_docids to the soft_deleted_docids.
                let estimated_document_size = size_used / (nb_documents + nb_soft_deleted);
                let estimated_size_used_by_soft_deleted =
                    estimated_document_size * nb_soft_deleted;
                estimated_size_used_by_soft_deleted < SOFT_DELETED_SIZE_BYTE_THRESHOLD
            }
        }
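For readers following the two criteria outside the diff, here is a minimal standalone sketch of the same decision. It is not the code from this PR: the function name `keep_soft_deleted` and its plain `u64` parameters are hypothetical stand-ins for `self.index.used_size()`, `self.index.number_of_documents(...)` and `soft_deleted_docids.len()`, and the `saturating_mul` is an extra precaution added in this sketch only.

```rust
/// Same threshold as in the excerpt: 1 GiB.
const SOFT_DELETED_SIZE_BYTE_THRESHOLD: u64 = 1_073_741_824;

/// Hypothetical helper (not part of the PR): returns `true` when the
/// soft-deleted documents may stay in the DB for now.
///
/// * `size_used`: bytes currently used by the index
/// * `nb_documents`: documents still present after the deletion
/// * `nb_soft_deleted`: documents currently marked as soft-deleted
pub fn keep_soft_deleted(size_used: u64, nb_documents: u64, nb_soft_deleted: u64) -> bool {
    // Criterion 1: strictly fewer soft-deleted documents than remaining ones,
    // i.e. soft-deleted documents are less than 50% of everything stored.
    if nb_soft_deleted >= nb_documents {
        return false;
    }

    // Here nb_documents > nb_soft_deleted, so nb_documents >= 1 and the
    // divisor below cannot be zero.
    let estimated_document_size = size_used / (nb_documents + nb_soft_deleted);

    // Criterion 2: the estimated space held by soft-deleted documents must be
    // under the fixed threshold. `saturating_mul` is an assumption of this
    // sketch; the excerpt uses a plain multiplication.
    let estimated_size_used_by_soft_deleted =
        estimated_document_size.saturating_mul(nb_soft_deleted);

    estimated_size_used_by_soft_deleted < SOFT_DELETED_SIZE_BYTE_THRESHOLD
}
```

For instance, `keep_soft_deleted(1_000_000, 100, 10)` keeps the soft-deleted documents, while `keep_soft_deleted(1_000_000, 10, 10)` does not, because the first criterion requires strictly fewer soft-deleted than remaining documents.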
Nice, it's way easier to follow this way 🪨
Canceled.
irevoire approved these changes on Dec 19, 2022
Oops I updated a comment
bors merge
Build succeeded:
Pull Request
Related issue
Related to meilisearch/meilisearch#3231: After removing --max-index-size, the mapsize will always be unrelated to the actual max size the user wants for their DB, so it doesn't make sense to use these values any longer.
This implements solution 2.3 from meilisearch/meilisearch#3231 (comment)
What does this PR do?
User-visible
Implementation standpoint
DeletionStrategy enum to replace the boolean disable_soft_deletion that we had up until now. This enum allows us to specify that we want "always hard", "always soft", or to use the dynamic soft-deletion strategy (default).
DeletionStrategy::Dynamic variant.
AlwaysHard or AlwaysSoft depending on the test)
Note to reviewers: this PR is optimized for a commit-by-commit review.
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!