This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Improve the estimatedNbHits when a distinctAttribute is specified #563

Merged
merged 1 commit into from
Jun 22, 2022

Conversation

Kerollmops
Member

This PR is related to meilisearch/meilisearch#2532 but doesn't fix it entirely. It improves the estimate by computing the excluded documents (the ones whose distinct value has already been seen) before stopping the loop. I think not doing so was a mistake; it should always have worked this way.
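A minimal sketch of the idea follows. It is a simplified, hypothetical in-memory model, not milli's actual code (which works on RoaringBitmaps and facet databases). It assumes that the estimate is the number of candidates minus the number of excluded documents. The names `lazy_distinct` and `by_value` are illustrative only. The point is that the duplicates of each returned document are added to the excluded set immediately, so they are counted even when the loop stops right after that document.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical model of a lazy distinct loop (not milli's API).
/// `candidates` are (document id, distinct value) pairs in ranking order.
/// Returns the returned document ids and the estimated number of hits,
/// assumed here to be `candidates.len() - excluded.len()`.
fn lazy_distinct(candidates: &[(u32, &str)], limit: usize) -> (Vec<u32>, usize) {
    // Stand-in for a facet index: distinct value -> every candidate carrying it.
    let mut by_value: HashMap<&str, Vec<u32>> = HashMap::new();
    for &(id, value) in candidates {
        by_value.entry(value).or_default().push(id);
    }

    let mut returned = Vec::new();
    let mut excluded = BTreeSet::new();
    for &(id, value) in candidates {
        if returned.len() == limit {
            break; // lazy: stop as soon as enough documents are returned
        }
        if excluded.contains(&id) {
            continue; // a document with this distinct value was already returned
        }
        returned.push(id);
        // Exclude every other document sharing this value right away, i.e.
        // before the limit check above gets a chance to stop the loop.
        for &other in &by_value[value] {
            if other != id {
                excluded.insert(other);
            }
        }
    }
    let estimated = candidates.len() - excluded.len();
    (returned, estimated)
}
```

With the candidates `[(0, "a"), (1, "a"), (2, "b"), (3, "b"), (4, "c")]` and a limit of 1, only document 1 is excluded. The estimate is therefore 4, although the true count is 3. With a limit of 3, the call returns `([0, 2, 4], 3)`.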

The reason it doesn't fix the issue entirely is that Meilisearch is lazy: it avoids computing more than it needs so that it doesn't take too long to answer. When we deduplicate the documents by their distinct value, we must do it along the way: every time we see a new document, we check that its distinct value doesn't collide with that of an already-returned document.

The reason we see the correct result when enough documents are fetched is that we were lucky enough to encounter every distinct value present in the dataset. The deduplication was therefore complete, and no further document could be returned.
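The along-the-way deduplication described in the two paragraphs above can be sketched as follows. This is a hypothetical, simplified model: `distinct_prefix` and its `(id, value)` input are illustrative, not Meilisearch's API. Documents arrive in ranking order, and each one is checked only when it is reached, against the values of the documents already returned. The loop stops as soon as `limit` documents are found.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical sketch (not Meilisearch's API): returns the ids of the first
/// `limit` ranked documents whose distinct value has not been seen yet, plus
/// how many ranked documents had to be scanned to find them.
fn distinct_prefix<V: Eq + Hash>(ranked: &[(u32, V)], limit: usize) -> (Vec<u32>, usize) {
    let mut seen = HashSet::new();
    let mut returned = Vec::new();
    let mut scanned = 0;
    for (id, value) in ranked {
        if returned.len() == limit {
            break; // lazy: the rest of `ranked` is never looked at
        }
        scanned += 1;
        // `insert` returns false when the value collides with a returned document.
        if seen.insert(value) {
            returned.push(*id);
        }
    }
    (returned, scanned)
}
```

Take the input `[(0, "a"), (1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]` with a limit of 2. The call returns `([0, 2], 3)`: documents 3 to 5 are never inspected, so the loop cannot know that two of them are duplicates. Only with a limit large enough to reach every distinct value does the number of returned documents equal the true count of 3. For example, a limit of 10 returns `([0, 2, 4], 6)`.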

To get a correct estimatedNbHits every time, we would have to do a pass over the whole set of possible values of the distinct attribute and compute a big intersection, which could cost a lot of CPU cycles.
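For comparison, an exact count along the lines of the previous paragraph might look like the sketch below. It is hypothetical throughout. It replaces milli's facet databases and bitmaps with an in-memory facet index (a `BTreeMap` from distinct value to document ids). It assumes that each document holds at most one distinct value and that candidates without a value each count as a separate hit. `exact_nb_hits` is an illustrative name. The cost is one intersection per distinct value in the dataset, however few documents the query actually returns.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical sketch of an exact hit count under a distinct attribute.
/// `facet_index` maps every distinct value in the dataset to the documents
/// holding it (assumption: at most one value per document); `candidates`
/// are the documents matching the query.
fn exact_nb_hits(facet_index: &BTreeMap<&str, BTreeSet<u32>>, candidates: &BTreeSet<u32>) -> usize {
    let mut remaining = candidates.clone();
    let mut hits = 0;
    // One intersection per distinct value of the whole dataset: this full
    // pass is what the lazy loop avoids.
    for docids in facet_index.values() {
        let mut matched = false;
        for id in docids.intersection(candidates) {
            remaining.remove(id);
            matched = true;
        }
        if matched {
            hits += 1; // all matching documents collapse into a single hit
        }
    }
    // Assumption: candidates without a distinct value are each their own hit.
    hits + remaining.len()
}
```

Consider values `a`, `b`, `c` and `d` with the candidates `0..7`. Suppose `a` covers documents 0, 1 and 3, `b` covers 2 and 5, `c` covers 4 and 9, and `d` covers 7. Three values intersect the candidates, and document 6 holds no value, so the exact count is 4.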

@Kerollmops Kerollmops requested a review from irevoire June 22, 2022 09:49
@Kerollmops Kerollmops added the no breaking The related changes are not breaking (DB nor API) label Jun 22, 2022
Member

@irevoire irevoire left a comment


I guess that's better than nothing, thank you!
bors merge

@bors
Contributor

bors bot commented Jun 22, 2022

@bors bors bot merged commit d546f6f into main Jun 22, 2022
@bors bors bot deleted the improve-estimated-nb-hits-distinct branch June 22, 2022 12:53