Improve the `estimatedNbHits` when a `distinctAttribute` is specified #563

Kerollmops · 2022-06-22T09:49:09Z

This PR is related to meilisearch/meilisearch#2532 but doesn't fix it entirely. It improves the estimate by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop. I think the previous behavior was a mistake and it should always have been this way.

The reason it doesn't fix the issue is that Meilisearch is lazy: it avoids computing more than necessary so that it doesn't take too long to answer. When we deduplicate the documents by their distinct value we must do it along the way: every time we see a new document, we check that its distinct value doesn't collide with that of an already returned document.

The reason we see the correct result when enough documents are fetched is that, in that case, we were lucky enough to see all of the different distinct values in the dataset: all of the deduplication has been done and no more documents can be returned.
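The lazy deduplication described above can be illustrated with a minimal sketch. This is not the actual milli code: the `Doc` type, the `lazy_distinct_search` function and the way the estimate is derived (number of candidates minus the duplicates seen so far) are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical document: an id plus the value of its distinct attribute.
#[derive(Debug, Clone, PartialEq)]
struct Doc {
    id: u32,
    distinct: String,
}

/// Walks the candidates in ranking order and stops as soon as `limit`
/// documents have been collected. Returns the page of documents and an
/// estimated number of hits.
fn lazy_distinct_search(candidates: &[Doc], limit: usize) -> (Vec<Doc>, usize) {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut returned = Vec::new();
    let mut excluded = 0;

    for doc in candidates {
        if returned.len() == limit {
            // Lazy: the page is full, the remaining candidates are not inspected.
            break;
        }
        if !seen.insert(doc.distinct.as_str()) {
            // The distinct value collides with an already returned document.
            excluded += 1;
            continue;
        }
        returned.push(doc.clone());
    }

    // Only the duplicates met before stopping can be subtracted, so the
    // estimate is exact only when every candidate has been inspected.
    (returned, candidates.len() - excluded)
}
```

With six candidates whose distinct values are `a, a, b, b, c, c`, a limit of 2 stops after the third document and yields an estimate of 5, whereas a limit of 10 inspects every candidate and yields the exact count of 3.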

If we wanted a correct `estimatedNbHits` every time, we would have to do a pass over the whole set of possible values of the distinct attribute and do a big intersection, which could cost a lot of CPU cycles.
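A rough sketch of what such an exhaustive pass could look like, under stated assumptions: `docids_by_value` is a hypothetical map from each distinct value to the ids of the documents carrying it, `BTreeSet<u32>` stands in for whatever set structure the engine really uses, and documents that lack the distinct attribute are ignored.

```rust
use std::collections::{BTreeSet, HashMap};

/// Counts the distinct values that have at least one document among the
/// candidates. Every distinct value is visited, whatever the page size.
fn exact_distinct_hits(
    candidates: &BTreeSet<u32>,
    docids_by_value: &HashMap<String, BTreeSet<u32>>,
) -> usize {
    docids_by_value
        .values()
        // One intersection test per distinct value: this is the costly part.
        .filter(|docids| !docids.is_disjoint(candidates))
        .count()
}
```

The cost grows with the number of distinct values in the dataset rather than with the number of documents requested, which is the trade-off the lazy approach avoids.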

irevoire

I guess that's better than nothing, thank you!
bors merge

bors · 2022-06-22T12:53:00Z

Build succeeded:

Improve the estimatedNbHits when distinct is enabled

d2f84a9

Kerollmops requested a review from irevoire June 22, 2022 09:49

Kerollmops mentioned this pull request Jun 22, 2022

Bug with distinct attributes and nbHits meilisearch/meilisearch#2532

Closed

Kerollmops added the `no breaking` label (the related changes are not breaking, DB nor API) Jun 22, 2022

irevoire approved these changes Jun 22, 2022

bors bot merged commit d546f6f into main Jun 22, 2022

bors bot deleted the improve-estimated-nb-hits-distinct branch June 22, 2022 12:53
