Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Fix bug in handling of soft deleted documents when updating settings #723

Merged
merged 3 commits into from
Dec 6, 2022

Conversation

loiclec
Copy link
Contributor

@loiclec loiclec commented Dec 6, 2022

Pull Request

Related issue

Fixes (partially, until merged into meilisearch) meilisearch/meilisearch#3021

What does this PR do?

This PR fixes the bug where a missing key in documents database internal error message could appear when indexing documents.

When updating the settings, before clearing the database and before creating the transform output, we now modify the ExternalDocumentsIds structure to get rid of all references to soft deleted document ids in its FSTs.

It used to be that updating the settings would clear the soft-deleted document ids, but keep the original ExternalDocumentsIds structure. As a consequence of this, when processing a future document addition, we could wrongly believe that a document was being replaced when, in fact, it was a completely new document. See the tests bug_3021_first, bug_3021_second, and bug_3021 for a minimal test case that would have reproduced the issue.

We need to take special care to:

  • evaluate how users should update to v0.30.1 (containing this fix): dump? reimporting all documents from scratch?
  • understand IF/HOW this bug could have caused duplicate documents to be returned
  • and evaluate the correctness of the fix, of course :)

@loiclec loiclec added the bug Something isn't working label Dec 6, 2022
@Kerollmops Kerollmops added the no breaking The related changes are not breaking (DB nor API) label Dec 6, 2022
@loiclec loiclec force-pushed the soft-deleted-docids-settings-update-fix branch from 45b3d3c to 67d8cec Compare December 6, 2022 14:09
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! It is delightful to see this bug finally fixed!
ready to be merged for my part 🧩

Copy link
Member

@irevoire irevoire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we can remove these lifetimes.
But if we can't, you can merge!

milli/src/update/index_documents/transform.rs Show resolved Hide resolved
@Kerollmops
Copy link
Member

Let's merge this!
bors merge

@bors
Copy link
Contributor

bors bot commented Dec 6, 2022

@bors bors bot merged commit 0a301b5 into main Dec 6, 2022
@bors bors bot deleted the soft-deleted-docids-settings-update-fix branch December 6, 2022 14:56
Kerollmops pushed a commit that referenced this pull request Dec 6, 2022
723: Fix bug in handling of soft deleted documents when updating settings r=Kerollmops a=loiclec

# Pull Request

## Related issue
Fixes (partially, until merged into meilisearch) meilisearch/meilisearch#3021

## What does this PR do?
This PR fixes the bug where a `missing key in documents database` internal error message could appear when indexing documents.

When updating the settings, before clearing the database and before creating the transform output, we now modify the `ExternalDocumentsIds` structure to get rid of all references to soft deleted document ids in its FSTs.

It used to be that updating the settings would clear the soft-deleted document ids, but keep the original `ExternalDocumentsIds` structure. As a consequence of this, when processing a future document addition, we could wrongly believe that a document was being replaced when, in fact, it was a completely new document. See the tests `bug_3021_first`, `bug_3021_second`, and `bug_3021` for a minimal test case that would have reproduced the issue.
 
We need to take special care to:
- evaluate how users should update to v0.30.1 (containing this fix): dump? reimporting all documents from scratch?
- understand IF/HOW this bug could have caused duplicate documents to be returned 
- and evaluate the correctness of the fix, of course :)


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
bors bot added a commit that referenced this pull request Dec 6, 2022
725: Reapply fixes for v0.37.1 r=curquiza a=Kerollmops

This PR reapplies #712, #720, #722, and #723.
We needed the #720 as it was bringing more tests and snap-related improvements.

Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working no breaking The related changes are not breaking (DB nor API)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants