-
Notifications
You must be signed in to change notification settings - Fork 81
Conversation
b02c7e8
to
cb490c3
Compare
I just ran a benchmark to check that we improve the time we take to index documents compared to what we lost in #555. https://github.com/meilisearch/milli/actions/runs/2540950078 |
fc1c80d
to
b6ab40b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a fairly big PR for me, so I did my best, but overall I think it looks good :) There are mostly two potential issues:
- should we verify that the floats inside the geo search fields are valid longitude/latitudes, for example by checking that they are not NaN, inf, etc.?
- I think there might be a bug in the code that finds the nested primary key
Apart from that, I had two minor comments:
- we'll need to reintroduce the functionality provided by
serde_impl.rs
, as I don't think it will be acceptable, performance-wise, to deserialise the whole array of added documents at once - the “guessed” primary ID uses the whole
DocumentsBatchIndex
, but the Meilisearch documentation says that the guess should only use the first document. This is not a change in behaviour introduced by this PR though, so it's not blocking as far as I am concerned
e9c406e
to
805c513
Compare
Thank you for the changes! :) Regarding the nested primary keys again. Could you check that the function works correctly when we have the document: { "a" : { "b" : { "c" : 1 }}} and the primary keys |
Thank you very much for this bug-finding again @loiclec, I don't know how you find them but that's very impressive 👀 💪 |
903a3e0
to
448114c
Compare
I just rebased this PR on top of the main branch but I would like @loiclec's review as I think you did change the transform file, but please just review this file if it's too much 😃 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I can't spot anything that's wrong, although I mostly looked at the transform file and (relatively) quickly read through the changes since the last review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it's a super cool PR!
I think we can simplify the transform even more with your enriched batch reader 😁
@@ -205,8 +191,7 @@ impl<'a, 'i> Transform<'a, 'i> { | |||
// it, transform it into a string and validate it, and then update it in the | |||
// document. If none is found, and we were told to generate missing document ids, then | |||
// we create the missing field, and update the new document. | |||
let mut uuid_buffer = [0; uuid::fmt::Hyphenated::LENGTH]; | |||
let external_id = if primary_key_id_nested { | |||
if primary_key_id_nested { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can get rid of this part too and delete the flatten_from_field_mapping
function!
We'll flatten the document later with the function that works with the field id map 😁
Also, in case we need to keep this if, the comment above is outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum... Thank you indeed, it maybe emerged from the rebase I did on main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this section and the code that supported it :)
Implemented in meilisearch/milli#561
Thank you for your review 🙏, would it be possible for you to do that next week, as I am on vacation, please? |
Otherwise it is not possible to iterate over all documents while using the fields index at the same time.
3201801
to
fc9f3f3
Compare
Co-authored-by: Many the fish <many@meilisearch.com>
@ManyTheFish I committed your suggested change to add a comment. Was there anything else to do? |
changed applied or issue will be open
Checked with @loiclec, we can finally merge!!! bors merge |
Build succeeded: |
Implemented in meilisearch/milli#561
Implemented in meilisearch/milli#561
* Bump openapi spec version to v0.29 * Update 0001-script-based-tokenizer.md (#159) Change tokenizer specs to better fit Charabia implementation * Update the geosearch error (#161) Implemented in meilisearch/milli#561 * Auto-batching - Enable feature by default and remove unwanted options (#162) * Update specs according to new auto-batching behavior * update batchUid to make it internal and hidden from a task resource representation * Remove the batchUid mentions from the task API * Update open-api.yaml Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com> * update future possibilities Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com> * Search API — Filters - Introduce IN and EXISTS and describe filter capabilities in more precisely (#163) * Write a specification for the new (and old) search filters EXISTS IN NOT (new behaviour) != (new behaviour) * Apply suggestions from code review Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com> * Add missing settings object in the task details field of a settingsUpdate task type (#164) * Remove `name` from indexes resource definition (#165) * Misc — Soft deleted documents (Performance optimization) (#168) * create a spec for the soft deleted documents * Rename spec file, minor adjustements * Replace You and We by A user and Meilisearch Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com> * Add Stats Seen event (#169) * Add examples component for each summarized task type (#170) * Version API — Catch up (#171) * Add version-api.md * Add PR number as a spec file prefix * Add health-api.md (#172) * Search API — Add `matchingStrategy` parameter with `last` / `all` strategies (#173) * Introduce a proposal to boot the specification * Update telemetry * Replace wordMatchingStrategy by matchingStrategy * fix missing backtick md Co-authored-by: Many the fish <legendre.maxime.isn@gmail.com> Co-authored-by: Tamo <tamo@meilisearch.com> Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com> Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com> Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
This PR is based on #555 and must be rebased on main after it has been merged to ease the review.This PR contains the work in #555 and can be merged on main as soon as reviewed and approved.
EnrichedDocumentsBatchReader
that contains the external documents id.EnrichedDocumentsBatchReader
.EnrichedDocumentsBatchReader
in theTransform::read_documents
.update_primary_key
from the transform.rs file.Transform::read_documents
.