This repository has been archived by the owner on Apr 4, 2023. It is now read-only.
MarinPostma force-pushed the optimize-document-transform branch from e42e845 to 3fcccc3 (October 25, 2021 08:29)
Kerollmops suggested changes (Oct 25, 2021)
Kerollmops suggested changes (Oct 25, 2021)
We don't know the document id at this point, and returning the whole document is not very helpful for locating it in a large CSV document, unlike the line number...
MarinPostma force-pushed the optimize-document-transform branch from a7ce5ce to 135fa16 (October 25, 2021 14:28)
MarinPostma force-pushed the optimize-document-transform branch from 135fa16 to b16c2ad (October 25, 2021 15:41)
Kerollmops previously approved these changes (Oct 25, 2021)
Looks good to me. We just need to make it pass the CI now 🎪
I always forget that you need to run on all targets here... 🤦‍♂️
MarinPostma force-pushed the optimize-document-transform branch from b16c2ad to 055ba13 (October 25, 2021 16:15)
MarinPostma force-pushed the optimize-document-transform branch from 055ba13 to baddd80 (October 25, 2021 16:31)
Kerollmops approved these changes (Oct 25, 2021)
bors merge
bors bot added a commit to meilisearch/meilisearch that referenced this pull request (Oct 26, 2021)
1847: Optimize document transform r=MarinPostma a=MarinPostma

- Integrate the optimization from meilisearch/milli#402.
- Optimize the payload read by reading it into RAM first instead of streaming it. This means that the payload must fit into RAM, which should not be a problem.
- Add a BufWriter to the obkv writer to improve write speed.

I have measured a gain of 40-45% in speed after these optimizations.

Co-authored-by: marin postma <postma.marin@protonmail.com>
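For readers unfamiliar with the two I/O changes mentioned in that commit message (reading the payload into RAM first, and wrapping the writer in a `BufWriter`), here is a minimal sketch using only the Rust standard library. It is not the code from the PR: the function name `transform` and the length-prefixed record framing are hypothetical stand-ins chosen for illustration, and the real obkv writer is not used.

```rust
use std::io::{self, BufWriter, Read, Write};

/// Hypothetical helper (not from the PR): reads the whole payload into memory,
/// then writes each non-empty line to `sink` as a length-prefixed record.
fn transform<R: Read, W: Write>(mut payload: R, sink: W) -> io::Result<usize> {
    // Read the entire payload into RAM first instead of streaming it.
    // This assumes the payload fits in memory.
    let mut buffer = Vec::new();
    payload.read_to_end(&mut buffer)?;

    // Wrap the sink so that many small writes are batched into larger ones.
    let mut writer = BufWriter::new(sink);
    let mut count = 0;
    for line in buffer.split(|b| *b == b'\n').filter(|l| !l.is_empty()) {
        // Illustrative framing only: 4-byte big-endian length, then the bytes.
        writer.write_all(&(line.len() as u32).to_be_bytes())?;
        writer.write_all(line)?;
        count += 1;
    }
    // Flush explicitly so that write errors are reported instead of being
    // silently dropped when the BufWriter goes out of scope.
    writer.flush()?;
    Ok(count)
}

fn main() -> io::Result<()> {
    let payload = b"{\"id\":1}\n{\"id\":2}\n";
    let mut out = Vec::new();
    let n = transform(&payload[..], &mut out)?;
    println!("wrote {} records, {} bytes", n, out.len());
    Ok(())
}
```

The explicit `flush` matters with `BufWriter`: dropping it also flushes, but any error at that point is ignored.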
This PR optimizes the transformation of document additions into the obkv format. Instead of accepting any serializable object, we now treat JSON and CSV specifically:
- For JSON, a `Visitor` that transforms the JSON straight into obkv without an intermediate representation.