Optimize document transform #402

MarinPostma · 2021-10-24T13:46:30Z

This PR optimizes the transformation of document additions into the obkv format. Instead of accepting any serializable object, we now handle JSON and CSV specifically:

For JSON, we build a serde Visitor that transforms the JSON straight into obkv, without an intermediate representation.
For CSV, we write the lines directly into the obkv, applying other optimizations as well.
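
To illustrate the CSV path, here is a minimal sketch of writing each row straight into a key-value buffer without building an intermediate document value. It is not the code from this PR: `encode_row` and `encode_csv` are hypothetical names, the byte layout is a simplified stand-in rather than the real obkv format, and the naive `split(',')` ignores CSV quoting.

```rust
/// Hypothetical, simplified layout (NOT the real obkv format): for each
/// field, a big-endian u16 field id, a big-endian u32 value length, then
/// the raw value bytes.
fn encode_row(fields: &[&str], out: &mut Vec<u8>) {
    for (id, value) in fields.iter().enumerate() {
        out.extend_from_slice(&(id as u16).to_be_bytes());
        out.extend_from_slice(&(value.len() as u32).to_be_bytes());
        out.extend_from_slice(value.as_bytes());
    }
}

/// Reads the header line, then encodes every data row directly into its
/// own buffer. Returns the headers and one encoded buffer per row.
/// Assumption: fields contain no quoted commas or embedded newlines.
fn encode_csv(payload: &str) -> (Vec<String>, Vec<Vec<u8>>) {
    let mut lines = payload.lines();
    let headers: Vec<String> = lines
        .next()
        .unwrap_or("")
        .split(',')
        .map(|h| h.trim().to_string())
        .collect();

    let mut docs = Vec::new();
    for line in lines.filter(|l| !l.is_empty()) {
        let fields: Vec<&str> = line.split(',').collect();
        let mut buf = Vec::new();
        encode_row(&fields, &mut buf);
        docs.push(buf);
    }
    (headers, docs)
}
```

The field id is simply the column index, so the header names only need to be stored once instead of once per document.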

milli/src/documents/mod.rs

milli/src/documents/builder.rs

MarinPostma · 2021-10-25T14:24:37Z

We don't know the document id at this point, and returning the whole document does not help much to locate it in a large CSV document, unlike the line number...
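
As a rough sketch of the idea (not the actual error type in milli), a number parsing error can carry the 1-based line number and the offending value. `CsvNumberError` and `parse_number_column` are hypothetical names, and the naive `split(',')` ignores CSV quoting.

```rust
use std::fmt;

/// Hypothetical error type: records where a number failed to parse.
#[derive(Debug, PartialEq)]
struct CsvNumberError {
    /// 1-based line number in the CSV payload (the header is line 1).
    line: usize,
    /// The raw text that could not be parsed as a float.
    value: String,
}

impl fmt::Display for CsvNumberError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "line {}: cannot parse {:?} as a number", self.line, self.value)
    }
}

/// Parses one column of every data row as f64, stopping at the first
/// value that fails and reporting the line it was found on.
fn parse_number_column(payload: &str, column: usize) -> Result<Vec<f64>, CsvNumberError> {
    let mut numbers = Vec::new();
    // Index 0 is the header line, so data rows start at index 1.
    for (index, row) in payload.lines().enumerate().skip(1) {
        let raw = row.split(',').nth(column).unwrap_or("").trim();
        match raw.parse::<f64>() {
            Ok(n) => numbers.push(n),
            Err(_) => {
                return Err(CsvNumberError { line: index + 1, value: raw.to_string() })
            }
        }
    }
    Ok(numbers)
}
```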

Kerollmops

Looks good to me. We just need to make it pass the CI now 🎪

MarinPostma · 2021-10-25T15:50:51Z

I always forget that you need to run on all targets here... 🤦‍♂️

MarinPostma · 2021-10-26T09:41:20Z

bors merge

bors · 2021-10-26T10:04:55Z

Build succeeded:

1847: Optimize document transform r=MarinPostma a=MarinPostma

Integrate the optimization from meilisearch/milli#402.
Optimize payload read by reading it to RAM first instead of streaming it. This means that the payload must fit into RAM, which should not be a problem.
Add BufWriter to the obkv writer to improve write speed.

I have measured a gain of 40-45% in speed after these optimizations.

Co-authored-by: marin postma <postma.marin@protonmail.com>
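
The two I/O changes named in that message can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from either PR: `read_payload` and `write_documents` are hypothetical names, and the length-prefixed framing is invented for the example.

```rust
use std::io::{self, BufWriter, Read, Write};

/// Reads the whole payload into memory before processing it, instead of
/// streaming it. The payload must therefore fit in RAM.
fn read_payload<R: Read>(mut source: R) -> io::Result<Vec<u8>> {
    let mut payload = Vec::new();
    source.read_to_end(&mut payload)?;
    Ok(payload)
}

/// Writes each encoded document through a BufWriter so that many small
/// writes are batched before reaching the underlying sink.
/// Hypothetical framing: a big-endian u32 length followed by the bytes.
fn write_documents<W: Write>(sink: W, docs: &[Vec<u8>]) -> io::Result<W> {
    let mut writer = BufWriter::new(sink);
    for doc in docs {
        writer.write_all(&(doc.len() as u32).to_be_bytes())?;
        writer.write_all(doc)?;
    }
    // into_inner flushes the buffer and hands back the underlying writer.
    writer.into_inner().map_err(|e| e.into_error())
}
```

Buffering matters most when the sink is a file or socket, where each unbuffered `write_all` call may turn into a separate system call.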

MarinPostma mentioned this pull request Oct 25, 2021

Optimize document transform meilisearch/meilisearch#1847

Merged

MarinPostma added 6 commits October 25, 2021 10:26

optimize document deserialization

8d70b01

implement csv serialization

0f86d6b

fix tests

2e62925

document errors

53c79e8

add csv builder tests

430e9b1

add document builder example

3fcccc3

MarinPostma marked this pull request as ready for review October 25, 2021 08:28

MarinPostma force-pushed the optimize-document-transform branch from e42e845 to 3fcccc3 Compare October 25, 2021 08:29

MarinPostma requested a review from Kerollmops October 25, 2021 08:29

curquiza mentioned this pull request Oct 25, 2021

Update version for the next release (v0.19.0) #401

Merged

curquiza changed the title ~~optimize document transform~~ Optimize document transform Oct 25, 2021

curquiza added the DB breaking The related changes break the DB label Oct 25, 2021

Kerollmops suggested changes Oct 25, 2021

milli/src/documents/mod.rs (outdated, resolved)

milli/src/documents/builder.rs (outdated, resolved)

MarinPostma force-pushed the optimize-document-transform branch from a7ce5ce to 135fa16 Compare October 25, 2021 14:28

return float parsing error context in csv

f9445c1

MarinPostma force-pushed the optimize-document-transform branch from 135fa16 to b16c2ad Compare October 25, 2021 15:41

MarinPostma requested a review from Kerollmops October 25, 2021 15:42

Kerollmops previously approved these changes Oct 25, 2021

MarinPostma dismissed Kerollmops’s stale review via 055ba13 October 25, 2021 16:15

MarinPostma force-pushed the optimize-document-transform branch from b16c2ad to 055ba13 Compare October 25, 2021 16:15

implement review suggestions

baddd80

MarinPostma force-pushed the optimize-document-transform branch from 055ba13 to baddd80 Compare October 25, 2021 16:31

Kerollmops approved these changes Oct 25, 2021

bors bot merged commit d7943fe into main Oct 26, 2021

bors bot deleted the optimize-document-transform branch October 26, 2021 10:04
