-
Notifications
You must be signed in to change notification settings - Fork 81
Conversation
milli/src/heed_codec/facet/mod.rs
Outdated
pub struct FieldIdCodec; | ||
|
||
impl<'a> heed::BytesDecode<'a> for FieldIdCodec { | ||
type DItem = FieldId; | ||
|
||
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> { | ||
let (field_id_bytes, _) = try_split_array_at(bytes)?; | ||
let field_id = u16::from_be_bytes(field_id_bytes); | ||
Some(field_id) | ||
} | ||
} | ||
|
||
impl<'a> heed::BytesEncode<'a> for FieldIdCodec { | ||
type EItem = FieldId; | ||
|
||
fn bytes_encode(field_id: &Self::EItem) -> Option<Cow<[u8]>> { | ||
Some(Cow::Owned(field_id.to_be_bytes().to_vec())) | ||
} | ||
} | ||
|
||
pub struct FieldIdDocIdCodec; | ||
|
||
impl<'a> heed::BytesDecode<'a> for FieldIdDocIdCodec { | ||
type DItem = (FieldId, DocumentId); | ||
|
||
fn bytes_decode(bytes: &'a [u8]) -> Option<Self::DItem> { | ||
let (field_id_bytes, bytes) = try_split_array_at(bytes)?; | ||
let field_id = u16::from_be_bytes(field_id_bytes); | ||
|
||
let document_id_bytes = bytes[..4].try_into().ok()?; | ||
let document_id = u32::from_be_bytes(document_id_bytes); | ||
|
||
Some((field_id, document_id)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kerollmops don't we already have a codec for the FieldId
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum... I don't remember, we should check the codecs that we use in the Index
database.
milli/src/update/index_documents/extract/extract_facet_exists_docids.rs
Outdated
Show resolved
Hide resolved
milli/src/update/index_documents/extract/extract_facet_exists_docids.rs
Outdated
Show resolved
Hide resolved
milli/src/update/index_documents/extract/extract_facet_exists_docids.rs
Outdated
Show resolved
Hide resolved
milli/src/update/index_documents/extract/extract_fid_docid_facet_values.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me thank you 👍
b32bbfc
to
b327fde
Compare
OR, AND, NOT, TO must now be followed by spaces
The idea is to directly create a sorted and merged list of bitmaps in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating a grenad::Reader where the keys are field_id and the values are docids. Then we send that BTreeMap to the thing that handles TypedChunks, which inserts its content into the database.
c8fca39
to
aed8c69
Compare
milli/src/update/index_documents/extract/extract_fid_docid_facet_values.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Many the fish <many@meilisearch.com>
561: Enriched documents batch reader r=curquiza a=Kerollmops ~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~ This PR contains the work in #555 and can be merged on main as soon as reviewed and approved. - [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id. - [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`. - [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`. - [x] Remove the `update_primary_key` from the _transform.rs_ file. - [x] Really generate the auto-generated documents ids. - [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`. Co-authored-by: Kerollmops <clement@meilisearch.com>
595: Update version for next release (v0.32.0) r=ManyTheFish a=curquiza In order to release on `main` (for v0.29.0, not v0.28.1) <img width="1014" alt="Capture d’écran 2022-07-21 à 13 20 35" src="https://user-images.githubusercontent.com/20380692/180178725-381fbdf1-c0fb-4fa9-9954-452aec5a1574.png"> Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
@ManyTheFish thanks for your review :) A big PR was merged into main since then and I had to make this PR up-to-date with the new main. Rebasing was quite difficult to do so instead I merged the previous filter/field-exists and main, resolved the conflicts, and pushed the result again under the same branch. I hope it's not too messy :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch! let's merge like this.
But! In this case of big merge conflicts, I prefer to squash all my commits before rebasing. Yes, I lose my history but I find it way clearer!
Co-authored-by: Many the fish <many@meilisearch.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @loiclec, sorry for the time!
You can merge it if everything is ok on your side.
Thanks everyone! |
Build succeeded: |
596: Filter operators: NOT + IN[..] r=irevoire a=loiclec # Pull Request ## What does this PR do? Implements the changes described in meilisearch/meilisearch#2580 It is based on top of #556 Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
What does this PR do?
Fixes issue #2484 in the meilisearch repo.
It creates a
field EXISTS
filter which selects all documents containing thefield
key.For example, with the following documents:
Then the filter
colour EXISTS
selects the ids[0, 1, 2, 3, 4, 5]
. The filtercolour NOT EXISTS
selects[6]
.Details
There is a new database named
facet-id-exists-docids
. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists.To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the
facet-id-f64-docids
database, with appropriate modifications in place.There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:
would be flattened to:
prior to being given to the extraction pipeline.
This transformation would lose the information that is needed to populate the
facet-id-exists-docids
database. Therefore, I have also changed the implementation of theflatten-serde-json
crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example:becomes