Implements the EXISTS filter #2484

Closed
irevoire opened this issue Jun 8, 2022 · 3 comments · Fixed by #2636
Labels
enhancement New feature or improvement impacts docs This issue involves changes in the Meilisearch's documentation impacts integrations This issue involves changes in the Meilisearch's integrations milli Related to the milli workspace v0.29.0 PRs/issues solved in v0.29.0 released on 2022-10-03

@irevoire
Member

irevoire commented Jun 8, 2022

After the following discussion and a meeting with @gmourier and @loiclec: https://github.com/meilisearch/product/issues/22

The first version of the filter should be implemented with the following syntax:

```
The keyword is postfixed
       vvvvvv
 price EXISTS
^     ^
A NOT can be prefixed or infixed
vvv       vvv
NOT price NOT EXISTS
```
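The accepted forms can be illustrated with a tiny recognizer. This is a hypothetical Python sketch, not milli's actual (Rust) filter parser: the name `parse_exists`, the identifier pattern used for field names, and the case-sensitive keywords are all assumptions. It also assumes that a prefixed and an infixed `NOT` used together cancel each other out.

```python
import re

# Hypothetical grammar: [NOT] <field> [NOT] EXISTS, keywords upper-case only.
_EXISTS_RE = re.compile(r"^\s*(NOT\s+)?([A-Za-z_][\w.]*)\s+(NOT\s+)?EXISTS\s*$")


def parse_exists(expr: str) -> tuple[str, bool]:
    """Return (field, negated) for an EXISTS expression, or raise ValueError."""
    match = _EXISTS_RE.match(expr)
    if match is None:
        raise ValueError(f"not an EXISTS filter: {expr!r}")
    prefix_not, field, infix_not = match.groups()
    # Assumption: two NOTs cancel out, so the result is negated only if exactly one is present.
    negated = (prefix_not is not None) != (infix_not is not None)
    return field, negated


print(parse_exists("price EXISTS"))      # ('price', False)
print(parse_exists("NOT price EXISTS"))  # ('price', True)
print(parse_exists("price NOT EXISTS"))  # ('price', True)
```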

With the following set of documents:

```json
{ "id": 1, "product": "T-shirt", "price": 20, "color": "yellow" }
{ "id": 2, "product": "T-shirt", "color": "red" }
```

  • The filter `price EXISTS` will select the first document.
  • The filter `price NOT EXISTS` or `NOT price EXISTS` will select only the second document.

If a field contains an empty array or a null value, it's considered as existing:

```json
{ "id": 1, "product": "T-shirt", "price": 20, "color": "yellow" }
{ "id": 2, "product": "T-shirt", "price": [], "color": "red" }
{ "id": 3, "product": "T-shirt", "price": null, "color": "red" }
{ "id": 4, "product": "T-shirt", "color": "red" }
```

Here, `price EXISTS` matches documents 1, 2 and 3.
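The rule that only the presence of the key matters, not its value, fits in a single comprehension. This is a minimal Python sketch, assuming documents are plain dicts that are already flattened (so nested fields appear as top-level dotted keys). `filter_exists` is a hypothetical helper name, not part of any Meilisearch API.

```python
from typing import Any


def filter_exists(docs: list[dict[str, Any]], field: str, negated: bool = False) -> list[Any]:
    """Return the ids of the documents selected by `field EXISTS`
    (or by `field NOT EXISTS` when `negated` is True)."""
    # Only the presence of the key is checked: null and [] values still count as existing.
    return [doc["id"] for doc in docs if (field in doc) != negated]


docs = [
    {"id": 1, "product": "T-shirt", "price": 20, "color": "yellow"},
    {"id": 2, "product": "T-shirt", "price": [], "color": "red"},
    {"id": 3, "product": "T-shirt", "price": None, "color": "red"},
    {"id": 4, "product": "T-shirt", "color": "red"},
]

print(filter_exists(docs, "price"))                # [1, 2, 3]
print(filter_exists(docs, "price", negated=True))  # [4]
```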


This will ease the handling of incomplete documents. For example, if you want to return all T-shirts that cost less than 20€ or that don't have a price specified, you will be able to write `product = "T-shirt" AND (price < 20 OR price NOT EXISTS)`.
That was not possible previously.
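The combined filter can be mirrored as a plain predicate to make its semantics concrete. This is a hypothetical Python sketch: `cheap_or_unpriced_tshirt` and `price_below` are made-up names. It also assumes that `price < 20` only matches numeric values (any element, for an array). Under that assumption, a `null` or `[]` price is selected by neither branch, because it counts as existing.

```python
from typing import Any


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def price_below(doc: dict[str, Any], limit: float) -> bool:
    """Assumed semantics of `price < limit`: true if any numeric value is below the limit."""
    raw = doc.get("price")
    values = raw if isinstance(raw, list) else [raw]
    return any(is_number(v) and v < limit for v in values)


def cheap_or_unpriced_tshirt(doc: dict[str, Any]) -> bool:
    # product = "T-shirt" AND (price < 20 OR price NOT EXISTS)
    return doc.get("product") == "T-shirt" and (price_below(doc, 20) or "price" not in doc)


docs = [
    {"id": 1, "product": "T-shirt", "price": 20},
    {"id": 2, "product": "T-shirt", "price": 15},
    {"id": 3, "product": "T-shirt"},
    {"id": 4, "product": "T-shirt", "price": None},
    {"id": 5, "product": "Hoodie"},
    {"id": 6, "product": "T-shirt", "price": []},
]

selected = [doc["id"] for doc in docs if cheap_or_unpriced_tshirt(doc)]
print(selected)  # [2, 3]
```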

@irevoire irevoire added the enhancement New feature or improvement label Jun 8, 2022
@irevoire irevoire changed the title from "Implements the EXIST filter" to "Implements the EXISTS filter" Jun 8, 2022
@curquiza curquiza added this to the v0.29.0 milestone Jun 13, 2022
@curquiza
Member

curquiza commented Jul 5, 2022

For those following this issue, we have already published a Docker tag for testing the feature:

```shell
docker run -it --rm \
    -p 7700:7700 \
    getmeili/meilisearch:v0.29.0-filter.beta.0
```

Alternatively, you can compile the source code from the `filter/field-in` branch of this repo.

All the information about the new addition is detailed here

Any feedback is more than welcome!! ❤️
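To try the filter against the instance started by the Docker command, a search request can carry the expression in its `filter` parameter. The Python sketch below only builds the request with the standard library and does not send it. The index uid `products` and the helper name `build_search_request` are hypothetical. The filtered field must also be listed in the index's `filterableAttributes` setting before it can be used in a filter.

```python
import json
import urllib.request


def build_search_request(index_uid: str, filter_expr: str, query: str = "",
                         host: str = "http://localhost:7700") -> urllib.request.Request:
    """Build (but do not send) a POST /indexes/{uid}/search request with a filter."""
    body = json.dumps({"q": query, "filter": filter_expr}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/indexes/{index_uid}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("products", "price NOT EXISTS")
# urllib.request.urlopen(req) would send it to the local instance on port 7700.
```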

@curquiza curquiza added milli Related to the milli workspace impacts docs This issue involves changes in the Meilisearch's documentation impacts integrations This issue involves changes in the Meilisearch's integrations labels Jul 5, 2022
bors bot added a commit to meilisearch/milli that referenced this issue Aug 4, 2022
556: Add EXISTS filter r=loiclec a=loiclec

## What does this PR do?

Fixes issue [#2484](meilisearch/meilisearch#2484) in the meilisearch repo.

It creates a `field EXISTS` filter which selects all documents containing the `field` key. 
For example, with the following documents:
```json
[{
	"id": 0,
	"colour": []
},
{
	"id": 1,
	"colour": ["blue", "green"]
},
{
	"id": 2,
	"colour": 145238
},
{
	"id": 3,
	"colour": null
},
{
	"id": 4,
	"colour": {
		"green": []
	}
},
{
	"id": 5,
	"colour": {}
},
{
	"id": 6
}]
```
Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`.

## Details
There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists.

To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place.
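The shape of the `facet-id-exists-docids` database can be sketched with Python sets standing in for bitmaps. This is an illustration under assumptions, not milli's code. The helper names are hypothetical, and the document's own `id` is used as its document id, whereas milli uses internal ids. Documents are assumed to be already flattened; for the PR's example, only the top-level `colour` key matters. `EXISTS` becomes a single lookup by field id, and `NOT EXISTS` is the complement against all document ids.

```python
from typing import Any


def build_exists_index(docs: list[dict[str, Any]]) -> tuple[dict[str, int], dict[int, set[int]]]:
    """Assign an id to every field name and map each field id to the set of
    document ids in which that field is present (a stand-in for a bitmap)."""
    field_ids: dict[str, int] = {}
    exists_docids: dict[int, set[int]] = {}
    for doc in docs:
        for name in doc:  # top-level keys of an (assumed) already-flattened document
            fid = field_ids.setdefault(name, len(field_ids))
            exists_docids.setdefault(fid, set()).add(doc["id"])
    return field_ids, exists_docids


def exists(field_ids: dict[str, int], exists_docids: dict[int, set[int]], field: str) -> set[int]:
    """`field EXISTS`: one lookup by field id (empty if the field was never seen)."""
    fid = field_ids.get(field)
    return set(exists_docids.get(fid, set())) if fid is not None else set()


def not_exists(field_ids: dict[str, int], exists_docids: dict[int, set[int]],
               all_docids: set[int], field: str) -> set[int]:
    """`field NOT EXISTS`: the complement of EXISTS against all document ids."""
    return all_docids - exists(field_ids, exists_docids, field)


colour_docs = [
    {"id": 0, "colour": []},
    {"id": 1, "colour": ["blue", "green"]},
    {"id": 2, "colour": 145238},
    {"id": 3, "colour": None},
    {"id": 4, "colour": {"green": []}},
    {"id": 5, "colour": {}},
    {"id": 6},
]
field_ids, exists_docids = build_exists_index(colour_docs)
all_docids = {doc["id"] for doc in colour_docs}
print(sorted(exists(field_ids, exists_docids, "colour")))                  # [0, 1, 2, 3, 4, 5]
print(sorted(not_exists(field_ids, exists_docids, all_docids, "colour")))  # [6]
```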

There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:
```json
{
    "id": 0,
    "colour": [],
    "size": {}
}
```
would be flattened to:
```json
{
    "id": 0
}
```
prior to being given to the extraction pipeline.

This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the JSON, it keeps track of which keys it encounters. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as its value. For example:
```json
{
    "id": 0,
    "colour": {
        "green": [],
        "blue": 1
    },
    "size": {}
} 
```
becomes
```json
{
    "id": 0,
    "colour": [],
    "colour.green": [],
    "colour.blue": 1,
    "size": []
} 
```
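The key-tracking step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the `flatten-serde-json` implementation. The name `flatten` is hypothetical, and arrays are treated as leaf values, so objects nested inside arrays are not handled. Empty arrays are deliberately dropped during traversal to mimic the old, lossy behaviour, so the restoring pass is what brings them back.

```python
from typing import Any


def flatten(doc: dict[str, Any]) -> dict[str, Any]:
    """Flatten nested objects into dotted keys, then restore every key that
    was encountered during traversal but ended up without a value, as []."""
    flat: dict[str, Any] = {}
    seen: list[str] = []  # every key path encountered, in traversal order

    def walk(prefix: str, obj: dict[str, Any]) -> None:
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            seen.append(path)
            if isinstance(value, dict):
                walk(path, value)  # nested object: recurse with a dotted prefix
            elif isinstance(value, list) and not value:
                continue  # empty array: produces no value (the old, lossy behaviour)
            else:
                flat[path] = value  # scalar, null or non-empty array kept as-is

    walk("", doc)
    # Restoring pass: keys that were seen but produced nothing get an empty array.
    for path in seen:
        if path not in flat:
            flat[path] = []
    return flat


print(flatten({"id": 0, "colour": {"green": [], "blue": 1}, "size": {}}))
```

Applied to the example above, this yields the same keys and values as the flattened object shown, possibly in a different key order.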


Co-authored-by: Kerollmops <clement@meilisearch.com>
@bors bors bot closed this as completed in c445334 Aug 23, 2022
@mech

mech commented Sep 22, 2022

I am curious: if my field has the value null, does EXISTS work on it? I am trying to build a filter UI with operator options named "Is empty" and "Is not empty", similar to Notion's filters, but it seems this is not possible.

Is there any support for this in the works, or do we need to store a fake value like field = '__SPECIAL_NULL__' to work around it?

@irevoire
Member Author

Hey @mech,
Currently, the EXISTS filter matches the documents in which the field exists.
A field whose value in the JSON is null still exists; being null doesn't change anything about its existence.

> If a field contains an empty array or a null value, it's considered as existing.

What you're asking for is a different feature that we've already thought about. It isn't planned currently, but I'm going to open a discussion on our product repository, and it would be nice if you could answer me over there so we don't lose any information!
meilisearch/product#539

@meili-bot meili-bot added the v0.29.0 PRs/issues solved in v0.29.0 released on 2022-10-03 label Oct 4, 2022