Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to delete documents by filter #756

Merged
merged 7 commits into from
May 23, 2023

Conversation

sanders41
Copy link
Collaborator

Pull Request

Related issue

Fixes #<issue_number>

What does this PR do?

  • Adds a delete_documents_by_filter method

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@sanders41 sanders41 added the enhancement New feature or request label May 17, 2023
@@ -754,6 +754,33 @@ def delete_documents(self, ids: List[Union[str, int]]) -> TaskInfo:
)
return TaskInfo(**response)

def delete_documents_by_filter(
self, filter: Union[str, List[Union[str, List[str]]]] # pylint: disable=redefined-builtin
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pylint doesn't like the parameter being named filter so I set it to ignore this error. If you would rather call it something else I can do that. If you want to rename do you have any suggestions for a name? I couldn't think of a good one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it's ok to keep filter. Is this the same issue with filters in the plural?
I don't think we should give it any other name than the one it has

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint would be ok with filters. The reason pylint doesn't like filter is because python has a built in filter function so it is trying to prevent overlap. In this case I can't think of a time where naming it filter would actually cause any issues though so I think either ignoring the pylint error or renaming to filters are both fine options, which ever you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, there is no problem to keep the same name as in the Meilisearch API. I will merge it

@bidoubiwa
Copy link
Contributor

Very nice ✨

If possible, can it be updated following this comment.

@sanders41
Copy link
Collaborator Author

@bidoubiwa continuing our discussion about the version helper here. If I run delete_documents_by_filter on Meilisearch 1.1 I get a MeilisearchApiError: meilisearch.errors.MeilisearchApiError: MeilisearchApiError. 405 Client Error: Method Not Allowed for url: http://127.0.0.1:7700/indexes/indexUID/documents/delete.

@sanders41
Copy link
Collaborator Author

I also want to get your thoughts on combining delete_documents and delete_documents_by_filter. I can do that, but because delete_documents takes an ids parameter I can't think of a way to do it without a breaking change. I would need to change delete_documents(self, ids: List[Union[str, int]]) to something like delete_documents(self, params: DocumentsDeletionQuery | DocumentsIds).

@sanders41
Copy link
Collaborator Author

sanders41 commented May 17, 2023

And FYI, the tests are failing because 1.2.0-rc.1 was just released, but it hasn't finished publishing yet. I'll re-run once it's available.

@brunoocasali
Copy link
Member

@sanders41 I've published new specs to be applied in this new version. Can you check it? (it is on the central issue)

def delete_documents(self, ids: List[Union[str, int]]) -> TaskInfo:
def delete_documents(
self,
ids: List[Union[str, int]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the new approach from the central issue, the idea should be to move the delete_documents_by_filter to the same delete_documents by adding a new argument filter to the method. Also, I don't know how Python handles deprecations, but the ids argument should be deprecated in this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brunoocasali I considered this new approach, but think it could be confusing. If a user sends both ids and a filter the filter will be silently ignored. Is this OK? Based on the message this means eventually you will only be able to delete by filter and not id?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct!

I understand that could be strange at first sight, but the idea is to add the functionality and not break the current usage (without creating a new method just for a new use case).

Also, I understand that our SDKs (most of them) are not stable, but that is not the foresee I have for them. I believe breakings are always a bad sign, especially when they are not "expected" please remember the context explained here.

Since those SDKs are unstable, we can eliminate the mixed behavior when we move them to a v1 (not necessarily being directly related to the engine being on v2). Still, the cool thing is that we introduced warnings/deprecations to the users before doing it, so everyone is "ready".

@sanders41
Copy link
Collaborator Author

@brunoocasali I updated as requested. I made filter a named only parameter so you can use index.delete_documents(filter="genre=action") rather than index.delete_documents(None, "genre=action"). If you don't like that let me know and I'll change it.

@sanders41
Copy link
Collaborator Author

sanders41 commented May 20, 2023

I was in a hurry when writing my reason for using a named parameter for filter and now realize my reason isn't really clear since index.delete_documents(filter="genre=action") will also work without forcing the named parameter.

The first thing I wanted to prevent is with ids being depreciated it would be natural to assume filter is the first parameter so I didn't want people sending a filter in that spot and getting errors. Since we want to prevent breaking current code I couldn't change the order of the parameters.

The second, and more important, is once ids is remove index.delete_documents(None, "genre=action") will break, but index.delete_documents(filter="genre=action") will continue to work so it will break less code. With that I wanted to force this style.

@bidoubiwa
Copy link
Contributor

Hey @sanders41

We discussed a bit to think about all possible options as we did not agree. Below you have the two design solutions we thought of. Which one do you think is best?

Current behavior

These behavior should NOT be breaking in any of the suggested designs

delete_documents([1,2])
delete_documents(["1","2"])
delete_documents("1")
delete_documents(1)
delete_documents(ids=[1,2])
delete_documents(ids=["1","2"])
delete_documents(ids="1")
delete_documents(ids=1)

Option 1: Named variables

delete_documents(ids=[1,2], filter=["id = 1"]) # filter is ignored
delete_documents([1,2], filter=["id = 1"]) # filter is ignored

delete_documents(None, filter=["id = 1"]) # filter
delete_documents(filter=["id = 1"]) # filter

delete_documents(["id = 1"]) # fails as it is considered a document id

Pros:

  • ids expect ids, filter expects filter. Intuitive

Cons:

  • when both are provided filter is silently ignored
  • On first param, without naming, it is always considered document ids.

Option 2: analyse first parameter type

In this option we introduce filter as being par of a dictionary.

{
    "filter": "id = 1"
}

Based on the type of the first parameter

Use old route if first param is:

  • array of number/string
  • String

Use new route if first param is:

  • Dictionnary
delete_documents([1,2]) # document ids
delete_documents(ids=[1,2]) # document ids
delete_documents({ filter: ["id = 1"]}) # filter
delete_documents(ids={ filter: ["id = 1"]}) # filter

Pros:
* Using a dictionnary lets us add possible new parameter easily in the futur, it follows the implementation of the search route params,
* First parameter can be either documents ids or parameters
Cons:
* ids={ filter: [""]} is valid.

Option 3: Both

It is possible to combine both option but implementation gets messy.

@sanders41
Copy link
Collaborator Author

I slightly lean towards option 1 even though I think it is ultimately a confusing option for the user. With this option I don't like the quasi optional parameters. The are listed as optional but really one or the other is required.

With that said I think options 2's ids={ filter: [""]} is even more confusing since ids and filters are two different things so passing a filter as ids seems even more confusing.

@alallema
Copy link
Contributor

Hi @sanders41,
Thank you for your answer.

I slightly lean towards option 1 even though I think it is ultimately a confusing option for the user.

I kind of agree with you on this but I'm not really sure why. Did you find this solution more idiomatic to Python?
I also wonder if many users use this method with the named parameter because the goal would be to rename the field ids param or parameter so this method won't work and will be replaced by the second one:

delete_documents(ids=1)
delete_documents(param=1)

But it will be a breaking change at some point.

@sanders41
Copy link
Collaborator Author

I have a tendency to used named parameters (even when it isn't required) simply because it makes for less breaking changes. The parameters can be moved around, added to, etc without breaking my code. With this being my tendency, I would probably use delete_documents(filter=["id = 1"]) and this had an influence on my decision because I was thinking when ids gets removed my code would still work. If it gets renamed to param that wouldn't be the case though.

Looking through code bases I probably see an equal amount of using named parameters and positional parameters. One isn't really more idiomatic over the other so unfortunately we can't really use that as the decision point.

The one thing I can say here is Python has the option for specifying both position only arguments and named only arguments. def delete_documents(ids, /) is position only and def delete_documents(*, ids) is named only. I see the named only used all the time. I have never seen position only used. This makes me think other people have the same tendency I do to use the name, but I don't have any data to back that up.

@bidoubiwa
Copy link
Contributor

bidoubiwa commented May 23, 2023

Hey @sanders41

Since everyone seems to agree, let's go for solution 1 if that's okey for you

@alallema
Copy link
Contributor

Since everyone seems to agree, let's go for solution 1 if that's okey for you

I was about to say the same thing. Agreed

Copy link
Contributor

@alallema alallema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sanders41 for the implementation of this release feature and the feedback on it 🚀
LGTM!

@alallema alallema merged commit 3041b19 into bump-meilisearch-v1.2.0 May 23, 2023
@alallema alallema deleted the delete-documents-by-filter branch May 23, 2023 11:55
meili-bors bot added a commit that referenced this pull request Jun 5, 2023
755: Changes related to the next Meilisearch release (v1.2.0) r=alallema a=meili-bot

Related to this issue: meilisearch/integration-guides#261

This PR:
- gathers the changes related to the next Meilisearch release (v1.2.0) so that this package is ready when the official release is out.
- should pass the tests against the [latest pre-release of Meilisearch](https://github.com/meilisearch/meilisearch/releases).
- might eventually contain test failures until the Meilisearch v1.2.0 is out.

⚠️ This PR should NOT be merged until the next release of Meilisearch (v1.2.0) is out.

_This PR is auto-generated for the [pre-release week](https://github.com/meilisearch/integration-guides/blob/main/resources/pre-release-week.md) purpose._

Done:
- #756 
- #757 


Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: Paul Sanders <psanders1@gmail.com>
Co-authored-by: Amélie <alallema@users.noreply.github.com>
Co-authored-by: alallema <amelie@meilisearch.com>
meili-bors bot added a commit that referenced this pull request Jun 5, 2023
772: Update version for the next release (v0.28.0) r=alallema a=meili-bot

Release CHANGELOG:

This version introduces features released on Meilisearch v1.2.0 🎉
Check out the changelog of [Meilisearch v1.2.0](https://github.com/meilisearch/meilisearch/releases/tag/v1.2.0) for more information on the changes. 
⚠️ If you want to adopt new features of this release, **update the Meilisearch server** to the according version.

## 🚀 Enhancements

* The method `delete_documents()` now supports a new parameter `filter` that allows deleting documents by filtering. The `filter` field works precisely like the `filter` field used on the `search` method. See [the docs on how to use filters](https://www.meilisearch.com/docs/learn/advanced/filtering#filter-basics). (#756) `@sanders41` 
* Add the ability to receive a `filter` key from the `get_documents`/`documents` methods. The `filter` field works precisely like the `filter` field used on the `search` method. See [the docs on how to use filters](https://www.meilisearch.com/docs/learn/advanced/filtering#filter-basics). (#757) `@sanders41` 

Thanks again to `@sanders41!` 🎉

Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: Amélie <alallema@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants