Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve API filtering 'OR' logic to work on all fields #3038

Closed
ajknv opened this issue Apr 3, 2019 · 13 comments
Closed

Improve API filtering 'OR' logic to work on all fields #3038

ajknv opened this issue Apr 3, 2019 · 13 comments
Labels
status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application

Comments

@ajknv
Copy link
Contributor

ajknv commented Apr 3, 2019

Environment

  • Python version: 3.6.8
  • NetBox version: 2.5.7

Proposed Functionality

Handling of API requests involving multiple filter values should be consistent regardless of the fields involved, such that the logical OR of matching records will be returned. At the very least, multiple filters involving the same field should always work to return all matches.

Use Case

The overall utility of the API is broadly reduced by the impaired ability to make any number of fairly typical bulk queries for filtered records. The inconsistent behavior -- especially that of quietly just returning the matches for the last filter value -- pretty severely violates the principal of least surprise for API clients.

Database Changes

Unknown. Best guess is that this does not correlate to any required structural changes to the database, but likely just additional/improved glue logic between the API handlers and SQL query logic.

External Dependencies

None anticipated.

@DanSheps
Copy link
Member

DanSheps commented Apr 3, 2019

Not all filters make sense to include as multiples ("has ip address" for example, it is boolean, either on or off).

Please come up with a list of fields that you think could benefit from being able to be filtered on multiple times and we can investigate further.

@DanSheps DanSheps added status: revisions needed This issue requires additional information to be actionable type: feature Introduction of new functionality to the application labels Apr 3, 2019
@ajknv
Copy link
Contributor Author

ajknv commented Apr 3, 2019

Part of the problem with that request is that it is unclear which ones currently work and which don't short of an exhaustive search. Even for boolean fields, applying an OR filter isn't problematic from any technical perspective, though I grant that it could result in pointless queries that just return everything anyway. But consistent behavior strikes me as a much more desirable characteristic than trying to exclude pointless but harmless queries that no one is likely to attempt in practice anyway. From that perspective, I would still favor the easy answer as simply being "all of them". That said, here are a few I've hit so far that are of concern:

Devices:
name
mac_address
parent_device_id (doesn't currently work as a filter at all)
id (currently this doesn't even work as a singular filter; of course that can be accomplished with a URL component, but multiple matches are useful for bulk queries)

Interfaces:
mac_address
device_id
id

Device-bays:
name
device_id
installed_device_id (does not appear to work as a filter at all, even singular)
id

Inventory-items:
name
device_id
part_id
id

@cescone
Copy link

cescone commented Apr 3, 2019

I happened upon this issue last week, while trying to get an interface list filtered by the interface's device_id property.

Please come up with a list of fields that you think could benefit from being able to be filtered on multiple times and we can investigate further.

While people may find uses to multiple filtering almost any property, I think a good compromise would be to allow repeated filters for any property that acts like a primary key or foreign key, i.e. any id and related objects' id.

@netbox-community netbox-community deleted a comment from sdktr May 2, 2019
@jeremystretch
Copy link
Member

NetBox relies on the django-filter library for filtering object requests via both the web UI and the REST API. django-filter does not provide any built-in support for ORing multiple values specified for the same field; it only accepts the most recent (last) value provided. For example:

GET /api/dcim/sites/?name=foo&name=bar

becomes

Site.objects.get(name="bar")

While django-filter does provide a MultipleChoiceFilter, it must be instantiated with a predefined set of choices, which is not practical for freeform fields. We'll need to introduce a custom filter to instead turn the above request into

Site.objects.get(Q(name="foo") | Q(name="bar"))

so that objects matching either value are returned.

Handling of API requests involving multiple filter values should be consistent regardless of the fields involved, such that the logical OR of matching records will be returned.

I feel like you're overlooking the case of ManyToMany relationships. For example, most people would expect a request for ?tag=foo&tag=bar to return only objects with both tags (a logical AND). However, there's no clear way to differentiate that multiple values for one field should result in an AND, and multiple values for a different field should be an OR.

I propose we adapt the existing FilterSets to perform a logical OR in all instances where multiple values are provided for a field, except in the case of ManyToManyFields, where the current behavior (a logical AND) is preserved. This approach has the benefit of being the least disruptive while addressing the most common use cases.

An alternative approach would be to use comma-separated values (e.g. ?tag=foo,bar) but that introduces validation problems and additional burdens around request formation, so I'd much prefer to avoid that approach.

@jeremystretch jeremystretch added status: under review Further discussion is needed to determine this issue's scope and/or implementation and removed status: revisions needed This issue requires additional information to be actionable labels May 2, 2019
@ajknv
Copy link
Contributor Author

ajknv commented May 2, 2019

For example, most people would expect a request for ?tag=foo&tag=bar to return only objects with both tags (a logical AND).

Interestingly this feels pretty subjective to me, actually. Wanting to find a union set of objects matching any of a few different characteristics (e.g. captured in tags) seems like as much a probable use case as wanting to query for objects with multiple constraints. I'm on board with the proposed solution as a definite improvement over the current behavior while being cognizant of backwards compatibility, but I would posit that the ideal answer here is to have a meta-parameter to specify the filter operation, e.g. something like:

?filter_mode=OR&tag=foo&tag=bar

@jeremystretch
Copy link
Member

What if I want to find devices at site A OR site B with tag A AND tag B?

GET /api/dcim/devices/?site=SiteA&site=SiteB&tag=tagA&tag=tagB

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation and removed status: under review Further discussion is needed to determine this issue's scope and/or implementation labels May 3, 2019
@jeremystretch
Copy link
Member

I've done some digging into django-filter and I think I've identified the sticking point with filtering on multiple values for CharFilters. django-filter leverages Django forms for validation, thus any filter which needs to accept multiple values must utilize ChoiceField or one of its derivatives. ChoiceField requires that a set of choices be defined at run time for validation and form widget rendering, both of which are irrelevant in our case.

The closest solution I've found is django-filter's AllValuesFilter, which is a bit of hack: It populates a choices list from the set of all values present in the database (which requires a query to retrieve all values). This is unnecessary for our use case and obviously does not scale well.

It might be feasibly to use a dummy object for choices to essentially pretend that any given value exists in the set, but at that point we're pretty firmly into "dirty hack" territory.

@jeremystretch
Copy link
Member

I've raised django-filter #1076 to see if there's any supported way of achieving this.

@ajknv
Copy link
Contributor Author

ajknv commented May 3, 2019

What if I want to find devices at site A OR site B with tag A AND tag B?

Touché. :-) Yes, I can understand there's a balancing act in terms of where to draw the line between expressiveness and coverage of possible use cases. The example you gave would require a more fully-defined query DSL to be supported. The way I'm looking at it though is that a capability to select AND vs OR provides additional coverage of fairly important and common use cases -- probably a significant majority of them. This seems worthwhile, even if the last 10% (say) of complex use cases can't be covered and the cost to support a complete query language isn't worth it for that small tail-set.

Essentially I'm just pushing for the highest possible bang for whatever is a reasonable buck.

@jeremystretch
Copy link
Member

It looks like this is doable under djnago-filter using a simple custom filter and form field (thanks @rpkilby!). Now we just need to lock down the logic. I propose the following.

1. Different fields result in an AND

Example: GET /api/dcim/devices/?site=Foo&status=1 means site == Foo AND status == 1

2. Multiple instances of the same field are ORs

This applies to all non-ManyToManyFields.

Example: GET /api/dcim/devices/?site=Foo&site=Bar&status=1 means (site == Foo OR site == Bar) AND status == 1

3. ManyToManyFields accept CSV-formatted values to AND

Because an object may have multiple values for a ManyToManyField, we must be able to differentiate between OR and AND. We should accept comma-separated values to indicate ANDing.

Example 1: GET /api/dcim/devices/?tag=foo&tag=bar means tag == foo OR tag == bar

Example 2: GET /api/dcim/devices/?tag=foo,bar means tag == foo AND tag == bar

Note: The web UI probably won't support the AND approach, at least not immediately, but I'd still like to expose the logic for API clients.

@jeremystretch jeremystretch added status: under review Further discussion is needed to determine this issue's scope and/or implementation and removed status: accepted This issue has been accepted for implementation labels May 6, 2019
@ajknv
Copy link
Contributor Author

ajknv commented May 6, 2019

Can commas be part of the the value of a tag? If so, how should that be handled?

@jeremystretch
Copy link
Member

@ajknv the tag filter matches on Tag.slug, which is not allowed to have commas in it, so it shouldn't be an issue.

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation and removed status: under review Further discussion is needed to determine this issue's scope and/or implementation labels May 7, 2019
@candlerb
Copy link
Contributor

candlerb commented Jun 7, 2019

@jeremystretch wrote on May 6:

3. ManyToManyFields accept CSV-formatted values to AND

Because an object may have multiple values for a ManyToManyField, we must be able to differentiate between OR and AND. We should accept comma-separated values to indicate ANDing.

This sounds good to me. However, I note that commit ffa34c6 on branch develop-2.6 from May 8 says something different:

For most fields, when a filter is passed multiple times, objects matching any of the provided values will be returned. For example, GET /api/dcim/sites/?name=Foo&name=Bar will return all sites named "Foo" or "Bar". The exception to this rule is ManyToManyFields which may have multiple values assigned. Tags are the most common example of a ManyToManyField. For example, GET /api/dcim/sites/?tag=foo&tag=bar will return only sites tagged with both "foo" and "bar".

@lock lock bot locked as resolved and limited conversation to collaborators Jan 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application
Projects
None yet
Development

No branches or pull requests

5 participants