Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort dynamic choice list search results by relevance #35494

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Charl1996
Copy link
Contributor

@Charl1996 Charl1996 commented Dec 10, 2024

Please note, this PR needs tests. I've only tested the code in this PR manually.

Product Description

This PR is a fix for the dynamic_choice_list filter; some values that should have popped up in the filter as the user types did not pop up, not even if the user entered the fully qualified filter value. This made it impossible to select this particular filter.

Technical Summary

Ticket

An issue was raised where a dynamic choice filter option did not appear in the list of options, even when the user typed in the full expected value.

The offending line of code was identified as this one. Adding the code (with some added context) also below:

    if query_context.query:
        query = query.filter(self._sql_column.ilike("%{}%".format(query_context.query)))

    query = query.distinct().order_by(self._sql_column).limit(query_context.limit).offset(query_context.offset)

What happens here is, after the user query is applied to the DB query (with .ilike), the results are being ordered by the sql_column, effectively ordering whatever results is returned alphabetically. In this particular case the qualified user query was in fact also a subset of a bunch of other results. This resulted in a post-sorted query having pushed down the expected result to the bottom of the list and further being cut off after the limit was applied, effectively prohibiting the user to see the desired filter value.

Example of what happened (with fake values):
Assume a column have the following values:

["abc", "abcd", "cab", "cabc", "ab"]

User input query "ab" (this is the fully qualified value, but also a substring of other results)
Assume the query limit is 3

Query results:

>>> query.order_by(self.sql_column)
["abc", "abcd", "cab", "cabc", "ab"]

After limit is applied:

>>> query.order_by(self.sql_column).limit(query_context.limit)
["abc", "abcd", "cab"]

*insert user sad face here

Solution

A more intuitive (I think) way of ordering the results would be to show the results in the following priority manner:

  1. Results that match exactly, if any.
  2. Substring matches, if any.
  3. Everything else.

Taking the above example, the query results would become:

>>> query.order_by(new_query_order_by)
["ab", "abc", "abcd", "cab", "cabc"]

This solution is open for debate.

Note: this PR does not cover the get_values_for_query method of MultiFieldDataSourceColumnChoiceProvider (only for DataSourceColumnChoiceProvider), so a similar issue might still exist with filters having this type of choice provider.

Feature Flag

UCR

Safety Assurance

Safety story

Automated tests to be written + QA to commence on staging.

Automated test coverage

Automated test to be written

QA Plan

QA will be done

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@Charl1996 Charl1996 added the product/feature-flag Change will only affect users who have a specific feature flag enabled label Dec 10, 2024
return query.distinct().order_by(
case([
(self._sql_column.ilike(query_context.query), 0), # Exact matches first
(self._sql_column.ilike(f"{query_context.query}%"), 1), # Prefix matches next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not used case before so I am not fully familiar with the change here.

Just one thing to add, I assume this adds considerable load, so we could just limit to exact match and substring.
Or, is it possible to fetch all matches and then apply some filter, sort & limit in python instead? I wonder if we are making the db do a lot of work in the query and so instead could just do that in python since the db isn't getting any advantage of the limit anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could just limit to exact match and substring

Sounds reasonable.

Thanks for the input. I'm slightly hesitant to push the operations into python itself, since DBs are very efficient and optimised for handling calculations over large datasets. Let's see what others say as well.

@mkangia
Copy link
Contributor

mkangia commented Dec 10, 2024

That is a great find @Charl1996 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product/feature-flag Change will only affect users who have a specific feature flag enabled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants