[Issue #1455] Implement logic for query box in the search API #1514

Merged: chouinar merged 4 commits into main from chouinar/1455-query-box on Mar 22, 2024

chouinar
Collaborator

Summary

Fixes #1455

Time to review: 10 mins

Changes proposed

Implements logic for the query field in the search API

Updated the seed script to allow you to generate more opportunities locally

Context for reviewers

This logic is deliberately very simple at the moment. It just does a case-insensitive text-contains check across several columns. This is the most naive approach: if you search for two separate words at once, you probably won't get any results unless they appear together, in that order, in one of the columns.
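A minimal sketch of the contains check described above, assuming a trimmed-down, hypothetical opportunity table. It uses SQLite's LIKE from the Python standard library as a stand-in for the PR's SQLAlchemy ilike calls, so the table layout, column subset, and function names are illustrative only.

```python
import sqlite3

# Hypothetical subset of the columns the PR searches against.
SEARCH_COLUMNS = ("opportunity_title", "agency", "summary_description")


def search(conn, query):
    """Return ids of rows where any searched column contains `query`."""
    # Note: % and _ inside `query` act as wildcards in this sketch.
    pattern = f"%{query}%"
    where = " OR ".join(f"{col} LIKE ?" for col in SEARCH_COLUMNS)
    sql = (
        f"SELECT opportunity_id FROM opportunity WHERE {where} "
        "ORDER BY opportunity_id"
    )
    rows = conn.execute(sql, [pattern] * len(SEARCH_COLUMNS))
    return [row[0] for row in rows]


def build_demo_db():
    """Create an in-memory database with two made-up opportunities."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opportunity ("
        "opportunity_id INTEGER PRIMARY KEY, "
        "opportunity_title TEXT, agency TEXT, summary_description TEXT)"
    )
    conn.executemany(
        "INSERT INTO opportunity VALUES (?, ?, ?, ?)",
        [
            (1, "Rural health research", "HHS", "Funding for rural clinics"),
            (2, "Ocean mapping", "DOC", "Survey of coastal waters"),
        ],
    )
    return conn
```

With the demo data, search(conn, "health") returns [1], while search(conn, "HHS health") returns [] because no single column contains that exact phrase. SQLite's LIKE is case-insensitive for ASCII by default, which roughly mirrors ilike.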

A few options we may consider later (depending on the level of effort we want to put into a non-search-index approach):

Additional information

I added the ids function for the tests because there are a lot of them, and when test 16 fails, it's really hard to figure out which one that is. Now it outputs the request and the expected results:
(Screenshot 2024-03-21 at 10 47 34 AM: test output showing the new ids)
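As a rough illustration of the ids idea, here is a hypothetical id function; it is not the PR's search_scenario_id_fnc, whose body is not shown here. pytest calls an ids callable once per parametrized value and joins the returned strings with dashes, falling back to its default id when the callable returns None.

```python
def scenario_id(val):
    """Build a readable test id for a single parametrized value."""
    if isinstance(val, dict):
        # Treat dicts as the search request payload (an assumption).
        parts = ",".join(f"{k}:{v}" for k, v in sorted(val.items()))
        return f"request={parts}"
    if isinstance(val, (list, tuple)):
        # Treat sequences as the expected scenario names (an assumption).
        return "expected=" + ",".join(str(v) for v in val)
    # Returning None tells pytest to use its default id for this value.
    return None


# Intended use (requires pytest, shown as a comment to keep this block standalone):
#
# @pytest.mark.parametrize(
#     "search_request,expected_scenarios",
#     [({"query": "health"}, ["scenario_1"])],
#     ids=scenario_id,
# )
```

A failing case would then be reported with an id such as request=query:health-expected=scenario_1 instead of a bare index.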

@@ -167,7 +167,7 @@ db-migrate-heads: ## Show migrations marked as a head
 	$(alembic_cmd) heads $(args)

 db-seed-local:
-	$(PY_RUN_CMD) db-seed-local
+	$(PY_RUN_CMD) db-seed-local $(args)
Contributor

👍

@@ -932,6 +932,8 @@ components:
       properties:
         query:
           type: string
+          minLength: 1
+          maxLength: 100
Contributor

👍

Collaborator

Just curious / to have context: Was 100 chosen for the limit by any specific criteria? Could we easily adjust it in the future if we discover folx are entering really long term strings?

Contributor

Doesn't seem hard to update. If the frontend also has <input minLength={1} maxLength={100} />, we have to keep them in sync.

Collaborator

But there's no technical/performance/etc limit we need to be mindful of? 100 is an arbitrary choice?

Collaborator

The limit for your search box should be as long as possible, and no longer. For example, if your search engine only supports 256 characters, then don't allow users to enter in 257 characters. You should not make your text limit any shorter than that supported by the engine, because there will always be a small, but possibly significant, number of users that need to search for really long phrases. You should not make it any longer than supported by the search engine, because it will result in unnecessary error messages or unexpected search results.
https://ux.stackexchange.com/questions/112927/character-limit-for-the-search-box-best-pratices

Trying to imagine how this edge case might happen, I can think of two ways:

  • Someone downloaded an opportunity PDF but forgot the URL; they search for a long string copied from the PDF
  • Someone enters a keyword, a title, an ID, an assistance listing number, etc.; the whole lot adds up quickly

Collaborator Author

Adjusting this value takes 2 seconds (change the number in the API schema + a test).

We can make it longer, but of the two scenarios you put there, I think only the first would be valid at the moment. Supplying a keyword AND title AND id will almost certainly never return results as implemented. If you entered HHS health 00.123, that exact sequence of words would need to appear in at least one field we check against to return results; we aren't tokenizing and searching each term individually.
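To make the difference concrete, here is a small sketch contrasting the whole-phrase check the comment describes with a tokenized check. Both functions are hypothetical, and the tokenized variant is not something this PR implements; it only shows why HHS health 00.123 finds nothing today.

```python
def matches_phrase(query, fields):
    """Whole-phrase check: the full query must appear inside one field."""
    needle = query.lower()
    return any(needle in field.lower() for field in fields)


def matches_all_tokens(query, fields):
    """Tokenized check: every whitespace-separated term must appear in some field."""
    haystacks = [field.lower() for field in fields]
    return all(
        any(token in haystack for haystack in haystacks)
        for token in query.lower().split()
    )


# Made-up field values for a single opportunity.
fields = ["Rural health research", "HHS", "00.123"]
```

Here matches_phrase("HHS health 00.123", fields) is False because no single field holds the whole phrase, while matches_all_tokens("HHS health 00.123", fields) is True because each term is found somewhere.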

"type": "min_or_max_length",
}
],
),
Contributor

👍
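The length check exercised by the test excerpt above could look roughly like this. It is a plain-Python sketch under assumptions: the function name, the "field" key, and the message text are hypothetical, and only the "min_or_max_length" type and the 1 to 100 bounds come from this PR.

```python
QUERY_MIN_LENGTH = 1
QUERY_MAX_LENGTH = 100  # the limit discussed in this thread


def validate_query(query):
    """Return a list of error dicts; an empty list means the query is valid."""
    if QUERY_MIN_LENGTH <= len(query) <= QUERY_MAX_LENGTH:
        return []
    return [
        {
            "field": "query",
            # Hypothetical message text; only the "type" value appears in the PR.
            "message": (
                f"Length must be between {QUERY_MIN_LENGTH} "
                f"and {QUERY_MAX_LENGTH}."
            ),
            "type": "min_or_max_length",
        }
    ]
```

Keeping the bounds in named constants means raising the limit later is a one-line change plus a test update.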

ids=search_scenario_id_fnc,
)
def test_opportunity_query_and_filter_200(
self, client, api_auth_token, search_request, expected_scenarios, setup_scenarios
Contributor

@rylew1 Mar 21, 2024

Is the setup_scenarios fixture being used? I think there are a few tests in this file where it may not be used.

Collaborator Author

It is used by all the tests in this class. It only gets called once for the set of tests in the class, but it is required by all of them, as it is where we set up the data we actually query against.
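The pattern described here is a class-scoped pytest fixture. The sketch below assumes pytest is installed and uses a hypothetical in-memory list in place of the database rows the real setup_scenarios creates; the test names and data are made up.

```python
import pytest

# Hypothetical in-memory stand-in for the rows the real fixture writes.
FAKE_DB = []


def build_scenarios():
    """Populate the fake store with two made-up opportunities."""
    FAKE_DB.clear()
    FAKE_DB.extend(
        [
            {"opportunity_id": 1, "agency": "HHS"},
            {"opportunity_id": 2, "agency": "DOC"},
        ]
    )
    return FAKE_DB


@pytest.fixture(scope="class")
def setup_scenarios():
    # scope="class": pytest runs this once per test class and reuses the result.
    return build_scenarios()


class TestSearch:
    # Each test names the fixture so the data is guaranteed to exist,
    # even though the test bodies never read the fixture's return value.
    def test_filter_by_agency(self, setup_scenarios):
        matches = [r["opportunity_id"] for r in FAKE_DB if r["agency"] == "HHS"]
        assert matches == [1]

    def test_total_count(self, setup_scenarios):
        assert len(FAKE_DB) == 2
```

Listing the fixture in every test signature also keeps each test runnable on its own, since pytest will build the data whichever test in the class runs first.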


self.validate_results(search_request, search_response, expected_scenarios)

def validate_results(self, search_request, search_response, expected_scenarios):
Contributor

@rylew1 Mar 21, 2024

nit: This is a bit of a larger ask, but would it make sense to organize these tests by class, since they can be run that way from the console? Or to keep files under 1000 lines where possible by splitting them up?

Collaborator Author

These are organized by class - what specifically would you suggest be changed?

I do think this set of tests should be tidied up a bit, and splitting the different routes into separate files probably would be the right way to handle that.

Unfortunately, search requires a lot of tests to validate as there are effectively an infinite number of scenarios due to how many different possible values / fields we can search by.

Contributor

"and splitting the different routes into separate files probably would be the right way to handle that."

Yeah, something like this. I see that some of the parameterized tests use both the singular GET opportunity and the plural POST opportunities. It's just early in the project, and I would try to avoid the test files ending up like those in our past project, with 5k lines.

Collaborator Author

Made a ticket to do that cleanup: #1524

OpportunitySummary.summary_description.ilike(ilike_query),
# assistance listing number matches exactly or program title partial match
OpportunityAssistanceListing.assistance_listing_number == query,
OpportunityAssistanceListing.program_title.ilike(ilike_query),
Contributor

@rylew1 Mar 21, 2024

I guess this works for now. Are you planning to run the query text against OpenSearch in the future or something?

@andycochran or @crystabelrangel - you might take a peek at what we have here?

Contributor

Overall the structure of this file looks modular and flexible with small functions like _add_query_filters and _add_filters

Collaborator Author

Yes, the search logic is very very simplistic on purpose at the moment. I did spend some time looking into:

The text search in Postgres didn't seem to work much better, to be honest. Maybe I was doing something wrong, but the scores it produced generally just amounted to a 0 or ~1 value indicating whether the term was in the column. It also didn't handle the case where you search HHS health intending that to be the agency plus a random text search term.

Text similarity did seem quite a bit better, as it allows for a bit of flexibility, but we would need to investigate further.

All-in-all, the more time we spend on this, the longer it takes for us to build out an actual search index.
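For context on why similarity scoring is more forgiving than a contains check, here is a simplified trigram similarity sketch. The comment does not say which similarity method was tried, so trigram matching is an assumption here, and this pure-Python version only approximates the idea; it is not how Postgres computes its scores.

```python
def trigrams(text):
    """Return the set of 3-character windows over the padded, lowercased text."""
    # Padding makes word starts and ends count as their own trigrams.
    # Simplification: the whole string is padded once, not word by word.
    padded = f"  {text.lower()} "
    return {padded[i : i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Share of trigrams the two strings have in common (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

A misspelling such as "helth" still shares several trigrams with "health" and so gets a non-zero score, whereas a contains check would simply miss it.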

Collaborator

@rylew1, not sure what feedback you need from design here. If it's just about what data the terms search against (opportunity_title, opportunity_number, agency, summary_description, assistance_listing_number, program_title seems like more than we even anticipated 👍), I think that's fine. At least for now; we'll learn more as we test.

Contributor

@rylew1 left a comment

A few nits but looks good!

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 22, 2024
@chouinar chouinar merged commit 2b205af into main Mar 22, 2024
8 checks passed
@chouinar chouinar deleted the chouinar/1455-query-box branch March 22, 2024 17:57
Labels
api ci/cd documentation Improvements or additions to documentation python
Development

Successfully merging this pull request may close these issues.

[Task]: Implement query filtering to the search endpoint
5 participants