[Issue #1455] Implement logic for query box in the search API #1514

chouinar · 2024-03-21T14:50:19Z

Summary

Time to review: 10 mins

Changes proposed

Implements logic for the query field in the search API

Updated the seed script to allow you to generate more opportunities locally

Context for reviewers

This logic is deliberately very simple at the moment: it just does a case-insensitive "text contains" check across several columns. This is the most naive approach, so if you search for two separate words at the same time, you probably won't get any results unless they appear together, in that order, in a single column.
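As a rough illustration of that limitation, here is a pure-Python approximation of the check. It assumes each opportunity is a plain dict keyed by column name. The real implementation builds SQLAlchemy ilike filters against the database, and the helper name and row data below are hypothetical.

```python
# Hypothetical sketch: a pure-Python stand-in for a case-insensitive
# "text contains" check (ILIKE '%query%') OR-ed across several columns.
# Column names mirror those discussed in this PR; rows are illustrative dicts.

SEARCHED_TEXT_COLUMNS = (
    "opportunity_title",
    "opportunity_number",
    "agency",
    "summary_description",
    "program_title",
)


def row_matches(row: dict, query: str) -> bool:
    """Return True if the whole query appears, case-insensitively, in any searched column."""
    needle = query.lower()
    for column in SEARCHED_TEXT_COLUMNS:
        value = row.get(column)
        if value is not None and needle in value.lower():
            return True
    # The assistance listing number is compared exactly, not as a substring.
    return row.get("assistance_listing_number") == query


row = {
    "opportunity_title": "Rural Health Research Grant",
    "agency": "HHS",
    "assistance_listing_number": "00.123",
}

print(row_matches(row, "health research"))  # True: contiguous phrase in the title
print(row_matches(row, "HHS health"))       # False: no single column contains "hhs health"
```

The second call fails even though each word appears somewhere in the row, because the whole query is treated as one substring.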

A few options we may consider later (depending on the level of effort we want to put into a non-search-index approach):

Postgres text search / tsvectors: https://www.postgresql.org/docs/current/textsearch.html
Postgres text similarity: https://www.postgresql.org/docs/current/pgtrgm.html

Additional information

I added the ids function for the tests because there are a lot of tests, and when test 16 fails it's really hard to figure out which one that is. Now each test ID shows the request and expected results.
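pytest's parametrize accepts a callable for ids: pytest calls it once per parameter value, and it may return a string, or None to keep pytest's default ID for that value. A hypothetical sketch of what a function like search_scenario_id_fnc could look like (the real implementation isn't shown here):

```python
import json


def search_scenario_id_fnc(val):
    """Hypothetical sketch of a pytest `ids=` callable.

    pytest calls it once per parameter value; returning a string makes that
    value part of the test ID, and returning None falls back to pytest's
    default ID for that value.
    """
    if isinstance(val, dict):
        # Search request bodies: render as compact, key-sorted JSON.
        return json.dumps(val, sort_keys=True, separators=(",", ":"))
    if isinstance(val, list):
        # Expected results: render the list of expected scenario names.
        return ",".join(str(item) for item in val)
    # Anything else: let pytest generate its usual ID.
    return None
```

It would be passed as ids=search_scenario_id_fnc to @pytest.mark.parametrize, so a failing case is reported by its request body and expected scenarios rather than by an index.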

rylew1 · 2024-03-21T20:42:02Z

api/Makefile

@@ -167,7 +167,7 @@ db-migrate-heads: ## Show migrations marked as a head
 	$(alembic_cmd) heads $(args)

 db-seed-local:
-	$(PY_RUN_CMD) db-seed-local
+	$(PY_RUN_CMD) db-seed-local $(args)


(chore): update the docs to include iterations: https://github.com/HHS/simpler-grants-gov/blob/chouinar/1455-query-box/documentation/api/database/database-local-usage.md?plain=1#L28
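With the Makefile now forwarding $(args), extra flags can be passed through to the seed command, e.g. make db-seed-local args="--iterations 5". Below is a hypothetical sketch of the receiving side using argparse. The --iterations flag name, default, and meaning are assumptions inferred from the iterations mention above, not the project's actual CLI.

```python
import argparse


def parse_seed_args(argv=None):
    """Hypothetical sketch of a seed command that accepts forwarded arguments.

    `make db-seed-local args="--iterations 5"` would append `--iterations 5`
    to the command line. Flag name and default are assumptions.
    """
    parser = argparse.ArgumentParser(prog="db-seed-local")
    parser.add_argument(
        "--iterations",
        type=int,
        default=1,
        help="How many batches of opportunities to generate (assumed semantics).",
    )
    return parser.parse_args(argv)
```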

rylew1 · 2024-03-21T20:42:12Z

api/openapi.generated.yml

@@ -932,6 +932,8 @@ components:
      properties:
        query:
          type: string
+          minLength: 1
+          maxLength: 100


Just curious / for context: Was 100 chosen as the limit based on any specific criteria? Could we easily adjust it in the future if we discover folx are entering really long search strings?

Doesn't seem hard to update. If the frontend also has <input minLength={1} maxLength={100} />, we have to keep them in sync.
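One way to reduce the risk of drift is to define the limits once and derive everything else from them. A hypothetical sketch follows. The constant names and error shape are assumptions, though the min_or_max_length type mirrors the one asserted in this PR's tests.

```python
# Hypothetical sketch: keep the query length limits in one place so the API
# schema, server-side validation, and any frontend copy share the same values.
QUERY_MIN_LENGTH = 1
QUERY_MAX_LENGTH = 100


def validate_query(query: str) -> list:
    """Return a list of validation errors (empty if the query length is OK)."""
    if QUERY_MIN_LENGTH <= len(query) <= QUERY_MAX_LENGTH:
        return []
    return [
        {
            "field": "query",
            "message": f"Length must be between {QUERY_MIN_LENGTH} and {QUERY_MAX_LENGTH}.",
            "type": "min_or_max_length",
        }
    ]
```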

But there's no technical/performance/etc limit we need to be mindful of? 100 is an arbitrary choice?

According to @chouinar's comment here, it appears arbitrary: https://betagrantsgov.slack.com/archives/C05TSL64VUH/p1711038834496529?thread_ts=1710957436.730929&cid=C05TSL64VUH

The limit for your search box should be as long as possible, and no longer. For example, if your search engine only supports 256 characters, then don't allow users to enter in 257 characters. You should not make your text limit any shorter than that supported by the engine, because there will always be a small, but possibly significant, number of users that need to search for really long phrases. You should not make it any longer than supported by the search engine, because it will result in unnecessary error messages or unexpected search results.
— https://ux.stackexchange.com/questions/112927/character-limit-for-the-search-box-best-pratices

Trying to imagine how this edge case might happen, I can think of two ways:

Someone downloaded an opportunity PDF but forgot the URL; they search for a long string copied from the PDF

Someone enters a keyword, a title, an ID, an assistance listing number, etc.; the whole lot adds up quickly

Adjusting this value takes 2 seconds (change the number in the API schema + a test).

We can make it longer, but of the two scenarios you listed, I think only the first would be valid at the moment. Supplying a keyword AND title AND ID will almost certainly never return results as implemented. If you entered HHS health 00.123, that exact sequence of words would need to appear in at least one field we check against to return results; we aren't tokenizing the query and searching for each word individually.
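For contrast, here is a sketch of the per-word matching that the current implementation deliberately does not do: every whitespace-separated token must appear in at least one searched column. The function name and sample row are hypothetical.

```python
# Hypothetical sketch of tokenized matching (NOT what this PR implements):
# split the query on whitespace and require every token to appear in at
# least one column (AND across tokens, OR across columns).


def row_matches_all_tokens(row: dict, query: str, columns: tuple) -> bool:
    values = [str(row[c]).lower() for c in columns if row.get(c) is not None]
    tokens = query.lower().split()
    # An empty query has no tokens, so treat it as "no match".
    return bool(tokens) and all(any(tok in v for v in values) for tok in tokens)


row = {
    "agency": "HHS",
    "opportunity_title": "Rural Health Research Grant",
    "assistance_listing_number": "00.123",
}
cols = ("agency", "opportunity_title", "assistance_listing_number")

print(row_matches_all_tokens(row, "HHS health 00.123", cols))  # True
print(row_matches_all_tokens(row, "HHS dental", cols))         # False
```

Note that substring tokens still match broadly: health would also match healthcare.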

rylew1 · 2024-03-21T20:52:21Z

api/tests/src/api/opportunities_v0_1/test_opportunity_route.py

+                    "type": "min_or_max_length",
+                }
+            ],
+        ),


:thumbs-up:

rylew1 · 2024-03-21T20:54:33Z

api/tests/src/api/opportunities_v0_1/test_opportunity_route.py

+        ids=search_scenario_id_fnc,
+    )
+    def test_opportunity_query_and_filter_200(
+        self, client, api_auth_token, search_request, expected_scenarios, setup_scenarios


Is the setup_scenarios fixture being used? I think there are a few tests in this file where it may not be used.

It is used by all the tests in this class. It only gets called once for the set of tests for the class, but is required by all of them as it is where we setup the data we actually query against.

rylew1 · 2024-03-21T20:56:14Z

api/tests/src/api/opportunities_v0_1/test_opportunity_route.py

+
+        self.validate_results(search_request, search_response, expected_scenarios)
+
+    def validate_results(self, search_request, search_response, expected_scenarios):


nit: This is a bit of a larger ask, but would it make sense to organize things by class, since they can be run that way from the console? Or to try to keep files under 1000 lines by splitting them up?

These are organized by class - what specifically would you suggest be changed?

I do think this set of tests should be tidied up a bit, and splitting the different routes into separate files probably would be the right way to handle that.

Unfortunately, search requires a lot of tests to validate as there are effectively an infinite number of scenarios due to how many different possible values / fields we can search by.

and splitting the different routes into separate files probably would be the right way to handle that.

Yeah, something like this. I see that some of the parameterized tests use both the singular GET opportunity and the plural POST opportunities routes. It's still early in the project, and I'd like to avoid the test files ending up like our past project's, with 5k lines.

Made a ticket to do that cleanup: #1524

rylew1 · 2024-03-21T20:57:36Z

api/src/services/opportunities_v0_1/search_opportunities.py

+            OpportunitySummary.summary_description.ilike(ilike_query),
+            # assistance listing number matches exactly or program title partial match
+            OpportunityAssistanceListing.assistance_listing_number == query,
+            OpportunityAssistanceListing.program_title.ilike(ilike_query),
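The excerpt doesn't show how ilike_query is built. Assuming it wraps the raw query in % wildcards, a user-typed % or _ would act as a LIKE wildcard rather than a literal character. A minimal, hypothetical sketch of escaping those before building the pattern (not part of this PR):

```python
def to_ilike_pattern(query: str, escape_char: str = "\\") -> str:
    """Wrap a user query in % wildcards for a "contains" ILIKE match.

    Any %, _ or escape characters typed by the user are escaped so they are
    matched literally instead of acting as wildcards.
    """
    escaped = (
        query.replace(escape_char, escape_char * 2)  # escape the escape char first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )
    return f"%{escaped}%"
```

In PostgreSQL, backslash is already the default LIKE escape character; SQLAlchemy's ilike() also accepts an explicit escape argument if a different one is used.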


I guess this works for now. Are you planning to run the query text against OpenSearch in the future?

@andycochran or @crystabelrangel - you might take a peek at what we have here?

Overall, the structure of this file looks modular and flexible, with small functions like _add_query_filters and _add_filters.

Yes, the search logic is very very simplistic on purpose at the moment. I did spend some time looking into:

Postgres text search / tsvectors: https://www.postgresql.org/docs/current/textsearch.html

Postgres text similarity: https://www.postgresql.org/docs/current/pgtrgm.html

Postgres text search didn't seem to work much better, to be honest. Maybe I was doing something wrong, but the scores it produced were generally just 0 or ~1, indicating only whether the term was in the column. It also didn't handle the case where you search HHS health, intending HHS to be the agency and health a free-text search term.

Text similarity did seem quite a bit better, as it allows for some flexibility, but it would need further investigation.
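The pg_trgm docs describe similarity as the fraction of trigrams two strings share, where each word is padded with two leading spaces and one trailing space before being split into trigrams. Here is a rough pure-Python approximation of that idea, useful for getting a feel for the scores. It handles only ASCII alphanumerics and ignores locale and other pg_trgm details.

```python
import re


def trigrams(text: str) -> set:
    """Approximate pg_trgm's trigram extraction (a sketch, not byte-for-byte identical)."""
    grams = set()
    # pg_trgm ignores non-alphanumeric characters; this sketch only handles ASCII.
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after each word
        for i in range(len(padded) - 2):
            grams.add(padded[i : i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(trigrams("cat"))                     # the 4 trigrams "  c", " ca", "cat", "at "
print(similarity("health", "healthcare"))  # 0.5
```

In Postgres itself this corresponds to the similarity() function (or the % operator with its default 0.3 threshold) once the pg_trgm extension is enabled.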

All-in-all, the more time we spend on this, the longer it takes for us to build out an actual search index.

@rylew1, not sure what feedback you need from design here. If it's just about what data the terms search against (opportunity_title, opportunity_number, agency, summary_description, assistance_listing_number, program_title seems like more than we even anticipated 👍), I think that's fine, at least for now; we'll learn more as we test.

rylew1

A few nits but looks good!

[Issue #1455] Implement logic for query box in the search API

105c0c5

chouinar requested review from acouch, aplybeah and SammySteiner as code owners March 21, 2024 14:50

chouinar requested a review from rylew1 March 21, 2024 14:50

github-actions bot added python api ci/cd labels Mar 21, 2024

Update OpenAPI spec

b991c25

rylew1 reviewed Mar 21, 2024

rylew1 approved these changes Mar 21, 2024


Merge branch 'main' into chouinar/1455-query-box

91cd84a

acouch approved these changes Mar 22, 2024


Updating database docs

874b258

chouinar requested review from sumiat, widal001 and andycochran as code owners March 22, 2024 14:35

github-actions bot added the documentation Improvements or additions to documentation label Mar 22, 2024

chouinar merged commit 2b205af into main Mar 22, 2024
8 checks passed

chouinar deleted the chouinar/1455-query-box branch March 22, 2024 17:57
