Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Namex API - implement NR search API #16714

Closed
severinbeauvais opened this issue Jun 12, 2023 · 36 comments
Closed

Namex API - implement NR search API #16714

severinbeauvais opened this issue Jun 12, 2023 · 36 comments
Assignees
Labels
NameX NameX and related services proxied via namex Priority1 SRE SRE team task

Comments

@severinbeauvais
Copy link
Collaborator

severinbeauvais commented Jun 12, 2023

Similar to Business Search API but for NRs -- support search by NR number or business name.

Need to return:

  • all active NRs that match query string
  • recently made inactive in last 60 (?) days (expired, consumed, etc)
    • in case an action can be made on the NR (eg, renew)
  • across all accounts
  • only un-affiliated NRs? otherwise we need to implement multiple affiliations (Auth API - Add support for NR affiliated to multiple accounts? #16718)
  • all types of NRs (name change, cont in, etc, etc)

DO NOT return NR "auth keys" (email / phone) in search response. (This would allow anyone to do a search and then be able to access any NR.)

Based on discussions below and in RocketChat, it appears that the best way to implement this is to use SOLR.

Sample search results for a NR number (or business name):

image.png

@severinbeauvais severinbeauvais added the ENTITY Business Team label Jun 12, 2023
@severinbeauvais
Copy link
Collaborator Author

@seeker25 what can you add to this conversation?

@severinbeauvais severinbeauvais added the NameX NameX and related services proxied via namex label Jun 12, 2023
@severinbeauvais
Copy link
Collaborator Author

@davemck513 @steveburtch Is this a ticket for Entities Team or another team?

This is needed for the NR search on the new My Business Registry page.

@severinbeauvais
Copy link
Collaborator Author

@Mihai-QuickSilverDev Please identify the "60 day" rules (or whatever they are) for which NRs should be returned in the search results.

@severinbeauvais
Copy link
Collaborator Author

Should NRs affiliated to other accounts be returned in the search results? See also #16718

@seeker25
Copy link
Collaborator

seeker25 commented Jun 13, 2023

https://github.com/bcgov/namex/blob/c92d6484a63b5407aea5ea31bbeb8bb060d99f15/api/namex/resources/requests.py#L398 <-- I'd modify that route so it takes more than just identifiers? Requires System and returns phone number and email - could change this

@severinbeauvais
Copy link
Collaborator Author

returns phone number and email

We need to be careful what calls return this data, since it is the key to accessing/affiliating/using a NR.

For sure the UI should not compare the email/phone provided by the user with data from the API; this check should be done in the API somehow.

@seeker25
Copy link
Collaborator

returns phone number and email

We need to be careful what calls return this data, since it is the key to accessing/affiliating/using a NR.

For sure the UI should not compare the email/phone provided by the user with data from the API; this check should be done in the API somehow.

Yup, which is why it's SYSTEM only right now and is called via AUTH-API (which checks the affiliations)

We should probably be caching the SYSTEM bearer token fetching as well

@severinbeauvais
Copy link
Collaborator Author

So far, we have the following unanswered questions:

  1. Make a fast database search?
  2. Or make a new SOLR query for NR num / name?
  3. Should NRs affiliated to other accounts be returned in the search results?
  4. How much effort is this?

@seeker25
Copy link
Collaborator

seeker25 commented Jun 13, 2023

@kialj876 could probably answer
1, 2, 4

@seeker25
Copy link
Collaborator

seeker25 commented Jun 13, 2023

After talking with Kial, probably a good idea to use business search for the business search UX (it uses SOLR already I believe).

EG.
GET https://bcregistry-test.apigee.net/registry-search/api/v1/businesses/search/facets?query=value:Hello&start=0&rows=100

query=value:<user input> param and it will match on name/identifier/tax id.

For the names side, probably best to just be consistent? There is already a component of SOLR inside of namex.

From Kial:

"The component they built in MHR thats connected to business search looks the exact same as what was in this design except for the 'added' enhancements so I would just take that and tweak it"

@severinbeauvais
Copy link
Collaborator Author

Kial said:

  • Auth Web should call a new Namex API endpoint (with a new namex solr call for this)
  • might want to prevent 3rd party apps from being able to use the namex endpoint -- using KC token?

@severinbeauvais severinbeauvais changed the title NR API - implement NR search API Namex API - implement NR search API Jun 13, 2023
@jdyck-fw
Copy link
Collaborator

Consider this ticket a time-box of an estimate. If it requires a greater deal of work than what feels natural for a 5 point ticket, we agreed that it could result in another ticket to complete the work.

@kialj876
Copy link
Collaborator

kialj876 commented Jun 22, 2023

For whoever takes this ticket you can reach out to me and I can spend a half hour showing you what's already there for the namex / solr connection and what you need to add in a bit more detail if you want

@kialj876
Copy link
Collaborator

I'm not sure if we need to add any auth check or block 3rd party api users from using this endpoint as it is fully public I think? That's more of a business question

@seeker25
Copy link
Collaborator

Kial should be on every team Hahaha

@severinbeauvais
Copy link
Collaborator Author

@davemck513 @Mihai-QuickSilverDev Should the "NR Search" API endpoint be fully public (no authentication/authorization), just like the Business Search endpoint (which nevertheless requires an API key) ?

@seeker25
Copy link
Collaborator

seeker25 commented Jun 22, 2023

@pwei1018 / @Oge01 (hopefully this is the right one) -^

@Mihai-QuickSilverDev
Copy link
Collaborator

@severinbeauvais What is your opinion, from a technical pov?

@Mihai-QuickSilverDev
Copy link
Collaborator

In the "My Registry Dashboard" this search would be accessible by account login. If we are going to use the same search mechanism in the open Names page, then it should be wide open. I guess a design question for @janisrogers

@severinbeauvais
Copy link
Collaborator Author

What is your opinion, from a technical pov?

In the "My Registry Dashboard" this search would be accessible by account login. If we are going to use the same search mechanism in the open Names page, then it should be wide open.

I'm wary of "open" endpoints, because anyone out there could access the data freely. Is that OK for business data? For NR data?

Mihai, you're right, the business search will be on the MBR page, which you only get to after login. But the Business Search API endpoint is apparently open already (need an API key but it's passed as clear text in network message), and we're talking about making the NR Search API similarly open.

David, your feelings on this?

@kialj876
Copy link
Collaborator

kialj876 commented Jun 22, 2023

I mean anyone can make an account and access these services. If anyone wanted to they could create an account / build a screen scraper to take all of the data at any given point (without needing any API key info etc.). Verifying the token doesn't provide any real security unless we are actually looking to prevent a group of users in our applications from accessing the data

@severinbeauvais
Copy link
Collaborator Author

severinbeauvais commented Jun 22, 2023

^^ Yup, good points.

If we have data that we don't want an external party to have open access to then let's have that conversation, but it doesn't change THIS ticket for now.

@jdyck-fw jdyck-fw added the SRE SRE team task label Jun 26, 2023
@severinbeauvais
Copy link
Collaborator Author

@Mihai-QuickSilverDev @davemck513

I had a chat with Vysakh and we have some follow-on questions and possible issues...

  1. What NR statuses do we want to search for? New? Pending? Note that SOLR only has approved and rejected ones.

  2. Do we really need a partial NR number search? That is not available in SOLR at the moment. (Full NR Number match does exist.)

Should we look into updating SOLR? Should we do a two-level serach -- use SOLR for the first part of the search, and then "join in" the rest of the NR data from namex db? Should we implement this in namex db only (no SOLR)? Performance will likely be an issue, though we could return fewer results (eg, 10 instead of 20) to make things faster.

cc: @kialj876 @thorwolpert

@Mihai-QuickSilverDev
Copy link
Collaborator

Priority of NR searches:

  1. Draft
  2. Active (approved) - not consumed
  3. Expired within the last 60 days - not consumed
  4. Consumed

@jdyck-fw
Copy link
Collaborator

jdyck-fw commented Jul 13, 2023

@ozamani9gh @ozamani9 - Have a release that we can tag this ticket to?

SB says: ^^ this is obsolete; we still need to implement this

@jdyck-fw jdyck-fw removed the ENTITY Business Team label Jul 25, 2023
@jdyck-fw
Copy link
Collaborator

jdyck-fw commented Jul 25, 2023

@PCC199 - Ready for SRE release

SB says: ^^ this is obsolete; we still need to implement this

@jdyck-fw
Copy link
Collaborator

@davemck513 Will follow up on this.

@jdyck-fw jdyck-fw removed the ENTITY Business Team label Aug 22, 2023
@jdyck-fw
Copy link
Collaborator

@PCC199 @pwei1018 - This is ready for release, though there will be some extra SOLR work required for full functionality. (ticket to be linked here by @Mihai-QuickSilverDev )

There is currently no resource available to do this SOLR work.

Possibly as a next phase. Likely best to group them together in a release.

@seeker25
Copy link
Collaborator

seeker25 commented Aug 23, 2023

We had a discussion about SOLR with @kialj876 & @vysakh-menon-aot - Kial didn't think it was worth doing in SOLR because it was going to be tossed after Namex gets rebuilt.

@kialj876 maybe you could comment - even if we had a resource available to do the SOLR work - probably not worth it?

@seeker25
Copy link
Collaborator

seeker25 commented Aug 23, 2023

I also think this is possible to accomplish without using SOLR, at least in the short term.

I remember working on BC Assessment's front page search - we didn't use anything external like SOLR to do a like search over 2,160,828+ property addresses in BC. You can try it yourself: https://www.bcassessment.ca/

I looked at my old notes, we used this feature for Oracle to build a "catsearch" - https://docs.oracle.com/cd/A91034_01/DOC/text.901/a90121/csql3.htm
with specialized indexes.

"INDEXTYPE IS CTXSYS.CTXCAT is Oracle's way of defining a "full text index" on a column -- as opposed to an ordinary B-tree."

I think Postgres probably has some sort of similar functionality. Postgres uses GIN indexes - GIN indexes are usually used to for PostgreSQL Full Text search (FTS). More info in here probably: https://www.postgresql.org/docs/current/textsearch.html

Rough working implementation here:
https://dev.to/____marcell/fast-fulltext-search-with-postgres-gin-index-22n5

@thorwolpert
Copy link
Collaborator

Glad you liked the Oracle Search, some very excellent c-code there. Having been on that team I know it well. We use some advanced searching in PPR on a much larger set.

We use SOLR as that allows us to cross systems without having the source data locally and because John's team was/is using it.

Aside from that, we probably don't want to spend much time on any Names related work as it is going to be completely over-hauled.

@severinbeauvais
Copy link
Collaborator Author

severinbeauvais commented Aug 23, 2023

So, we need the ability to do a NR lookup that returns all the states we care about, which the existing SOLR search doesn't do. What option do you recommend?

  1. extend existing SOLR
  2. use local SQL searching
  3. implement Oracle Search

@seeker25
Copy link
Collaborator

seeker25 commented Aug 23, 2023

I'd say go with 2 - if possible

It might be possible to create a few GIN indexes - and experiment with similar data size as prod - see if we can get the performance of the existing database search working sub second hopefully.

3 - We don't use Oracle, we're using postgres for the namex database
1 - Don't want to spend time on names work

When Names eventually gets overhauled - we can swap the database search with SOLR search.

@thorwolpert
Copy link
Collaborator

thorwolpert commented Aug 23, 2023

You're not doing any interesting text searching, and there's no way we'd stand up Oracle with all the work being done to move off of it.
As a tactical take, what's wrong with just a SQL query in postgres? There's no type-ahead or anything??

When we migrate names to the new SOLR setup, we can change it.
^^ agree with Travis on point 1 there

@jdyck-fw
Copy link
Collaborator

"We aren't going to include expired or consumed at this point, it will be a post MVP thing." - Dave

Currently we have rejected and approved states from SOLR. From the SQL query, we have draft. NRs in other states cannot be searched and therefore not affiliated.

This ticket is indeed now ready for release for MVP.

@seeker25
Copy link
Collaborator

seeker25 commented Aug 23, 2023

Sounds good, tried it on an branch, it's fast already (probably don't need the GIN indexes unless you're including consumed):

https://bcregistry-account-dev--pr-2469-7dx7pg9r.web.app/account/3040/business

DEV (all states):

namex=# select count(*) from requests;
 count
--------
 439167
(1 row)

namex=# select count(*) from names;
 count
--------
 560177
(1 row)

DEV:

namex=# select count(*) from requests where state_cd in ('INPROGRESS','DRAFT', 'REFUND_REQUESTED');
 count
-------
 15992
(1 row)

PROD:

namex=# select count(*) from requests where state_cd in ('INPROGRESS','DRAFT', 'REFUND_REQUESTED');
 count 
-------
 11668
(1 row)

expired is fairly small too I believe

image

@ozamani9gh ozamani9gh self-assigned this Sep 13, 2023
@PCC199 PCC199 closed this as completed Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NameX NameX and related services proxied via namex Priority1 SRE SRE team task
Projects
None yet
Development

No branches or pull requests

10 participants