Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest/apply-geolocation-rules: update general rules logic #55

Merged
merged 3 commits into from
Jun 20, 2022

Conversation

joverlee521
Copy link
Contributor

Previously, we would backfill rule_traversal with "*" general rules
if we could not find a matching rule for a particular raw_geolocation.
This would NOT work for cases where the "*" general rules preceded
actual geolocations such as "*/*/*/D". I realized the flaw in this logic
while responding to @victorlin's post-merge review:
#41 (comment)

This commit updates the logic to reset rule_traversal to the last
index that is currently not a "*" general rule, then change this value
to a "*". This allows the recursive function to try different iterations
of rule_traversal with different combinations of raw values and "*"s.

Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still wrapping my head around this, but first sending in a note that this doesn't work in some cases where it used to.

ingest/bin/apply-geolocation-rules Outdated Show resolved Hide resolved
ingest/bin/apply-geolocation-rules Show resolved Hide resolved
Previously, we would backfill `rule_traversal` with wildcards ("*")
if we could not find a matching rule for a particular `raw_geolocation`.
This would NOT work for cases where there are partial rule matches AND
wildcard rules that match. I realized the flaw in this logic
while responding to @victorlin's post-merge review:
#41 (comment)

This commit updates the logic for when there are no matching rules. The
`rule_traversal` is reset to the last index that is currently not a
wildcard rule, and then change this value to a wildcard. This allows the
recursive function to try different iterations of `rule_traversal` with
different combinations of raw values and wildcards.
Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the explanation and tests!

ingest/bin/apply-geolocation-rules Outdated Show resolved Hide resolved
Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
@corneliusroemer
Copy link
Member

Anything still blocking this from being merged? @joverlee521 @victorlin?

@corneliusroemer
Copy link
Member

There is one difference after running this, Ciudad Autonoma de Buenos Aires turns into Buenos Aires, I think that's on purpose, so I'll merge.
@joverlee521 please check that this is as expected.

258c258
< ON720962      ON720962.1      MPV-ARG002-2022 2022-06-XX      South America   Argentina       Buenos Aires            Homo sapiens    2022-06-09                           Lewis et al      Lewis,A., Josiowicz,A., Hirmas Riade,S.M., Tous,M., Cisterna,D.M.
---
> ON720962      ON720962.1      MPV-ARG002-2022 2022-06-XX      South America   Argentina       Ciudad Autonoma de Buenos Aires         Homo sapiens    2022-06-09           Lewis et al      Lewis,A., Josiowicz,A., Hirmas Riade,S.M., Tous,M., Cisterna,D.M.

@corneliusroemer corneliusroemer merged commit d189b45 into master Jun 20, 2022
@corneliusroemer corneliusroemer deleted the apply-geolocation-update branch June 20, 2022 16:51
@joverlee521
Copy link
Contributor Author

There is one difference after running this, Ciudad Autonoma de Buenos Aires turns into Buenos Aires, I think that's on purpose, so I'll merge. @joverlee521 please check that this is as expected.

Thanks for checking in on this @corneliusroemer! Yes, this is an expected change of this PR since the updated logic now allows for these partial match rules to work:

*/*/Ciudad Autonoma de Buenos Aires/*\t*/*/Buenos Aires/*

(This is a rule pulled from ncov-ingest)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants