Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port apply-geolocation-rules to augur curate #1491

Merged
merged 7 commits into from
Jun 26, 2024

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Jun 26, 2024

Description of proposed changes

Porting over from nextstrain/ingest with minimal functional changes.

Related issue(s)

Checklist

  • Tests ported from old PRs
  • Other tests added
  • Checks pass
  • If making user-facing changes, add a message in CHANGES.md summarizing the changes in this PR

@victorlin victorlin self-assigned this Jun 26, 2024
@victorlin victorlin force-pushed the victorlin/add-apply-geolocation-rules branch 2 times, most recently from 87053d9 to 6f67295 Compare June 26, 2024 19:29
@victorlin victorlin marked this pull request as ready for review June 26, 2024 19:45
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking this up @victorlin! I added another test in b80bc4e to make sure I'm not crazy for remembering that we can override default rules.

augur/curate/apply_geolocation_rules.py Show resolved Hide resolved
pass


def load_geolocation_rules(geolocation_rules_file):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silly question #2: do we eschew types in augur?

Copy link
Member Author

@victorlin victorlin Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, you'll find some here and there in two forms:

  1. Types in docstrings

    metadata_file : str
    Path to a metadata file to load.
    delimiters : list of str
    List of possible delimiters to check for between columns in the metadata.
    Only one delimiter will be inferred.
    columns : list of str
    List of columns to read. If unspecified, read all columns.

  2. Type annotations

    def _calculate_sequences_per_group(
    target_max_value: int,
    group_sizes: Collection[int]
    ) -> int:

These are picked up and rendered in docs such as augur.io.read_metadata:

augur.io.read_metadata

But typing isn't ubiquitous and what's typed isn't even validated.

Historically most of the codebase has been untyped and we as a team have only recently begun using types in untyped languages. It hasn't been a priority to go back and add more types to existing code, but that's something I would love to see happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the codebase has been untyped and we as a team have only recently begun using types in untyped languages. It hasn't been a priority to go back and add more types to existing code, but that's something I would love to see happen.

one way to expand the use of types would be to have an general expectation that any new code being added would (at a minimum) annotate method signatures...

}
"""
geolocation_rules = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
with open(geolocation_rules_file, 'r') as rules_fh:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm gonna have a bunch of silly questions i think: is there a pattern in augur around who is responsible for handling (or, i guess, not handling) exceptions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see #642.

Keep the questions coming! It's good to know what seems weird from a fresh perspective. Also this comment prompted me go back and update that outdated issue.

victorlin and others added 7 commits June 26, 2024 15:27
Removed the executable permission and shebang since this file is no
longer expected to be called directly.

Copied from: <https://github.com/nextstrain/ingest/blob/c94d78d1f38b99e893007a76526f3d3824ecded0/apply-geolocation-rules>
Added the minimum required changes to allow this subcommand to work:

- renamed the file to work as a python module (i.e. replace hyphens
  with underscores, add .py extension)

- added it as a subcommand of augur curate (after titlecase to be
  consistent with the order used in existing workflows)

- replaced the main entrypoint with register_parser() + run()

- hooked up to augur curate's argparse parser

- updated to work with JSON records directly (i.e. letting augur curate
  handle stdin/stdout)
This does not test all functionality, but is better than no testing.
To be consistent with the rest of the codebase.
@victorlin victorlin force-pushed the victorlin/add-apply-geolocation-rules branch from b80bc4e to 295f280 Compare June 26, 2024 22:27
Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 95.40230% with 4 lines in your changes missing coverage. Please review.

Project coverage is 69.15%. Comparing base (ff5f880) to head (295f280).
Report is 275 commits behind head on master.

Files with missing lines Patch % Lines
augur/curate/apply_geolocation_rules.py 95.34% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1491      +/-   ##
==========================================
+ Coverage   68.86%   69.15%   +0.29%     
==========================================
  Files          69       70       +1     
  Lines        7608     7694      +86     
  Branches     1861     1887      +26     
==========================================
+ Hits         5239     5321      +82     
- Misses       2086     2087       +1     
- Partials      283      286       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@victorlin victorlin merged commit b842616 into master Jun 26, 2024
20 checks passed
@victorlin victorlin deleted the victorlin/add-apply-geolocation-rules branch June 26, 2024 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants