Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Curate format dates #1146

Merged
merged 12 commits into from
Jul 7, 2023
Merged

Curate format dates #1146

merged 12 commits into from
Jul 7, 2023

Conversation

joverlee521
Copy link
Contributor

Description of proposed changes

Adds a new sub-command augur curate format-dates based on the transform-date-fields script in monkeypox repo.
See commits for details.

Related issue(s)

Closes #1001
Related to #860

Testing

Added Cram test for the new sub-command and includes doctests.

Checklist

  • Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

@joverlee521 joverlee521 requested a review from a team January 31, 2023 22:51
augur/curate/format_dates.py Outdated Show resolved Hide resolved
augur/curate/format_dates.py Show resolved Hide resolved
augur/curate/format_dates.py Show resolved Hide resolved
augur/curate/format_dates.py Show resolved Hide resolved
@tsibley
Copy link
Member

tsibley commented Apr 24, 2023

Overall this LGTM, modulo existing comments.

@codecov
Copy link

codecov bot commented Jun 12, 2023

Codecov Report

Patch coverage: 95.23% and project coverage change: +0.30 🎉

Comparison is base (edf06e6) 68.95% compared to head (b4d390d) 69.25%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1146      +/-   ##
==========================================
+ Coverage   68.95%   69.25%   +0.30%     
==========================================
  Files          64       66       +2     
  Lines        6951     7033      +82     
  Branches     1697     1711      +14     
==========================================
+ Hits         4793     4871      +78     
- Misses       1853     1855       +2     
- Partials      305      307       +2     
Impacted Files Coverage Δ
augur/curate/format_dates.py 94.36% <94.36%> (ø)
augur/argparse_.py 80.00% <100.00%> (+0.83%) ⬆️
augur/curate/__init__.py 88.88% <100.00%> (ø)
augur/curate/format_dates_directives.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

joverlee521 and others added 12 commits July 6, 2023 17:06
Copied the `transform-date-fields` script from monekypox/ingest¹ as the
base of the new subcommand `augur curate format-dates`.

Copied unmodified for now so that we can see modifications in version
control history.

¹ https://github.com/nextstrain/monkeypox/blob/8cc4cf739b9af679e1d2fe29d9f702f42f60e636/ingest/bin/transform-date-fields
No changes to the internals of the script, just edits to fit into the
Augur ecosystem:

* add `register_parser` and `run` functions
* remove shebang
* docstring edits
* move nested function out for additional doctests
* Cram tests for the new subcommand
Changes the cascade of directive checks to explicit sets of tuples of
directives so that it is easier to read/understand as suggested by
@tsibley in a previous review.¹ This also allows us to simplify the
`directive_is_included` function to only expect tuples as potential
directives.²

¹ nextstrain/mpox#45 (comment)
² nextstrain/mpox#45 (comment)
Use `itertools.product` to more clearly see the possible values for each
part, and put these constants in a separate file to avoid setting the
same values upon every call to `format_date()`.

Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
'%%' represents an escaped directive that is a literal '%' character,
so it needs to be excluded in the date format check. Using the regex
negative lookbehind assertion suggested by @tsibley in a
previous review.¹

¹ nextstrain/mpox#45 (comment)
Add `--failure-reporting` option to dictate how failed date formatting
should be reported. Similar to the duplicate/unmatched-reporting options
for curate inputs, the choices are `error_first`, `error_all`, `warn`,
or `silent`.

The `error_all` and `warn` options print summaries for all records
that failed as `(record, field, date string)`. I chose this format to
minimize the memory footprint while still providing enough info for
users to debug issues.
Now that `DataErrorMethod` is an `ArgparseEnum`, we no longer have to
specify `value` for the argparse choices and we do not have to convert
the argument strings back to enum members.

Specify type as `DataErrorMethod.argtype` in order to get better
error messages when the provided option value is not a valid choice.
By default, completely mask date strings with "XXXX-XX-XX" for dates
that failed date formatting so that they are still in the proper
ISO 8601 dates format for downstream Augur commands. Users can turn off
the masking with the added `--no-mask-failure` option.

Note the `store_false` actions produce semi-misleading docs in the help
output, so the new option suppresses the default value in the help message
with `SKIP_AUTO_DEFAULT_IN_HELP` as suggested by @tsibley in review.
@joverlee521
Copy link
Contributor Author

Rebased onto master to pull in the latest changelog.

I'll merge this tomorrow once all of the CI tests pass.

@joverlee521 joverlee521 merged commit b03442c into master Jul 7, 2023
@joverlee521 joverlee521 deleted the curate-format-dates branch July 7, 2023 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

augur curate format-dates
3 participants