-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Curate format dates #1146
Merged
Merged
Curate format dates #1146
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
joverlee521
commented
Jan 31, 2023
victorlin
reviewed
Feb 1, 2023
victorlin
reviewed
Feb 1, 2023
victorlin
reviewed
Feb 1, 2023
victorlin
reviewed
Feb 1, 2023
victorlin
reviewed
Feb 1, 2023
Overall this LGTM, modulo existing comments. |
joverlee521
force-pushed
the
curate-format-dates
branch
from
June 12, 2023 21:04
c7c79fd
to
701eec9
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1146 +/- ##
==========================================
+ Coverage 68.95% 69.25% +0.30%
==========================================
Files 64 66 +2
Lines 6951 7033 +82
Branches 1697 1711 +14
==========================================
+ Hits 4793 4871 +78
- Misses 1853 1855 +2
- Partials 305 307 +2
☔ View full report in Codecov by Sentry. |
This was referenced Jul 6, 2023
Copied the `transform-date-fields` script from monekypox/ingest¹ as the base of the new subcommand `augur curate format-dates`. Copied unmodified for now so that we can see modifications in version control history. ¹ https://github.com/nextstrain/monkeypox/blob/8cc4cf739b9af679e1d2fe29d9f702f42f60e636/ingest/bin/transform-date-fields
No changes to the internals of the script, just edits to fit into the Augur ecosystem: * add `register_parser` and `run` functions * remove shebang * docstring edits * move nested function out for additional doctests * Cram tests for the new subcommand
Changes the cascade of directive checks to explicit sets of tuples of directives so that it is easier to read/understand as suggested by @tsibley in a previous review.¹ This also allows us to simplify the `directive_is_included` function to only expect tuples as potential directives.² ¹ nextstrain/mpox#45 (comment) ² nextstrain/mpox#45 (comment)
Use `itertools.product` to more clearly see the possible values for each part, and put these constants in a separate file to avoid setting the same values upon every call to `format_date()`. Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
'%%' represents an escaped directive that is a literal '%' character, so it needs to be excluded in the date format check. Using the regex negative lookbehind assertion suggested by @tsibley in a previous review.¹ ¹ nextstrain/mpox#45 (comment)
Add `--failure-reporting` option to dictate how failed date formatting should be reported. Similar to the duplicate/unmatched-reporting options for curate inputs, the choices are `error_first`, `error_all`, `warn`, or `silent`. The `error_all` and `warn` options print summaries for all records that failed as `(record, field, date string)`. I chose this format to minimize the memory footprint while still providing enough info for users to debug issues.
Now that `DataErrorMethod` is an `ArgparseEnum`, we no longer have to specify `value` for the argparse choices and we do not have to convert the argument strings back to enum members. Specify type as `DataErrorMethod.argtype` in order to get better error messages when the provided option value is not a valid choice.
By default, completely mask date strings with "XXXX-XX-XX" for dates that failed date formatting so that they are still in the proper ISO 8601 dates format for downstream Augur commands. Users can turn off the masking with the added `--no-mask-failure` option. Note the `store_false` actions produce semi-misleading docs in the help output, so the new option suppresses the default value in the help message with `SKIP_AUTO_DEFAULT_IN_HELP` as suggested by @tsibley in review.
joverlee521
force-pushed
the
curate-format-dates
branch
from
July 7, 2023 00:15
701eec9
to
b4d390d
Compare
Rebased onto master to pull in the latest changelog. I'll merge this tomorrow once all of the CI tests pass. |
victorlin
approved these changes
Jul 7, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description of proposed changes
Adds a new sub-command
augur curate format-dates
based on the transform-date-fields script in monkeypox repo.See commits for details.
Related issue(s)
Closes #1001
Related to #860
Testing
Added Cram test for the new sub-command and includes doctests.
Checklist