-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port transform-field-names
to augur curate rename
#1506
Conversation
47e03b9
to
3942969
Compare
3942969
to
a3615ee
Compare
a3615ee
to
1671cdb
Compare
9193774
to
b81f747
Compare
See discussion in PR review for context <#1506 (comment)>
b81f747
to
0a2096d
Compare
See discussion in PR review for context <#1506 (comment)>
0a2096d
to
3e9246e
Compare
augur/curate/rename.py
Outdated
|
||
field_map = [] | ||
for field in field_map_arg: | ||
old_name, new_name = field.split('=') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth stripping whitespace off old_name
and new_name
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally glossed over this thinking any sane IFS would take care of it for me, but of course it won't do this when the arguments are quoted. Fixed up in force push and a bunch of tests added here, including checking that we have one and only one "=" character.
Copied from <https://github.com/nextstrain/ingest/blob/c94d78d1f38b99e893007a76526f3d3824ecded0/transform-strain-names> which was itself copied from <https://github.com/nextstrain/mpox/blob/5969604dfe426745b789746427b580c69d484790/ingest/bin/transform-strain-names> Only change in this commit was to remove the shebang and not make the file executable.
Includes changes to make the copied script work as a new Augur subcommand and fit in with the codebase. The functional behaviour of the actual renaming is unchanged, although the I/O options are now expanded as we inherit the general `augur curate` machinery.
See discussion in PR review for context <#1506 (comment)>
3e9246e
to
0357705
Compare
@joverlee521 I think this is good to go now. I added 650bd56 which allows you to drop columns via |
See discussion in PR review for context <#1506 (comment)>
0357705
to
134f384
Compare
Tests describe desired output, not current output
to match expected behaviour in tests. The main changes functional changes are around the order of fields, where we now rename "in-place" rather than adding the renamed column at the end (which for TSV output is the last column). More sanity checks are performed on arguments and they are cross-referenced with the provided records. Note that this relies on each record having the same fields, and this is not asserted here. See <#1510>
See discussion in PR review for context <#1506 (comment)>
134f384
to
c7091ce
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1506 +/- ##
==========================================
+ Coverage 69.69% 69.90% +0.21%
==========================================
Files 74 75 +1
Lines 7827 7882 +55
Branches 1914 1933 +19
==========================================
+ Hits 5455 5510 +55
Misses 2086 2086
Partials 286 286 ☔ View full report in Codecov by Sentry. |
augur/curate/rename.py
Outdated
if not new_name and not force: | ||
raise AugurError(f"The field-map {field!r} doesn't specify a name for the new field." | ||
" If you mean to drop this field you must specify --force.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly worried that this is conflating the --field-map
/--force
flag.
Thoughts on adding an explicit --drop-fields
column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, is the suggestion here to:
- drop the
--force
flag, and - make
--field-map
require both old and new names (and throw an error if new name is missing), and - add an explicit
--drop-fields
option that does what it says on the tin?
If I've got that right, I think it's a sound suggestion — it's more explicit around what it's doing, and more bullet-proof against something like an accidental botched config edit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm new to the curate world so I won't push against this. How about we just drop 650bd56 which will remove the drop functionality and shift this to a new issue. The resulting situation will require that --field-map
has non-empty new and old fields (see tests). Note that --force
is still used for the situation where we overwrite an existing column name (see tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we just drop 650bd56 which will remove the drop functionality and shift this to a new issue.
Sounds good to me. I can write up a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit dropped in force-push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed drop column function in #1526
c7091ce
to
0d0401b
Compare
Following instructions in `DEV_DOCS.md` and the (forthcoming) PR <https://github.com/nextstrain/augur/pull/1527/files>. These docs were missed during development of the rename command <#1506> as that predated the CI checks added in <#1525>
See commit messages for details, and especially the added tests which describe the expected behaviour - most of which is slightly different from the previous
transform-field-names
script.This assumes that NDJSON records have the same fields, so it's probably blocking on #1510, although I can add a simple check for that within the
rename
code in the name of expediency if desired.Closes #1484