Port transform-field-names to augur curate rename #1506

Merged
merged 6 commits into from
Jul 9, 2024

Conversation

jameshadfield
Member

@jameshadfield jameshadfield commented Jul 1, 2024

See commit messages for details, and especially the added tests which describe the expected behaviour - most of which is slightly different from the previous transform-field-names script.

This assumes that NDJSON records have the same fields, so it's probably blocking on #1510, although I can add a simple check for that within the rename code in the name of expediency if desired.
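For illustration, a minimal sketch of what such a "simple check" might look like. This is an assumption rather than the code in this PR: the helper name `assert_consistent_fields` is hypothetical, records are assumed to be plain dicts, and field order is assumed not to matter.

```python
from typing import Iterable, Iterator


def assert_consistent_fields(records: Iterable[dict]) -> Iterator[dict]:
    """Yield records unchanged, raising if a record's set of fields differs from the first record's.

    Hypothetical helper for illustration only; not part of Augur.
    """
    expected = None
    for index, record in enumerate(records):
        fields = set(record)
        if expected is None:
            # The first record defines the expected set of field names.
            expected = fields
        elif fields != expected:
            raise ValueError(
                f"Record {index} has fields {sorted(fields)!r}, "
                f"expected {sorted(expected)!r}"
            )
        yield record
```

Written as a generator, the check can wrap a stream of records without loading them all into memory.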

Closes #1484

@jameshadfield jameshadfield force-pushed the james/curate-rename branch 2 times, most recently from 9193774 to b81f747 Compare July 1, 2024 23:39
jameshadfield added a commit that referenced this pull request Jul 2, 2024
See discussion in PR review for context
<#1506 (comment)>
jameshadfield added a commit that referenced this pull request Jul 2, 2024
See discussion in PR review for context
<#1506 (comment)>
augur/curate/rename.py (outdated)

field_map = []
for field in field_map_arg:
    old_name, new_name = field.split('=')
Contributor

worth stripping whitespace off old_name and new_name here?

Member Author

@jameshadfield jameshadfield Jul 4, 2024

I had originally glossed over this thinking any sane IFS would take care of it for me, but of course it won't do this when the arguments are quoted. Fixed up in force push and a bunch of tests added here, including checking that we have one and only one "=" character.
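A rough sketch of the parsing described above, assuming each `--field-map` value arrives as a single `old=new` string. The function name `parse_field_map`, the use of `ValueError` and the list-of-tuples return type are assumptions made for illustration; the actual code in `augur/curate/rename.py` may differ.

```python
def parse_field_map(field_map_arg):
    """Parse ["old=new", ...] into a list of (old_name, new_name) tuples.

    Hypothetical helper for illustration only; not the code in this PR.
    """
    field_map = []
    for field in field_map_arg:
        # Require exactly one "=" so that "a=b=c" and "ab" are both rejected.
        if field.count("=") != 1:
            raise ValueError(
                f"The field-map {field!r} must contain exactly one '=' character."
            )
        # Strip whitespace, since quoted arguments such as " old = new "
        # are passed through by the shell untouched.
        old_name, new_name = (name.strip() for name in field.split("="))
        field_map.append((old_name, new_name))
    return field_map
```

For example, `parse_field_map([" strain = name "])` would give `[("strain", "name")]` under these assumptions.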

augur/curate/rename.py
Includes changes to make the copied script work as a new Augur
subcommand and fit in with the codebase. The functional behaviour of the
actual renaming is unchanged, although the I/O options are now expanded
as we inherit the general `augur curate` machinery.
jameshadfield added a commit that referenced this pull request Jul 4, 2024
See discussion in PR review for context
<#1506 (comment)>
@jameshadfield
Member Author

jameshadfield commented Jul 4, 2024

@joverlee521 I think this is good to go now. I added 650bd56 which allows you to drop columns via --field-map X= as well.

jameshadfield added a commit that referenced this pull request Jul 4, 2024
See discussion in PR review for context
<#1506 (comment)>
Tests describe desired output, not current output
to match expected behaviour in tests.

The main functional changes concern the order of fields: we now
rename "in-place" rather than adding the renamed column at the end
(which for TSV output is the last column).

More sanity checks are performed on arguments and they are
cross-referenced with the provided records.

Note that this relies on each record having the same fields, and this is
not asserted here. See <#1510>
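To make the ordering change concrete, here is a small sketch contrasting the two behaviours on a single record. Both function names are hypothetical, records are assumed to be plain dicts, and `rename_append` is only an approximation of the earlier "add at the end" behaviour. Collisions with existing field names are deliberately not handled here.

```python
def rename_in_place(record, field_map):
    """Return a new record with fields renamed but kept in their original position."""
    renames = dict(field_map)
    # Rebuilding the dict key by key preserves the original field order.
    return {renames.get(name, name): value for name, value in record.items()}


def rename_append(record, field_map):
    """Return a new record where each renamed field is moved to the end."""
    renamed = dict(record)
    for old_name, new_name in field_map:
        # pop() removes the old key; assignment appends the new key last.
        renamed[new_name] = renamed.pop(old_name)
    return renamed


record = {"strain": "A", "date": "2024-07-01", "country": "NZ"}
field_map = [("date", "collection_date")]

print(list(rename_in_place(record, field_map)))
# ['strain', 'collection_date', 'country']
print(list(rename_append(record, field_map)))
# ['strain', 'country', 'collection_date']
```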
See discussion in PR review for context
<#1506 (comment)>

codecov bot commented Jul 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.90%. Comparing base (b69444a) to head (0d0401b).
Report is 222 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1506      +/-   ##
==========================================
+ Coverage   69.69%   69.90%   +0.21%     
==========================================
  Files          74       75       +1     
  Lines        7827     7882      +55     
  Branches     1914     1933      +19     
==========================================
+ Hits         5455     5510      +55     
  Misses       2086     2086              
  Partials      286      286              


Comment on lines 44 to 46
if not new_name and not force:
    raise AugurError(f"The field-map {field!r} doesn't specify a name for the new field."
                     " If you mean to drop this field you must specify --force.")
Contributor

I'm slightly worried that this is conflating the --field-map and --force flags.

Thoughts on adding an explicit --drop-fields option?

Contributor

To be clear, is the suggestion here to:

  1. drop the --force flag, and
  2. make --field-map require both old and new names (and throw an error if new name is missing), and
  3. add an explicit --drop-fields option that does what it says on the tin?

If I've got that right, I think it's a sound suggestion — it's more explicit around what it's doing, and more bullet-proof against something like an accidental botched config edit.

Member Author

I'm new to the curate world so I won't push against this. How about we just drop 650bd56 which will remove the drop functionality and shift this to a new issue. The resulting situation will require that --field-map has non-empty new and old fields (see tests). Note that --force is still used for the situation where we overwrite an existing column name (see tests).
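As a sketch of the resulting rules (not the code in this PR), the validation might look roughly like the following. The function name `check_field_map`, the use of `ValueError` and the exact error messages are assumptions; the tests referenced above remain the authoritative description of the behaviour.

```python
def check_field_map(field_map, record_fields, force=False):
    """Validate (old_name, new_name) pairs against the fields of a record.

    Hypothetical helper for illustration only; not part of Augur.
    """
    for old_name, new_name in field_map:
        # Both sides of "old=new" must be non-empty now that dropping
        # fields via an empty new name is no longer supported.
        if not old_name or not new_name:
            raise ValueError(
                f"The field-map '{old_name}={new_name}' must have non-empty old and new names."
            )
        # Overwriting an existing field is only allowed with force.
        if new_name in record_fields and not force:
            raise ValueError(
                f"Renaming {old_name!r} to {new_name!r} would overwrite an "
                "existing field; use force to allow this."
            )
```

What happens to the overwritten column when `force` is set is not shown here, since the conversation above does not spell it out.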

Contributor

How about we just drop 650bd56 which will remove the drop functionality and shift this to a new issue.

Sounds good to me. I can write up a separate issue.

Member Author

Commit dropped in force-push

Contributor

Proposed drop column function in #1526

@jameshadfield jameshadfield merged commit 18e07b4 into master Jul 9, 2024
20 checks passed
@jameshadfield jameshadfield deleted the james/curate-rename branch July 9, 2024 05:27
jameshadfield added a commit that referenced this pull request Jul 10, 2024
Following instructions in `DEV_DOCS.md` and the (forthcoming) PR
<https://github.com/nextstrain/augur/pull/1527/files>.

These docs were missed during development of the rename command
<#1506> as that predated the CI
checks added in <#1525>
Development

Successfully merging this pull request may close these issues.

augur curate rename
3 participants