25.0.0
These release notes are automatically extracted from the full changelog.
Major changes
- curate format-dates: Raises an error if a provided date field does not exist in the records. #1509 (@joverlee521)
- All curate subcommands: Verifies all input records have the same fields and raises an error if a record does not have matching fields. #1518 (@joverlee521)
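The field check described above can be pictured with a minimal Python sketch. It assumes records arrive as dictionaries and uses the first record's fields as the reference set; the function name `check_consistent_fields` and the error type are hypothetical and are not taken from Augur's code.

```python
from typing import Dict, Iterable, Iterator


def check_consistent_fields(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield records unchanged, raising if any record's fields differ from the first's.

    Hypothetical helper for illustration only; not Augur's implementation.
    """
    expected = None
    for index, record in enumerate(records):
        fields = set(record)
        if expected is None:
            # The first record defines the reference set of fields.
            expected = fields
        elif fields != expected:
            missing = sorted(expected - fields)
            extra = sorted(fields - expected)
            raise ValueError(
                f"Record {index} does not match the first record's fields "
                f"(missing: {missing}, unexpected: {extra})"
            )
        yield record
```

Because the sketch is a generator, the error only surfaces once the mismatched record is reached, so earlier records may already have been processed by that point.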
Features
- Added a new sub-command `augur curate apply-geolocation-rules` to apply user-curated geolocation rules to the geolocation fields in a metadata file. Previously, this was available as a script within the nextstrain/ingest repo. #1491 (@victorlin)
- Added a default color for the "Asia" region that will be used in `augur export` if no custom colors are provided. #1490 (@joverlee521)
- Added a new sub-command `augur curate apply-record-annotations` to apply user-curated annotations to existing fields in a metadata file. Previously, this was available as `merge-user-metadata` in the nextstrain/ingest repo. #1495 (@joverlee521)
- Added a new sub-command `augur curate abbreviate-authors` to abbreviate lists of authors to "<first author> et al." Previously, this was available as the `transform-authors` script within the nextstrain/ingest repo. #1483 (@genehack)
- Added a new sub-command `augur curate parse-genbank-location` to parse the `geo_loc_name` field from GenBank records. Previously, this was available as the `translate-genbank-location` script within the nextstrain/ingest repo. #1485 (@genehack)
- curate format-dates: Added defaults to `--expected-date-formats` so that ISO 8601 dates (`%Y-%m-%d`) and their various masked forms (e.g. `%Y-XX-XX`) are automatically parsed by the command. #1501 (@joverlee521)
- Added a new sub-command `augur curate transform-strain-name` to filter strain names based on matching a regular expression. Previously, this was available as the `transform-strain-names` script within the nextstrain/ingest repo. #1514 (@genehack)
- Added a new sub-command `augur curate rename` to rename field / column names. Previously, a similar version was available as the `transform-field-names` script within the nextstrain/ingest repo; however, the behaviour is slightly changed here. #1506 (@jameshadfield)
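To illustrate how masked date formats such as `%Y-XX-XX` can be matched, here is a rough Python sketch built on `datetime.strptime`, which treats the `XX` characters in a format string as literals. The list `ASSUMED_DEFAULT_FORMATS` and the function `format_date` are assumptions made for this example; the actual defaults and logic of `augur curate format-dates` may differ.

```python
from datetime import datetime
from typing import Optional, Sequence

# Assumed defaults for illustration: full ISO 8601 dates plus two masked forms.
ASSUMED_DEFAULT_FORMATS = ("%Y-%m-%d", "%Y-%m-XX", "%Y-XX-XX")


def format_date(date_string: str,
                expected_formats: Sequence[str] = ASSUMED_DEFAULT_FORMATS) -> Optional[str]:
    """Return the date as YYYY-MM-DD, masking components the matched format lacks.

    Returns None when no expected format matches.
    """
    date_string = date_string.strip()
    for date_format in expected_formats:
        try:
            # Literal characters such as "XX" must match the input exactly.
            parsed = datetime.strptime(date_string, date_format)
        except ValueError:
            continue
        # Only trust components that the matched format actually parsed.
        year = f"{parsed.year:04d}" if "%Y" in date_format else "XXXX"
        month = f"{parsed.month:02d}" if "%m" in date_format else "XX"
        day = f"{parsed.day:02d}" if "%d" in date_format else "XX"
        return f"{year}-{month}-{day}"
    return None
```

With defaults like these, a caller would only need to list extra formats (for example `%d/%m/%Y`) when the input deviates from ISO 8601.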
Bug Fixes
- filter: Improve speed of checking duplicates in metadata, especially for large files. #1466 (@victorlin)
- curate: Stop adding double quotes to the metadata TSV output when field values have internal quotes. #1493 (@joverlee521)
- curate format-dates: Mask empty date values as `XXXX-XX-XX` to represent unknown dates. #1509 (@joverlee521)
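The quoting fix in the list above can be reproduced with Python's standard `csv` module. The sketch below is one way to get unquoted TSV output and is an assumption about the general approach, not a description of Augur's code; the helper `write_tsv` and its `quote_none` parameter are hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence


def write_tsv(rows: Iterable[Sequence[str]], *, quote_none: bool) -> str:
    """Serialize rows as TSV, optionally disabling all quoting."""
    buffer = io.StringIO()
    if quote_none:
        # With quoting disabled and no quote character, internal quotes pass through as-is.
        writer = csv.writer(buffer, delimiter="\t", lineterminator="\n",
                            quoting=csv.QUOTE_NONE, quotechar=None)
    else:
        # Default QUOTE_MINIMAL wraps and doubles fields containing the quote character.
        writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["strain", "note"], ["s1", 'a "b" c']]
print(write_tsv(rows, quote_none=False))  # second row: s1<TAB>"a ""b"" c"
print(write_tsv(rows, quote_none=True))   # second row: s1<TAB>a "b" c
```

One trade-off of disabling quoting this way is that a field containing a tab or newline can no longer be represented, and `csv` raises an error for such a field unless an escape character is configured.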