Add a parameter to augur parse to specify a different record ID for output sequences FASTA #1403

Merged 5 commits on Feb 12, 2024

Commits on Feb 12, 2024

  1. Add test for default id columns

    Originally, augur parse checks for 'name', then 'strain', and then falls back to the
    first field to determine the sequence ID. This test pins down that default ID column order.
    j23414 committed Feb 12, 2024
    4a12746
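The default order that this test pins down can be sketched as follows. This is a minimal illustration rather than augur's actual code: `legacy_id_field` is a hypothetical helper name, and the field lists are made-up examples.

```python
def legacy_id_field(fields):
    """Return the field used as the sequence ID under the original order.

    Hypothetical helper for illustration only; not part of augur.
    """
    # Check 'name' first, then 'strain', as the commit message describes.
    for candidate in ("name", "strain"):
        if candidate in fields:
            return candidate
    # Neither default is present: fall back to the first field.
    return fields[0]


# 'name' wins even when 'strain' is listed first.
assert legacy_id_field(["strain", "name", "date"]) == "name"
# 'strain' is used when 'name' is absent.
assert legacy_id_field(["accession", "strain"]) == "strain"
# With neither present, the first field is used.
assert legacy_id_field(["accession", "date"]) == "accession"
```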
  2. Refactor and reuse the metadata default ID columns instead of hardcoding

    During augur parse, instead of hardcoding a check for a 'name' field and
    then a 'strain' field to use as a sequence ID, reuse and check against the
    io.metadata DEFAULT_ID_COLUMNS list to be more consistent with the rest of
    augur.
    
    This is a breaking behavior change: parse now checks for 'strain' before
    'name' as the default ID column. The justification is consistency, since the
    DEFAULT_ID_COLUMNS list is already used in other parts of augur and augur
    parse should be no different.
    j23414 committed Feb 12, 2024
    507a947
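To see why the reordering is a breaking change, consider a record that carries both fields. The sketch below assumes the two orders described in the commit message; `choose_id_field`, `LEGACY_ID_COLUMNS`, and the record values are hypothetical, and `DEFAULT_ID_COLUMNS` is redefined locally rather than imported from augur.

```python
# Assumed to mirror the shared list described in the commit message.
DEFAULT_ID_COLUMNS = ("strain", "name")
# The order that was hardcoded in parse before this commit.
LEGACY_ID_COLUMNS = ("name", "strain")


def choose_id_field(fields, id_columns):
    """Pick the first ID column present in fields, else the first field."""
    return next((col for col in id_columns if col in fields), fields[0])


# Made-up record that has both candidate ID fields.
record = {"name": "sample_1", "strain": "A/Example/1/2024", "date": "2024-02-12"}
fields = list(record)

old_id = record[choose_id_field(fields, LEGACY_ID_COLUMNS)]
new_id = record[choose_id_field(fields, DEFAULT_ID_COLUMNS)]
print(old_id, new_id)
```

Records with only one of the two fields, or with neither, get the same ID under both orders; only inputs that contain both 'name' and 'strain' are affected.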
  3. Parameterize the record ID field for output sequences

    Include a `--output-id-field` parameter in augur parse to indicate the ID
    field for the output sequences file, such as using "accession" instead of
    "strain".
    
    The `--output-id-field` parameter is optional. If it is not provided, augur
    parse falls back to the DEFAULT_ID_COLUMNS (e.g. ('strain','name')). If none
    of the DEFAULT_ID_COLUMNS are present in the fields, it falls back to using
    the first field.
    
    The `--output-id-field` parameter is designed to accept a single field name,
    not multiple. Users are required to provide a `--fields col1 col2 strain accession`
    argument in the same invocation. It seems reasonable to expect the user to
    choose a specific field name for the ID.
    
    To prevent unintended behaviors, if `--output-id-field` is not present in
    `--fields` (e.g., due to a typo), augur parse will raise an error instead of
    falling back to DEFAULT_ID_COLUMNS.
    j23414 committed Feb 12, 2024
    f0fbb27
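The resolution rules in this commit message can be sketched with `argparse` as below. This is an illustration under stated assumptions, not augur's implementation: `resolve_output_id_field` is a hypothetical helper, `ValueError` stands in for whatever error type augur actually raises, and `DEFAULT_ID_COLUMNS` is redefined locally with the value quoted above.

```python
import argparse

# Assumed value, as quoted in the commit message.
DEFAULT_ID_COLUMNS = ("strain", "name")


def resolve_output_id_field(fields, output_id_field=None):
    """Return the field whose value becomes the record ID in the output FASTA."""
    if output_id_field is not None:
        # An explicit choice must exist in --fields; never fall back silently.
        if output_id_field not in fields:
            raise ValueError(
                f"Output id field '{output_id_field}' not found in fields {fields}."
            )
        return output_id_field
    # No explicit choice: try the default ID columns in order.
    for candidate in DEFAULT_ID_COLUMNS:
        if candidate in fields:
            return candidate
    # Last resort: the first field.
    return fields[0]


# Stand-in parser for illustration; only the two relevant options are modeled.
parser = argparse.ArgumentParser(prog="parse-sketch")
parser.add_argument("--fields", nargs="+", required=True)
parser.add_argument("--output-id-field", default=None)

args = parser.parse_args(
    ["--fields", "col1", "col2", "strain", "accession",
     "--output-id-field", "accession"]
)
chosen = resolve_output_id_field(args.fields, args.output_id_field)
print(chosen)
```

Raising on an unknown field means a typo such as `--output-id-field acession` fails loudly instead of quietly producing sequences keyed by 'strain'.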
  4. Add a deprecation warning that the id column will be reordered in the future.
    
    Instead of the refactor and breaking change from 8dc0694, this commit adds a deprecation warning to the `parse` step stating that the default `id` field order will change from ('name', 'strain') to ('strain', 'name').
    
    This gives users time to update their scripts and workflows with `--output-id-field 'name'` before the change is made.
    
    co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
    j23414 and victorlin committed Feb 12, 2024
    f13a36d
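One way to keep the current order while announcing the future one is sketched below. The helper name, the message text, and the choice to warn only when both fields are present are assumptions made for illustration; the commit message does not say when augur emits the warning or which warning mechanism it uses.

```python
import warnings

CURRENT_ID_COLUMNS = ("name", "strain")  # order still in effect
FUTURE_ID_COLUMNS = ("strain", "name")   # order announced for a future release


def default_id_field_with_warning(fields):
    """Pick the default ID field and warn if the future order would differ.

    Hypothetical helper for illustration only; not part of augur.
    """
    current = next((c for c in CURRENT_ID_COLUMNS if c in fields), fields[0])
    future = next((c for c in FUTURE_ID_COLUMNS if c in fields), fields[0])
    if current != future:
        # Only inputs containing both fields are affected by the reorder.
        warnings.warn(
            f"The default ID field will change from '{current}' to '{future}' "
            f"in a future version. Pass --output-id-field '{current}' to keep "
            "the current behavior.",
            DeprecationWarning,
            stacklevel=2,
        )
    return current


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    picked = default_id_field_with_warning(["strain", "name", "date"])

print(picked, len(caught))
```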
  5. Update CHANGELOG

    j23414 committed Feb 12, 2024
    9d96cad