Add a parameter to augur parse to specify a different record ID for output sequences FASTA #1403

augur parse originally checks for 'name', then 'strain', and finally falls back to the first field to determine the sequence ID to use. This test checks that default ID column order.

In augur parse, instead of hardcoding a check for a 'name' field and then a 'strain' field to use as the sequence ID, reuse the io.metadata DEFAULT_ID_COLUMNS list. This is a breaking behavior change: parse now checks for 'strain' before 'name' as the default ID column. The justification is consistency, since DEFAULT_ID_COLUMNS is already used in other parts of augur and augur parse should be no different.

Add an `--output-id-field` parameter to augur parse to indicate the ID field for the output sequences file, such as using "accession" instead of "strain". The `--output-id-field` parameter is optional; if it is not given, augur parse falls back to the DEFAULT_ID_COLUMNS (e.g. ('strain','name')). If none of the DEFAULT_ID_COLUMNS are present in the fields, it falls back to using the first field. The `--output-id-field` parameter accepts a single field name, not multiple. Users are already required to provide a `--fields col1 col2 strain accession` argument in the same invocation, so it seems reasonable to expect them to choose one specific field name for the ID. To prevent unintended behavior, if the `--output-id-field` value is not present in `--fields` (e.g., due to a typo), augur parse raises an error instead of falling back to DEFAULT_ID_COLUMNS.
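The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `resolve_id_field` and its signature are hypothetical, and the value of `DEFAULT_ID_COLUMNS` is assumed from the example in the description.

```python
from typing import Optional, Sequence

# Assumed value, taken from the example in the description ('strain', 'name').
DEFAULT_ID_COLUMNS = ("strain", "name")


def resolve_id_field(
    fields: Sequence[str],
    output_id_field: Optional[str] = None,
    default_id_columns: Sequence[str] = DEFAULT_ID_COLUMNS,
) -> str:
    """Hypothetical helper: pick the field used as the output sequence ID."""
    if not fields:
        raise ValueError("At least one field name is required.")

    # An explicit choice must match one of the fields exactly; a typo raises
    # an error rather than silently falling back to the defaults.
    if output_id_field is not None:
        if output_id_field not in fields:
            raise ValueError(
                f"Output id field {output_id_field!r} is not one of the "
                f"fields: {list(fields)}"
            )
        return output_id_field

    # Otherwise use the first default ID column that is present.
    for candidate in default_id_columns:
        if candidate in fields:
            return candidate

    # Last resort: the first field.
    return fields[0]
```

With `fields = ["col1", "col2", "strain", "accession"]`, passing `output_id_field="accession"` selects "accession", omitting it selects "strain", and a misspelling such as "acession" raises `ValueError`.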

… future. Instead of the refactor and breaking change from 8dc0694, this commit adds a deprecation warning to the `parse` step stating that the default `id` field order will change from ('name', 'strain') to ('strain', 'name'). This gives users time to update their scripts and workflows with `--output-id-field 'name'` before the change is made.

Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
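One way such a deprecation warning could work is sketched below. The helper name `pick_default_id_field`, the use of `warnings.warn` with `FutureWarning`, and the choice to warn only when the two orderings would select different fields are all assumptions made for illustration; the commit message does not say when or how the warning is emitted.

```python
import warnings
from typing import Sequence

LEGACY_ID_COLUMNS = ("name", "strain")   # current default order
FUTURE_ID_COLUMNS = ("strain", "name")   # planned default order


def pick_default_id_field(fields: Sequence[str]) -> str:
    """Hypothetical helper: keep the legacy default but warn about the change."""

    def first_match(order: Sequence[str]) -> str:
        # First column from `order` that is present, else the first field.
        for column in order:
            if column in fields:
                return column
        return fields[0]

    current = first_match(LEGACY_ID_COLUMNS)
    future = first_match(FUTURE_ID_COLUMNS)

    # Assumption: only warn when the planned reordering would change the result.
    if current != future:
        warnings.warn(
            f"The default id field will change from {current!r} to {future!r} "
            f"in a future version. Pass --output-id-field {current!r} to keep "
            "the current behavior.",
            FutureWarning,
            stacklevel=2,
        )
    return current
```

Under these assumptions, fields containing both 'name' and 'strain' still resolve to 'name' but trigger the warning, while fields containing only one of them resolve silently.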

Commits on Feb 12, 2024