Fix annotate metadata with numeric strains #952
base: master
CI fails on the "push" check because this PR branches from a workflow that uses the old Nextalign/Nextclade commands, while CI runs with the latest Docker image, which uses the new Nextalign/Nextclade.
CI passes on the "pull" which runs the results we'd see after merging this PR into
Given that the latter passes and is the check that matters, we should be safe to merge.
Adds a functional test to cover a use case described in #948.
Sets the dtype of the strain column in the sequence index to "string" prior to annotating metadata with that index. This change prevents pandas from inferring the dtype as numeric when strain names are all numeric. Fixes #948
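As a rough illustration of the dtype issue described above, the following sketch compares pandas' default inference with an explicit "string" dtype. The TSV contents and the `length` column are made up for the example and are not taken from the PR; only the "strain" column name and the "string" dtype come from the commit message.

```python
import io

import pandas as pd

# Hypothetical sequence index where every strain name is numeric.
index_tsv = "strain\tlength\n1234\t29903\n5678\t29850\n"

# Default behavior: pandas infers an integer dtype for the strain column.
inferred = pd.read_csv(io.StringIO(index_tsv), sep="\t")

# Explicit dtype: strain names are kept as strings.
as_string = pd.read_csv(
    io.StringIO(index_tsv),
    sep="\t",
    dtype={"strain": "string"},
)

print(inferred["strain"].dtype)
print(as_string["strain"].dtype)
```

With the inferred integer dtype, strain names in the index would not compare equal to string strain names in the metadata, which is the kind of mismatch this commit avoids.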
Replaces "merge" between metadata and sequence index with a "join" that uses the index of the input data frames to merge their content. This change allows us to support metadata where strains are indexed by a "name" column instead of a "strain" column. The sequence index will always have a "strain" column (by design), so we load the sequence index and tell pandas the name of its column to index by. This additional change allows the "join" command to work as expected without ever needing to specify the name of the strain column in the metadata. This commit adds a functional test for this expected behavior.
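The index-based join described above can be sketched as follows. This is a minimal example under stated assumptions, not the PR's actual code: the TSV contents, the `date` and `length` columns, and the way the metadata is loaded are all hypothetical. Only the "name" and "strain" column names and the use of `join` come from the commit message.

```python
import io

import pandas as pd

# Hypothetical metadata keyed by a "name" column rather than "strain".
metadata_tsv = "name\tdate\n1234\t2020-03-01\n5678\t2020-03-05\n"
# Hypothetical sequence index, which always has a "strain" column.
index_tsv = "strain\tlength\n1234\t29903\n5678\t29850\n"

# Load both tables with string identifiers, then index each by its id column.
metadata = pd.read_csv(
    io.StringIO(metadata_tsv), sep="\t", dtype={"name": "string"}
).set_index("name")
sequence_index = pd.read_csv(
    io.StringIO(index_tsv), sep="\t", dtype={"strain": "string"}
).set_index("strain")

# join() aligns rows on the two indexes, so the name of the metadata's
# id column never has to be passed to the join itself.
annotated = metadata.join(sequence_index, how="left")

print(annotated)
```

Because the alignment happens on the index values, the same join works whether the metadata identifies strains by "name" or by "strain".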
17c75ff to bdc1775
Description of proposed changes
Fix annotate metadata with numeric strains by setting the dtype of the "strain" column in the sequence index to "string".
Related issue(s)
Fixes #948
Testing