Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mask all ambiguous #682

Merged
merged 9 commits into from
Feb 19, 2021
Merged

Mask all ambiguous #682

merged 9 commits into from
Feb 19, 2021

Conversation

rneher
Copy link
Member

@rneher rneher commented Feb 16, 2021

Description of proposed changes

some workflows mark parts of the alignment, but standard ML ancestral inference will still pick a "most likely" among the equally likely states at these positions. This PR masks sites without information before exporting.

@codecov
Copy link

codecov bot commented Feb 16, 2021

Codecov Report

Merging #682 (eb41026) into master (f256227) will decrease coverage by 0.06%.
The diff coverage is 10.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #682      +/-   ##
==========================================
- Coverage   30.34%   30.28%   -0.07%     
==========================================
  Files          40       40              
  Lines        5549     5567      +18     
  Branches     1349     1355       +6     
==========================================
+ Hits         1684     1686       +2     
- Misses       3805     3821      +16     
  Partials       60       60              
Impacted Files Coverage Δ
augur/ancestral.py 10.34% <10.00%> (-0.55%) ⬇️
augur/index.py 80.00% <0.00%> (-2.46%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f256227...eb41026. Read the comment docs.

@rneher
Copy link
Member Author

rneher commented Feb 16, 2021

This PR replaces sites in the root sequence that have N in every terminal sequence with N. Previously, these were an undetermined nucleotide depending on how rounding errors determined which site is "most likely".

The root sequence json now looks like this

"nuc": "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGCTGCATGCTTAGT....

The nt_muts.json file also contains a "mask":"111111100000000001000..." field that is currently unused.

how exactly we handle the mask is up for debate. But I do think we should mask completely undetermined sites.

@rneher
Copy link
Member Author

rneher commented Feb 16, 2021

Auspice then displays nothing...

image

@rneher rneher requested a review from huddlej February 16, 2021 22:06
@rneher
Copy link
Member Author

rneher commented Feb 16, 2021

CI fails bc I added a field to nt_json that is not expected by the tests. Can fix this if/after we decide whether we want to export that mask at all.

@huddlej
Copy link
Contributor

huddlej commented Feb 17, 2021

@rneher I didn't get a chance to fully test this today, but the logic of what you're describing here makes a lot of sense. I'll try to finish my review tomorrow morning and can also push a fix for the test (maybe just to make it less brittle instead of more specific).

Adds the new "mask" attribute to the ancestral sequences JSON for the
Zika build.
Adds a comment to describe what's happening and why in the new masking
feature. Also, makes minor stylistic changes for readability.
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked for me as expected with ncov and flu builds. I added a comment to the main masking logic for future us to understand what's happening there and why and updated the functional test to expect the new mask key in the ancestral output JSON. When the CI tests have finished, I'll merge this.

@huddlej huddlej merged commit 698df3d into master Feb 19, 2021
@huddlej huddlej deleted the mask-all-ambiguous branch February 19, 2021 00:28
@huddlej huddlej added this to the Next release 11.X.X milestone Feb 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants