Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CIGAR size calculation for insertions #111

Merged
merged 1 commit into from
Jun 14, 2019
Merged

Conversation

unode
Copy link
Member

@unode unode commented Jun 7, 2019

This fixes #109 but has potential implications in several code paths (calculation of % identity, which sequence is kept, ...) so requesting a review and further scrutiny before merge.

Tests will likely fail due to #110 .

@unode
Copy link
Member Author

unode commented Jun 10, 2019

A small update:

On a local analysis 90 samples were failing due to CIGAR errors, the patch above fixed the problem for all of them without noticeable effects to other samples.

@luispedro
Copy link
Member

This looks all good to me.

I am not sure about the backwards compatibility implications, though, and whether it needs to be have a check for versions.

@unode
Copy link
Member Author

unode commented Jun 11, 2019

I am not sure about the backwards compatibility implications, though, and whether it needs to be have a check for versions.

I can't really assess backwards compatibility either. I'm still surprised we never ran into this before. Wondering if there's a samtools strictness change in the latest version that is now causing these to fail.

I agree that versioning this change makes sense. Perhaps with a warning much like the one in select() for 0.8. I'll look into it.

@unode
Copy link
Member Author

unode commented Jun 12, 2019

Given this function isn't part of NGLessIO and is used in quite a few locations, most of them pure, what's your recommendation for implementing the warning and backwards compatibility behavior?

@luispedro luispedro merged commit 0e5e9e8 into master Jun 14, 2019
@luispedro
Copy link
Member

I looked at this again: the reinjection can be a hard upgrade because the previous version was wrong. The tricky part is when this is used for computing match identities and sizes. I just threaded a boolean through the code. It's an inelegant solution, but it gets the job done.

@unode
Copy link
Member Author

unode commented Jun 14, 2019

Cool, thanks for the fix!

@unode unode deleted the cigar_insert_size branch June 14, 2019 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reinjected sequences on matches containing inserts are invalid according to samtools
2 participants