Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use supplementary alignments to identify large deletions #6092

Merged
merged 1 commit into from
Aug 23, 2019

Conversation

tedsharpe
Copy link
Contributor

No description provided.

Copy link
Contributor

@SHuang-Broad SHuang-Broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ted, I've finished reading the code but I have some questions about the decisions made in findLargeDeletions().
Can you please explain?

@SHuang-Broad
Copy link
Contributor

Hi Ted,
After we talked offline I went back and read the code again.
I still have some questions regarding the method
static Cigar findLargeDeletions( final GATKRead read )

  1. It seems this will do redundant work in the following sense: for the same query sequence which has several alignments, each GATKRead (which is actually an alignment record) will be touched by this method, hence imagine one query sequence with two corresponding GATKReads, the current implementation will emit two new GATKReads. Seems unnecessary and double counting evidence.
  2. I don't see any check on same-chromosome-ness, i.e. fields[0], which is formatted SA:Z:<CHR_NAME>.
  3. I see checking of overlap length on the query sequence between two alignment records, but I don't see a check of gap size on the query sequence between two alignment records. As we talked about this offline, it is a hard problem when the two alignment records are gapped away both on the reference and the query sequence, if you want to merge the two records into a single record.
  4. I am still not quite understanding line 1652 about why you are using the clip length to decide orders.
  5. Is there a reason to limit the del length to 2, which is the same as the threshold on overlapLength?

For the related method recplaceCigar, the method is not checking for negative values on overlapLength, which judged from int Interval.overlapLength(Interval) is possible.

@tedsharpe
Copy link
Contributor Author

  1. We're only processing primary lines. If there are supplementary alignments we grab those from the SA tag on the primary line.
  2. These experiments all have a single amplicon, to which the reads are aligned. All contig names are equal. That there is just one contig is tested on line 681.
  3. That is tested on line 1653 and 1661. If the reference coverage isn't properly gapped the deletionLength could be driven negative, and we won't attempt the merge.
  4. Because that's where the unaligned part of the read is, the could be "filled in" by a supplemental alignment.
  5. It could've been a little larger. We're trying to make sure that it isn't something that BWA would've been happy to merge into a single alignment.

Negative overlap lengths are possible, and accounted for. (A negative overlap is a gap.) Line 1644 takes Math.abs(overlapLen), so a gap larger than 2, or an overlap larger than 2 are both rejected.

I'll fix up the other helpful suggestions tomorrow, after my presentation. Thank you for those.

Copy link
Contributor Author

@tedsharpe tedsharpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments have been addressed.

final List<CigarElement> elements = cigar.getCigarElements();
final int nElements = elements.size();
if ( nElements < 2 ) return cigar;
final int initialClipLength = getClipLength(elements.get(0));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. it bugs me a little because the methods are inefficient, but so be it.

Copy link
Contributor

@SHuang-Broad SHuang-Broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

 Date:      Tue Aug 13 16:20:03 2019 -0400
@tedsharpe tedsharpe merged commit 070f8e2 into master Aug 23, 2019
@tedsharpe tedsharpe deleted the tws_AnalyzeSatMut_largeDels branch August 23, 2019 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants