Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(SV) more alignment edge cases #5044

Merged
merged 1 commit into from
Jul 23, 2018
Merged

Conversation

SHuang-Broad
Copy link
Contributor

fixes #4951

And another alignment edge case illustrated below (SAM records)

asm017280:tig00000	0	chr18	77371711	60	1447M50D184M980S	*	0	0	AATCAGATTTTCAGAACAGAGCTTCTTAGGGAGATGGTTGGACCACCATAGTCCTGCATCTTCAGGGAGCTGCTCACCATCCCCACTACTGCAGGCACCACTGATCCCTGAGTGAGACCCAGGCACTAACAAGACATGAAGAAGGTGCTGGGTTCCTGGCCGGTCCTTTTGACCCATAAGTGTGGTTCAGTTCTTGGCACTGATGGGCTGGTCGGATGGTGGTGAGGTCAAGCACTCCTTCAGTGCCTGAGGCGTGGCGAGTGGACTTTGGGACCTCTTCCTGAGATGATGATTGCTTCTCAAGGGTGGGAATGAGTTCTTCCAGGGGTCAGCGGTGTGTCATAGGCAAGCACAGCACATTACTAGAAAGCATACGTTTTATTAAGTAGAAGGTATCTTTAATTCCCTGAATAGTAAGACATCTTTGGATTGATGGAAAGGTTGCCCTGGGAAAGGTGGAATATTTGTTGTCATGACCTAAAGCAGCATGCCTAACACCATGTTTGCCAGAGCTCATGCCCACACTGCCATCTCAGGGTTTTCTAAGGGAGAGGAAGCAAGAGGAGGCCCTCGAATTTCTAACCGATTGCTGCAGTCAAAAAATTGCTAAGAAGGACTCGGTGGAAAAGGGAAAACTCAAAAGTGAAGTATGCTGTGCAGAAGCCAGAGCTCCACCGGCTCCCAGGATTACAGTAACAGTCAGGCTCCTGCTGTCTGGGCCCAATAACGCCATGCGCAGCATCCACATCAGCTGTCCTGAGAACCAGCAAGGCGTCCAAGGCTTAGTGTGTGTTAGAGGCCAATACTGTCCCTGCTGTTCATTCTCAGGCTAAACCCACGATCTTAGTAAGCACAGATGCAGAGACACAGTTTGTGGTATTGCCGTCATTTTCTTGGGAATGTATCAACAGAGTATACAAGGGTAACAGTTCTTTCTATAAAAAAAATCAACATTTATTTTTATTCATTAATTTACCATGAAAATGATTTTTTCCCAGTAGAGAAGTCAGTCTGGGAAAACCAGATAAAAACTAGAAATGATCTTGTTCGCTGTTGTGGAGGCGGTGCAGGAGGCCCGGGAGCTGGATCACTTGACTTTGTTTTCTCTCCTCCTCTGCTTGCAGAACATCTCTGCTACAGTGGAGGACCCACACTCATAGTTCAGCTCCCCACCTCACACCCCAGCCAAGCCCAGGACCCACACTCATGGCCCCGCTCCCCACCTCACACCCCAGCCAAGCCCAGGACCCACGCTCATGGCCCCGCTCTCCACCTCATACCCCAACCAAGTCTAGGACCCACGCTCATGGCCCTGCTCCCCACCTCACACCCCGACCAAGTCCAGGACCCACGCTCAGGACCCCTCTCCCCACCTCACACCTCAGCCACGCCCAGGACCCATGCTCACAGTTCCGCTCCCCACCTCACACCCCAACCAAGTCCAGGACCCATGCTCATGGCCCTGCTCCCCACCTCACACCCCGACCAAGTCCAGGACCCATGCTCATGGCCCCACTCTCCACCTCATACCCCAGCCAAGCCCAGGACCCACACTCACGGTTCCGCTCCCCACCTCACACCCCGACCAAGCCCAGGACCCACACTCAGGACCCCTCTCCCCACCTCACACCGCAGTTGCAGCATCTGAGGGGAGGGGCAGGACAAGGTGGCAGAGGTGGGGCATTTTTACCCAGAGAAGACCAAAAACAGTCAAAGGGATGGTTTAAATACCTGTCTTTTCTATTTTTTCACTGACTCCATGGTAACAGATTTTGCAGTAAGAACCAACATACACACATCCCTAAAACATATTTCATCAAATAATGCTGACATTATGTGTAGCACATGCTGATATTTTCTATTCTATTCTCATTTTCTCTCTCTTTTTTTCAAACAAATGATAATGAAAATCCACTAAGCTAAGGGTCATGACCAACTAACGGGTAACACCCTTCAGTTTGGAAATACTGATTTAAATTTTGGCTTGCTCTAAGCTATAATTCACATTTGACATTTAGATCATATTTTTCCCCTACTGACTCTCAAGTGGGAGCTGAGAGGCAGTACCTAATAAATACAGAGAAAAAGAAAGAAGGAAGGAAAATTGATATTTACAACACAACCAAGTTTTAATGTGAGTTAAAAAAACGGGTTTCTTTTTGATGGAAATCTTCTAAAATTGTGCTGATGGTTACACACTTCTGTGAATGCACCAAAAACCATTGAATTGTACATTTTAAATGAGGTCATTGTTGCTCTGTGAGTTATATGTCAACAAAGCCGTGAAAATCAGCAAACGAGAACAAACAGTTCTGGCCTAGGCCCTGCGAGGCCTCTTTCTAAGGCCGCCGCACGCCCATCTGGGCCATTAAACTCCTCAAGGCGCTCTGACTTCCTTGTGTAAGAGATATTGCTGTTTCTTATAAGGCACTGATGTACTTTTGCTTCCTTATTGGGGTAAAATTTAAAGACCATAACACTGATGGTTTTAAGAGTGCAGTTAAAAAAAGTTTTGACAAATGTATAAACACATGTAACAACCGCCCCAATCAATACACAGCACATTTCCACGGCCCCAGCAGGTTTCCTTGGTCCCCTTCCCAGCAAATCC	*	SA:Z:chr18,77373176,+,1165S338M50I1058M,60,93;	MD:Z:188G345G84A55T34A16T324A1G0A9C186A15C8C6G5C1C9A24T5A8T14T1G4G16C6A12C0A4T0G1C0C1T19G13^CCCACGCTCAGGACCCCTCTCCCCACCTCACACCTCAGCCACGCCCAGGG10C2T0T1C19A147	RG:Z:GATKSVContigAlignments	NM:i:88	AS:i:1375	XS:i:262
asm017280:tig00000	2048	chr18	77373176	60	1165H338M50I1058M	*	0	0	CTCCCCACCTCACACCCCAGCCAAGCCCAGGACCCACACTCATGGCCCCGCTCCCCACCTCACACCCCAGCCAAGCCCAGGACCCACGCTCATGGCCCCGCTCTCCACCTCATACCCCAACCAAGTCTAGGACCCACGCTCATGGCCCTGCTCCCCACCTCACACCCCGACCAAGTCCAGGACCCACGCTCAGGACCCCTCTCCCCACCTCACACCTCAGCCACGCCCAGGACCCATGCTCACAGTTCCGCTCCCCACCTCACACCCCAACCAAGTCCAGGACCCATGCTCATGGCCCTGCTCCCCACCTCACACCCCGACCAAGTCCAGGACCCATGCTCATGGCCCCACTCTCCACCTCATACCCCAGCCAAGCCCAGGACCCACACTCACGGTTCCGCTCCCCACCTCACACCCCGACCAAGCCCAGGACCCACACTCAGGACCCCTCTCCCCACCTCACACCGCAGTTGCAGCATCTGAGGGGAGGGGCAGGACAAGGTGGCAGAGGTGGGGCATTTTTACCCAGAGAAGACCAAAAACAGTCAAAGGGATGGTTTAAATACCTGTCTTTTCTATTTTTTCACTGACTCCATGGTAACAGATTTTGCAGTAAGAACCAACATACACACATCCCTAAAACATATTTCATCAAATAATGCTGACATTATGTGTAGCACATGCTGATATTTTCTATTCTATTCTCATTTTCTCTCTCTTTTTTTCAAACAAATGATAATGAAAATCCACTAAGCTAAGGGTCATGACCAACTAACGGGTAACACCCTTCAGTTTGGAAATACTGATTTAAATTTTGGCTTGCTCTAAGCTATAATTCACATTTGACATTTAGATCATATTTTTCCCCTACTGACTCTCAAGTGGGAGCTGAGAGGCAGTACCTAATAAATACAGAGAAAAAGAAAGAAGGAAGGAAAATTGATATTTACAACACAACCAAGTTTTAATGTGAGTTAAAAAAACGGGTTTCTTTTTGATGGAAATCTTCTAAAATTGTGCTGATGGTTACACACTTCTGTGAATGCACCAAAAACCATTGAATTGTACATTTTAAATGAGGTCATTGTTGCTCTGTGAGTTATATGTCAACAAAGCCGTGAAAATCAGCAAACGAGAACAAACAGTTCTGGCCTAGGCCCTGCGAGGCCTCTTTCTAAGGCCGCCGCACGCCCATCTGGGCCATTAAACTCCTCAAGGCGCTCTGACTTCCTTGTGTAAGAGATATTGCTGTTTCTTATAAGGCACTGATGTACTTTTGCTTCCTTATTGGGGTAAAATTTAAAGACCATAACACTGATGGTTTTAAGAGTGCAGTTAAAAAAAGTTTTGACAAATGTATAAACACATGTAACAACCGCCCCAATCAATACACAGCACATTTCCACGGCCCCAGCAGGTTTCCTTGGTCCCCTTCCCAGCAAATCC	*	SA:Z:chr18,77371711,+,1447M50D184M980S,60,88;	MD:Z:16T6C7G4T0G4C2T0T22A5T10T12A19G5C1C9A4C2T0T1C26C11A28C6A12C0A5G18A5G6C10C5C6A70G4T10T0G4T1G4G3T200G738G52	RG:Z:GATKSVContigAlignments	NM:i:93	AS:i:1115	XS:i:0

@codecov-io
Copy link

codecov-io commented Jul 22, 2018

Codecov Report

Merging #5044 into master will decrease coverage by 6.011%.
The diff coverage is 88.496%.

@@               Coverage Diff               @@
##              master     #5044       +/-   ##
===============================================
- Coverage     86.382%   80.371%   -6.011%     
+ Complexity     28674     27677      -997     
===============================================
  Files           1785      1785               
  Lines         132713    133530      +817     
  Branches       14766     15004      +238     
===============================================
- Hits          114640    107319     -7321     
- Misses         12746     21030     +8284     
+ Partials        5327      5181      -146
Impacted Files Coverage Δ Complexity Δ
...e/ContigChimericAlignmentIterativeInterpreter.java 80.645% <0%> (-1.772%) 31 <0> (ø)
...te/hellbender/tools/spark/sv/utils/SVInterval.java 82.979% <0%> (-1.804%) 32 <3> (+3)
...ender/tools/spark/sv/utils/SVIntervalUnitTest.java 95.652% <100%> (+12.319%) 4 <2> (+2) ⬆️
...lignment/AssemblyContigAlignmentsConfigPicker.java 91.563% <71.429%> (-1.953%) 108 <13> (+13)
.../AssemblyContigAlignmentsConfigPickerUnitTest.java 99.057% <96.923%> (-0.548%) 30 <10> (+10)
...read/markduplicates/MarkDuplicatesSparkTester.java 0% <0%> (-100%) 0% <0%> (-5%)
...tils/read/markduplicates/MarkDuplicatesTester.java 0% <0%> (-100%) 0% <0%> (-2%)
...rs/variantutils/SelectVariantsIntegrationTest.java 0.275% <0%> (-99.725%) 1% <0%> (-65%)
.../AbstractMarkDuplicatesCommandLineProgramTest.java 0.443% <0%> (-99.335%) 2% <0%> (-74%)
...kers/filters/VariantFiltrationIntegrationTest.java 0.826% <0%> (-99.174%) 1% <0%> (-25%)
... and 140 more

Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of refactoring suggestions and one implementation question but otherwise this looks ok. Make sure you run it on the sample that was failing to test it out.

@@ -571,13 +572,30 @@ private static double computeTigExplainQualOfOneConfiguration(final List<Alignme
return bestConfigurations.stream()
.map(mappings -> splitGaps(mappings, false))
.map(mappings -> removeNonUniqueMappings(mappings, ALIGNMENT_MQ_THRESHOLD, ALIGNMENT_LOW_READ_UNIQUENESS_THRESHOLD))
.filter(mappings -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be cleaner like this?

.filter(mappings -> mappings.goodMappings.size != 2 || ! simpleChimeraWithStichableAlignments(mappings.goodMappings.get(0), mappings.goodMappings.get(1))))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

.map(mappings -> createContigGivenClassifiedAlignments(contigName, contigSeq, mappings, true))
.sorted(getConfigurationComparator())
.iterator();
} else {
final GoodAndBadMappings intermediate = splitGaps(bestConfigurations.get(0), false);
final GoodAndBadMappings result = removeNonUniqueMappings(intermediate, ALIGNMENT_MQ_THRESHOLD, ALIGNMENT_LOW_READ_UNIQUENESS_THRESHOLD);
return Collections.singletonList(createContigGivenClassifiedAlignments(contigName, contigSeq, result, false)).iterator();
if (result.goodMappings.size() != 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do the refactoring suggested above you could make a little method for the if clause and only write it once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -853,11 +863,14 @@ private static void removeDueToShortReadSpan(final List<AlignmentInterval> selec
}
}

@VisibleForTesting
static final int TEMPORARY_SELF_POINTING_SV_INTERVAL_CONTIG = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. It doesn't seem like you ever refer to that except when putting things into the max overlap map below. Why create these invalid intervals? Also, I think that our contig numbering scheme for SVInterval starts at zero so these fake intervals will look like real intervals. I think I'd much rather keep them the old way in Tuple2<Integer, Integer>s than jam them into invalid interval objects.

* See the funny alignment signature described in ticket 4951 on GATK github
* @param intervalOne assumed to start no later than {@code intervalTwo} on the read
* @param intervalTwo assumed to start no earlier than {@code intervalOne} on the read
* @return if the two given intervals can be stitched together
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"true if"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@SHuang-Broad
Copy link
Contributor Author

All 4 trios pass without problems, by taking in the assembly bams from Ted's branch.

Copy link
Member

@cwhelan cwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for making those changes.

👍

* ticket 4951
* two neighboring gapped alignments, after gap split, generate unsorted alignments
* also a new utility method in SVInterval
@SHuang-Broad SHuang-Broad force-pushed the sh-sv-more-alignment-edge-cases branch from 1266952 to a4b63ce Compare July 23, 2018 16:42
@SHuang-Broad
Copy link
Contributor Author

Thanks for the quick review!

@SHuang-Broad SHuang-Broad merged commit 9d9484f into master Jul 23, 2018
@SHuang-Broad SHuang-Broad deleted the sh-sv-more-alignment-edge-cases branch July 23, 2018 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Funny (but possible) alignments may fool SV type inference code
3 participants