Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed filtering of normal hets on overlap with copy-ratio intervals… #4510

Merged
merged 1 commit into from
Mar 14, 2018

Conversation

samuelklee
Copy link
Contributor

@samuelklee samuelklee commented Mar 7, 2018

… in ModelSegments to be consistent with filtering of case hets.

Also some miscellaneous code cleanup.

@samuelklee
Copy link
Contributor Author

@sooheelee Are you comfortable reviewing this? The code changes are minimal.

Note that there is no "regression test" since the test files do not give rise to the scenario you encountered in https://github.com/broadinstitute/dsde-docs/issues/2891. Perhaps we can make sure that we include such a test case when we address #4007.

@codecov-io
Copy link

codecov-io commented Mar 7, 2018

Codecov Report

Merging #4510 into master will decrease coverage by 0.001%.
The diff coverage is 100%.

@@              Coverage Diff               @@
##              master    #4510       +/-   ##
==============================================
- Coverage     79.121%   79.12%   -0.001%     
- Complexity     16677    16682        +5     
==============================================
  Files           1051     1051               
  Lines          60285    60293        +8     
  Branches        9875     9876        +1     
==============================================
+ Hits           47698    47704        +6     
- Misses          8759     8761        +2     
  Partials        3828     3828
Impacted Files Coverage Δ Complexity Δ
.../segmentation/MultidimensionalKernelSegmenter.java 87.952% <100%> (ø) 13 <0> (ø) ⬇️
...er/segmentation/AlleleFractionKernelSegmenter.java 88.462% <100%> (ø) 8 <0> (ø) ⬇️
...rmats/collections/AbstractLocatableCollection.java 100% <100%> (ø) 10 <3> (+2) ⬆️
...ynumber/segmentation/CopyRatioKernelSegmenter.java 76.923% <100%> (ø) 8 <0> (ø) ⬇️
...ute/hellbender/tools/copynumber/ModelSegments.java 99.465% <100%> (+0.018%) 44 <3> (+3) ⬆️
...e/hellbender/engine/spark/SparkContextFactory.java 71.233% <0%> (-2.74%) 11% <0%> (ø)

@sooheelee
Copy link
Contributor

I'm only in an introductory Java class, geared towards Android app development. So I cannot comment on the code. What I have done is take the branch and test it against the data that I have and I can say the counts now match up to the lower expected value. Furthermore, the four questionable sites are now absent.

Before:
screenshot 2018-03-08 11 05 38

With changes:
screenshot 2018-03-08 11 06 06

Let me know if and how I can help with your efforts in creating a test if you decide to test for such a scenario.

Copy link
Contributor

@sooheelee sooheelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the changes against a dataset that I know displays the discrepancy and now the discrepant sites are absent from the hets.tsv such that T-N.hets.normal.tsv and N.hets.tsv match in sites exactly.

@samuelklee
Copy link
Contributor Author

OK, thanks! @MartonKN, can you take a quick look at the code?

… in ModelSegments to be consistent with filtering of case hets.
Copy link
Contributor

@MartonKN MartonKN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with my code review, everything looks good.

hetAllelicCounts = new AllelicCountCollection(
metadata,
filteredAllelicCounts.getRecords().stream()
.filter(ac -> hetNormalAllelicCountSites.contains(ac.getInterval()))
.filter(ac -> hetNormalAllelicCounts.getOverlapDetector().overlapsAny(ac))
.collect(Collectors.toList()));
final File hetAllelicCountsFile = new File(outputDir, outputPrefix + HET_ALLELIC_COUNTS_FILE_SUFFIX);
hetAllelicCounts.write(hetAllelicCountsFile);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good :)

public OverlapDetector<RECORD> getOverlapDetector() {
return OverlapDetector.create(getRecords());
return overlapDetector.get();
}

public Comparator<Locatable> getComparator() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -67,7 +67,7 @@ public AlleleFractionSegmentCollection findSegmentation(final int maxNumChangepo
"Log-linear factor for the penalty on the number of changepoints per chromosome must be non-negative.");

logger.info(String.format("Finding changepoints in %d data points and %d chromosomes...",
allelicCounts.getRecords().size(), allelicCountsPerChromosome.size()));
allelicCounts.size(), allelicCountsPerChromosome.size()));

//loop over chromosomes, find changepoints, and create allele-fraction segments
final List<AlleleFractionSegment> segments = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -70,7 +70,7 @@ public CopyRatioSegmentCollection findSegmentation(final int maxNumChangepointsP
"Log-linear factor for the penalty on the number of changepoints per chromosome must be non-negative.");

logger.info(String.format("Finding changepoints in %d data points and %d chromosomes...",
denoisedCopyRatios.getRecords().size(), denoisedCopyRatiosPerChromosome.size()));
denoisedCopyRatios.size(), denoisedCopyRatiosPerChromosome.size()));

//loop over chromosomes, find changepoints, and create copy-ratio segments
final List<CopyRatioSegment> segments = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -143,7 +143,7 @@ public MultidimensionalSegmentCollection findSegmentation(final int maxNumChange
kernelVarianceCopyRatio, kernelVarianceAlleleFraction, kernelScalingAlleleFraction);

logger.info(String.format("Finding changepoints in (%d, %d) data points and %d chromosomes...",
denoisedCopyRatios.getRecords().size(), allelicCounts.size(), multidimensionalPointsPerChromosome.size()));
denoisedCopyRatios.size(), allelicCounts.size(), multidimensionalPointsPerChromosome.size()));

//loop over chromosomes, find changepoints, and create allele-fraction segments
final List<MultidimensionalSegment> segments = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

EXPECTED_METADATA.getSequenceDictionary(), hetNormalAllelicCounts.getMetadata().getSequenceDictionary()));
}
if (isAllelicCountsPresent && isNormalAllelicCountsPresent) {
Assert.assertEquals(hetAllelicCounts.getIntervals(), hetNormalAllelicCounts.getIntervals());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@samuelklee samuelklee merged commit dac6311 into master Mar 14, 2018
@samuelklee samuelklee deleted the sl_filter_normal_hets branch March 14, 2018 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants