Rewrote M2 isActive method as special case of somatic likelihoods model #5814

davidbenjamin · 2019-03-19T16:11:32Z

Closes #5775.

@takutosato This doesn't affect M2 results (well, actually it improves sensitivity by 0.01%) but it reduces runtime by about 5% and makes the docs and code cleaner.

@jamesemery could you verify that in abstracting out Log10Cache as IntToDoubleFunctionCache I didn't spoil its thread safety?

codecov-io · 2019-03-19T17:02:38Z

Codecov Report

Merging #5814 into master will decrease coverage by 6.724%.
The diff coverage is 93.22%.

@@               Coverage Diff               @@
##              master     #5814       +/-   ##
===============================================
- Coverage     87.029%   80.305%   -6.724%     
+ Complexity     32104     30542     -1562     
===============================================
  Files           1972      1978        +6     
  Lines         147187    147475      +288     
  Branches       16201     16230       +29     
===============================================
- Hits          128096    118430     -9666     
- Misses         13184     23330    +10146     
+ Partials        5907      5715      -192

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...kers/haplotypecaller/AssemblyBasedCallerUtils.java | `88.961% <ø> (ø)` | `115 <0> (ø)` | ⬇️ |
| ...dinstitute/hellbender/utils/MathUtilsUnitTest.java | `92.516% <100%> (+0.098%)` | `152 <4> (+4)` | ⬆️ |
| ...er/tools/walkers/mutect/Mutect2EngineUnitTest.java | `100% <100%> (ø)` | `5 <0> (ø)` | ⬇️ |
| ...nstitute/hellbender/utils/Log10FactorialCache.java | `100% <100%> (ø)` | `3 <3> (?)` | |
| ...org/broadinstitute/hellbender/utils/MathUtils.java | `80.503% <100%> (+0.623%)` | `222 <4> (+1)` | ⬆️ |
| ...rg/broadinstitute/hellbender/utils/Log10Cache.java | `100% <100%> (ø)` | `3 <3> (?)` | |
| .../broadinstitute/hellbender/utils/DigammaCache.java | `100% <100%> (ø)` | `3 <3> (?)` | |
| ...hellbender/tools/walkers/mutect/Mutect2Engine.java | `89.545% <100%> (-0.322%)` | `90 <2> (ø)` | |
| ...lkers/genotyper/GenotypeLikelihoodCalculators.java | `89.024% <100%> (+0.274%)` | `25 <0> (+1)` | ⬆️ |
| ...ute/hellbender/utils/IntToDoubleFunctionCache.java | `80% <80%> (ø)` | `4 <4> (?)` | |

... and 188 more

takutosato

@davidbenjamin Sorry for the delay, all looks well to me!

takutosato · 2019-03-25T01:06:27Z

docs/mutect/mutect.tex

+
+Under the first assumption the contribution of ref bases to the sum in Equation \ref{tumor-lod} vanishes -- errorless ref bases contribute no entropy -- so we only need to sum over alt bases.  By the second assumption the likelihood of an alt read is related to its base quality (and associated error rate $\epsilon$) as $\ell_{r,{\rm alt}} = 1 - \epsilon_r$.
+
+We compute Equation \ref{tumor-lod} for two possibilities; first, that an alt allele exists, second that only the ref allele exists.  In the former case, Initializing $\bar{z}_{r,{\rm ref(alt)}} = 1$ for ref (alt) bases a single iteration for $q(\vf)$ gives $\vbeta = (N_{\rm alt} + 1, N_{\rm ref} + 1)$.  For alt reads $r$ we then obtain from Equation \ref{f-tilde}


Don't we obtain this equation from the equation about z_bar (Equation 6) right above \ref{f-tilde}?

jamesemery

Two comments. Upon looking at the code you are refactoring and reflecting I would say this is certainly no more dangerous thread safety-wise than it was before as it appears to be a mostly equivalent implementation. The safety is predicated on the fact that the resize operation updates the array in its last step and that it doesn't change any of the underlying cache values, rather opting to copy them instead. I haven't looked closely at the code changes in this branch however.
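
For readers following the thread-safety argument, here is a minimal sketch of the "copy, fill, then publish last" pattern the comment describes. It is not the GATK class: the names `LazyIntFunctionCache` and `Log10Sketch` are hypothetical, and the use of `volatile` and a `synchronized` resize is an assumption of this sketch, not something confirmed by the branch.

```java
// Hypothetical sketch of a lazily expanding cache; not the GATK implementation.
abstract class LazyIntFunctionCache {
    // Assumption of this sketch: volatile, so a reader sees either the old array
    // or the fully populated new one, never a partially filled array.
    private volatile double[] cache = new double[0];

    // Subclasses supply the function being cached.
    protected abstract double compute(int n);

    public double get(final int i) {
        if (i < 0) {
            throw new IllegalArgumentException("Cache doesn't apply to negative number " + i);
        }
        double[] local = cache;           // read the published array once
        if (i >= local.length) {
            local = expand(i + 1);
        }
        return local[i];
    }

    private synchronized double[] expand(final int minSize) {
        final double[] old = cache;
        if (minSize <= old.length) {
            return old;                   // another thread already grew the cache
        }
        final int newSize = Math.max(minSize, 2 * old.length);
        // Existing values are copied, never modified in place.
        final double[] bigger = java.util.Arrays.copyOf(old, newSize);
        for (int n = old.length; n < newSize; n++) {
            bigger[n] = compute(n);
        }
        cache = bigger;                   // publishing the new array is the last step
        return bigger;
    }
}

// Hypothetical subclass caching log10(n); log10(0) is negative infinity.
final class Log10Sketch extends LazyIntFunctionCache {
    @Override
    protected double compute(final int n) {
        return Math.log10(n);
    }
}
```

Because old entries are only ever copied and the array reference is swapped at the end, a concurrent reader holding the old array still gets correct values for every index it can reach.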

jamesemery · 2019-03-25T16:52:15Z

src/main/java/org/broadinstitute/hellbender/utils/Log10FactorialCache.java

+/**
+ * Wrapper class so that the log10Factorial array is only calculated if it's used
+ */
+public class Log10FactorialCache extends IntToDoubleFunctionCache {


Make this final

jamesemery · 2019-03-25T18:25:31Z

src/main/java/org/broadinstitute/hellbender/utils/Log10FactorialCache.java

+
+    @Override
+    protected double compute(final int n) {
+        return MathUtils.log10Gamma(n + 1);


Without delving into the details of what exactly your branch is computing right now, it looks like this isn't actually doing the factorial computation here but rather calling out to the logGamma function. This may be equivalent? If not, then you should update the comments on this class to reflect what it's doing.

Yes, the gamma function satisfies Gamma(n + 1) = n! for non-negative integers n. The old implementation instead grew the cache incrementally as log((n+1)!) = log(n!) + log(n+1). Since log(n+1) in the old implementation was also cached, this was faster, but the one-time cost of computing logGamma a few thousand times to fill the cache is probably a few milliseconds, if that.
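
As an illustration of the incremental recurrence mentioned in this reply, the sketch below builds a table of log10(n!) by accumulating log10 terms and checks it against the exact factorial for small n. The class and method names are hypothetical. It does not call `MathUtils.log10Gamma`, since the Java standard library has no log-gamma function to stand in for it.

```java
// Hypothetical sketch; not the GATK Log10FactorialCache.
final class Log10FactorialSketch {
    private Log10FactorialSketch() { }

    // table[n] = log10(n!), built with log10(n!) = log10((n-1)!) + log10(n).
    static double[] incrementalTable(final int size) {
        final double[] table = new double[size];   // table[0] = log10(0!) = 0
        for (int n = 1; n < size; n++) {
            table[n] = table[n - 1] + Math.log10(n);
        }
        return table;
    }

    // Exact n! in a long, then one log10; only valid for 0 <= n <= 20,
    // because 21! overflows a signed 64-bit integer.
    static double directLog10Factorial(final int n) {
        if (n < 0 || n > 20) {
            throw new IllegalArgumentException("n must be in [0, 20], got " + n);
        }
        long factorial = 1L;
        for (int k = 2; k <= n; k++) {
            factorial *= k;
        }
        return Math.log10((double) factorial);
    }
}
```

The accumulated sum and a direct evaluation agree closely but need not agree to the last bit, which is worth keeping in mind when comparing outputs across implementations.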

samuelklee · 2019-03-25T19:51:23Z

src/main/java/org/broadinstitute/hellbender/utils/IntToDoubleFunctionCache.java

+    public IntToDoubleFunctionCache() { }
+
+    /**
+     * Get the value of log10(n), expanding the cache as necessary


update this javadoc?

samuelklee · 2019-03-25T19:52:01Z

src/main/java/org/broadinstitute/hellbender/utils/IntToDoubleFunctionCache.java

+     * @return log10(n)
+     */
+    public double get(final int i) {
+        Utils.validateArg(i >= 0, () -> String.format("Cache doesn't apply to negative number %d", i));


maybe document this restriction (or even change the class name)

Rewrote M2 isActive method as special case of somatic likelihoods model

05ab4eb

davidbenjamin added coding DeveloperDoc Mutect labels Mar 19, 2019

davidbenjamin added this to the Mutect 3 milestone Mar 19, 2019

davidbenjamin assigned takutosato Mar 19, 2019

davidbenjamin requested review from takutosato and jamesemery March 19, 2019 16:11

davidbenjamin assigned jamesemery Mar 24, 2019

takutosato approved these changes Mar 25, 2019

jamesemery reviewed Mar 25, 2019

samuelklee reviewed Mar 25, 2019

davidbenjamin added 2 commits March 25, 2019 16:00

review edits

9068cdf

more review

0f6d540

davidbenjamin merged commit 9902cd5 into master Mar 25, 2019

davidbenjamin deleted the db_unify branch March 25, 2019 23:55

davidbenjamin added the Documentation label Feb 12, 2020

This was referenced Jan 27, 2022

Be aware of numerical stability issues introduced by changes to MathUtils methods, e.g. log10factorial. #7649

Closed

Added numerical-stability tests and updated test data for all ModelSegments single-sample and multiple-sample modes. #7652

Merged
