Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrote M2 isActive method as special case of somatic likelihoods model #5814

Merged
merged 3 commits into from
Mar 25, 2019

Conversation

davidbenjamin
Copy link
Contributor

@davidbenjamin davidbenjamin commented Mar 19, 2019

Closes #5775.

@takutosato This doesn't affect M2 results (well, actually it improves sensitivity by 0.01%) but it reduces runtime by about 5% and makes the docs and code cleaner.

@jamesemery could you verify that in abstracting out Log10Cache as IntToDoubleFunctionCache I didn't spoil its thread safety?

@codecov-io
Copy link

codecov-io commented Mar 19, 2019

Codecov Report

Merging #5814 into master will decrease coverage by 6.724%.
The diff coverage is 93.22%.

@@               Coverage Diff               @@
##              master     #5814       +/-   ##
===============================================
- Coverage     87.029%   80.305%   -6.724%     
+ Complexity     32104     30542     -1562     
===============================================
  Files           1972      1978        +6     
  Lines         147187    147475      +288     
  Branches       16201     16230       +29     
===============================================
- Hits          128096    118430     -9666     
- Misses         13184     23330    +10146     
+ Partials        5907      5715      -192
Impacted Files Coverage Δ Complexity Δ
...kers/haplotypecaller/AssemblyBasedCallerUtils.java 88.961% <ø> (ø) 115 <0> (ø) ⬇️
...dinstitute/hellbender/utils/MathUtilsUnitTest.java 92.516% <100%> (+0.098%) 152 <4> (+4) ⬆️
...er/tools/walkers/mutect/Mutect2EngineUnitTest.java 100% <100%> (ø) 5 <0> (ø) ⬇️
...nstitute/hellbender/utils/Log10FactorialCache.java 100% <100%> (ø) 3 <3> (?)
...org/broadinstitute/hellbender/utils/MathUtils.java 80.503% <100%> (+0.623%) 222 <4> (+1) ⬆️
...rg/broadinstitute/hellbender/utils/Log10Cache.java 100% <100%> (ø) 3 <3> (?)
.../broadinstitute/hellbender/utils/DigammaCache.java 100% <100%> (ø) 3 <3> (?)
...hellbender/tools/walkers/mutect/Mutect2Engine.java 89.545% <100%> (-0.322%) 90 <2> (ø)
...lkers/genotyper/GenotypeLikelihoodCalculators.java 89.024% <100%> (+0.274%) 25 <0> (+1) ⬆️
...ute/hellbender/utils/IntToDoubleFunctionCache.java 80% <80%> (ø) 4 <4> (?)
... and 188 more

Copy link
Contributor

@takutosato takutosato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidbenjamin Sorry for the delay, all looks well to me!


Under the first assumption the contribution of ref bases to the sum in Equation \ref{tumor-lod} vanishes -- errorless ref bases contribute no entropy -- so we only need to sum over alt bases. By the second assumption the likelihood of an alt read is related to its base quality (and associated error rate $\epsilon$) as $\ell_{r,{\rm alt}} = 1 - \epsilon_r$.

We compute Equation \ref{tumor-lod} for two possibilities; first, that an alt allele exists, second that only the ref allele exists. In the former case, Initializing $\bar{z}_{r,{\rm ref(alt)}} = 1$ for ref (alt) bases a single iteration for $q(\vf)$ gives $\vbeta = (N_{\rm alt} + 1, N_{\rm ref} + 1)$. For alt reads $r$ we then obtain from Equation \ref{f-tilde}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we obtain this equation from the equation about z_bar (Equation 6) right above \ref{f-tilde}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments. Upon looking at the code you are refactoring and reflecting I would say this is certainly no more dangerous thread safety-wise than it was before as it appears to be a mostly equivalent implementation. The safety is predicated on the fact that the resize operation updates the array in its last step and that it doesn't change any of the underlying cache values, rather opting to copy them instead. I haven't looked closely at the code changes in this branch however.

/**
* Wrapper class so that the log10Factorial array is only calculated if it's used
*/
public class Log10FactorialCache extends IntToDoubleFunctionCache {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@Override
protected double compute(final int n) {
return MathUtils.log10Gamma(n + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without delving into the details of what exactly your branch is computing as of right now, it looks like this isn't actually doing the factorial computation here but rather calling out to the logGamma function. This may be equivalent? If not then you should update the comments on this class to reflect what its doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the gamma function satisfied Gamma(n + 1) = n! for integers n. The old implementation instead grew the cache incrementally as log((n+1)!) = log(n!) + log(n+1). Since log(n+1) in the old implementation was also cached this was faster, but the one-time cost of computing logGamma a few thousand times to fill the cache is probably a few milliseconds, if that.

public IntToDoubleFunctionCache() { }

/**
* Get the value of log10(n), expanding the cache as necessary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update this javadoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @return log10(n)
*/
public double get(final int i) {
Utils.validateArg(i >= 0, () -> String.format("Cache doesn't apply to negative number %d", i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe document this restriction (or even change the class name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify M2 likelihoods
5 participants