
Made Mutect2 isActive much better for low allele fractions #4832

Merged 3 commits on Jun 7, 2018

Conversation

davidbenjamin (Contributor)

@Takuto This is the change to improve high-depth sensitivity for mitochondria.

@meganshand @fleharty You are welcome to review as well if you'd like.

takutosato (Contributor) left a comment:

Nice model!

\end{equation}
where the approximation amounts to ignoring the possibility that ref reads are actually alt, or, equivalently, giving each ref read infinite quality. This is not necessary but it greatly speeds the computation implementation because, as we will see, we will only need to keep alt base qualities in memory.
Contributor:

computation implementation -> computation (maybe?)

Contributor (Author):

done
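A quick numerical sanity check of the approximation discussed in this thread (a stdlib Python sketch with hypothetical names, not code from this PR): giving each ref read infinite quality drops the f·ε term from its likelihood, so the ref contribution depends only on the allele fraction and only alt base qualities need to be kept.

```python
import math

def phred_to_error(q):
    # Phred-scaled base quality -> error probability
    return 10.0 ** (-q / 10.0)

def ref_read_log_lik_exact(f, eps):
    # A ref read is either a true ref base (prob 1 - f) read correctly,
    # or a true alt base (prob f) misread as ref with error eps.
    return math.log((1.0 - f) * (1.0 - eps) + f * eps)

def ref_read_log_lik_approx(f):
    # Approximation: ref reads are given infinite quality (eps -> 0),
    # so the contribution depends only on the allele fraction f.
    return math.log(1.0 - f)
```

For a low allele fraction and a high-quality base the two agree to about a part in a thousand, which is the sense in which the ignored terms are negligible.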

\end{equation}
The terms that signify observed ref reads that are actually alt reads with an error and vice versa are negligible\footnote{We can set an upper bound on the error in the log likelihood by Taylor-expanding to first order. The error turns out to be quite small.} Then we get
where we again assign infinite base quality to ref reads and have let $N_{\rm ref} = | \mathcal{R}|$.
Contributor:

have let -> let

Contributor (Author):

done

\begin{align}
{\rm LOD} \approx& \sum_{j \in \mathcal{A}} \left[ \log (1 - \epsilon_j) - \log \epsilon_j \right] + \log \frac{|\mathcal{R}|! |\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!} \\
\approx& -\sum_{j \in \mathcal{A}} \log \epsilon_j + \log \frac{|\mathcal{R}|! |\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!},
q(z_n) &\propto \left[ \epsilon_n \overline{\ln (1 - f)} \right]^{z_n} \left[ (1 - \epsilon_n) \overline{\ln f} \right]^{1 - z_n} \\
Contributor:

I think \overline{\ln (1 - f)} and \overline{\ln f} should be in the exponent, as in e^{\overline{\ln (1 - f)}} and e^{\overline{\ln f}}.

Contributor (Author):

You're right.
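The approximate LOD quoted above depends only on the alt base qualities and the two read counts; a minimal stdlib-only sketch (the function name is mine, not GATK's):

```python
import math

def log_likelihood_ratio(ref_count, alt_quals):
    """Approximate active-region LOD from the derivation above:
    -sum_j log eps_j + log( R! A! / (R + A + 1)! ),
    where eps_j is the Phred error probability of alt read j."""
    alt_count = len(alt_quals)
    # -log eps_j for Phred quality q: eps = 10^(-q/10), so -log eps = q ln(10) / 10
    alt_term = sum(q * math.log(10.0) / 10.0 for q in alt_quals)
    # log of the factorial ratio via log-gamma: log x! = lgamma(x + 1)
    norm = (math.lgamma(ref_count + 1) + math.lgamma(alt_count + 1)
            - math.lgamma(ref_count + alt_count + 2))
    return alt_term + norm
```

Using lgamma keeps the normalizer stable at the high depths this PR targets, where explicit factorials would overflow.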

\end{equation}
Then, Equation 10.3 of Bishop gives us the variational lower bound on $L(f)$:
\begin{align}
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] - {\rm entropy}[q(f)] - \sum_n {\rm entropy}[q(z_n)] \\
Contributor:

Entropy is the expectation of the negative log density, so maybe we should be adding entropies here?

Contributor:

i.e. I agree with what you have in the algorithm summary

Contributor (Author):

Fixed

final double betaEntropy = Beta.logBeta(alpha, beta) - (alpha - 1)*digammaAlpha - (beta-1)*digammaBeta + (alpha + beta - 2)*digammaAlphaPlusBeta;

// TODO: check if the stream is too expensive
final double result = -betaEntropy + rho * refCount + altQuals.stream().mapToDouble(qual -> {
Contributor:

As I said above, I think we want to add entropies here

Contributor (Author):

done
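For reference, the beta-distribution entropy term in the Java line above can be sketched in stdlib Python; the digamma implementation here is a standard recurrence-plus-asymptotic-series approximation, not a GATK or commons-math function:

```python
import math

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x,
    # then the asymptotic series once x is large enough.
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    result += math.log(x) - 0.5 * inv
    inv2 = inv * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result

def beta_entropy(alpha, beta):
    # Differential entropy of Beta(alpha, beta), matching the Java expression:
    # log B(a, b) - (a-1) psi(a) - (b-1) psi(b) + (a+b-2) psi(a+b)
    log_beta = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (log_beta
            - (alpha - 1.0) * digamma(alpha)
            - (beta - 1.0) * digamma(beta)
            + (alpha + beta - 2.0) * digamma(alpha + beta))
```

As a sanity check, Beta(1, 1) is uniform on [0, 1], so its differential entropy is zero and every other (alpha, beta) gives a smaller value.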

@davidbenjamin davidbenjamin force-pushed the db_m2_active branch 2 times, most recently from 0c82734 to 44d7165 on June 5, 2018 16:41
codecov-io commented Jun 5, 2018

Codecov Report

Merging #4832 into master will increase coverage by 0.043%.
The diff coverage is 100%.

@@              Coverage Diff               @@
##             master     #4832       +/-   ##
==============================================
+ Coverage     80.35%   80.394%   +0.043%     
- Complexity    17712     17730       +18     
==============================================
  Files          1088      1088               
  Lines         63975     63989       +14     
  Branches      10313     10313               
==============================================
+ Hits          51404     51443       +39     
+ Misses         8558      8543       -15     
+ Partials       4013      4003       -10
| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...hellbender/tools/walkers/mutect/Mutect2Engine.java | 93.103% <100%> (+1.992%) | 53 <9> (+3) ⬆️ |
| ...rg/broadinstitute/hellbender/utils/io/IOUtils.java | 68.817% <0%> (-1.075%) | 49% <0%> (-1%) |
| ...adinstitute/hellbender/utils/R/RScriptLibrary.java | 100% <0%> (ø) | 6% <0%> (ø) ⬇️ |
| .../hellbender/utils/python/PythonScriptExecutor.java | 63.636% <0%> (ø) | 10% <0%> (ø) ⬇️ |
| ...lotypecaller/readthreading/ReadThreadingGraph.java | 88.608% <0%> (+0.253%) | 144% <0%> (+1%) ⬆️ |
| ...dinstitute/hellbender/utils/R/RScriptExecutor.java | 80.556% <0%> (+0.274%) | 17% <0%> (ø) ⬇️ |
| ...ellbender/tools/walkers/vqsr/CNNScoreVariants.java | 74.336% <0%> (+0.345%) | 41% <0%> (ø) ⬇️ |
| ...ols/walkers/haplotypecaller/AssemblyResultSet.java | 74.85% <0%> (+0.599%) | 45% <0%> (+2%) ⬆️ |
| ...nstitute/hellbender/utils/read/AlignmentUtils.java | 75.565% <0%> (+0.616%) | 149% <0%> (+2%) ⬆️ |
| ...tute/hellbender/engine/AssemblyRegionIterator.java | 87.342% <0%> (+1.266%) | 24% <0%> (+1%) ⬆️ |

... and 6 more

davidbenjamin (Contributor, Author):

Back to you @takutosato. You caught a couple of whoppers. Fortunately, the validations still look good after fixing them.

takutosato (Contributor) left a comment:

One minor comment but everything else looks great!

L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] - {\rm entropy}[q(f)] - \sum_n {\rm entropy}[q(z_n)] \\
&= -H(\alpha, \beta) + \rho N_{\rm ref} + \sum_n \left[ \gamma_n (\rho + \ln \epsilon_n) + (1 - \gamma_n)(\tau + \ln(1 - \epsilon_n) - H(\gamma_n) \right],
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] + {\rm entropy}[q(f)] + \sum_n {\rm entropy}[q(z_n)] \\
&= -H(\alpha, \beta) + N_{\rm ref} \ln \rho + \sum_n \left[ \gamma_n \ln \left( \rho \epsilon_n \right) + (1 - \gamma_n) \ln \left( \tau (1 - \epsilon_n) \right) - H(\gamma_n) \right],
Contributor:

entropies should have positive signs.

Contributor (Author):

fixed
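Putting the corrected q(z_n) to work, the per-read responsibilities can be sketched as below. This is my reading of the formula, not code from the PR: rho and tau stand for E[ln(1 - f)] and E[ln f] under the Beta variational posterior on f, and gamma_n = q(z_n = 1) is the probability that alt read n is a sequencing error.

```python
import math

def responsibility(eps, rho, tau):
    """gamma_n = q(z_n = 1), following the corrected update
    q(z_n) proportional to [eps * e^rho]^z_n * [(1 - eps) * e^tau]^(1 - z_n),
    where eps is the read's Phred error probability."""
    error_weight = eps * math.exp(rho)   # z_n = 1: read is an error
    real_weight = (1.0 - eps) * math.exp(tau)  # z_n = 0: read is a real alt
    return error_weight / (error_weight + real_weight)
```

When rho = tau the exponentials cancel and gamma_n reduces to eps; as tau becomes very negative (the posterior concentrates on tiny allele fractions), even high-quality alt reads are increasingly explained as errors, which is exactly the low-allele-fraction regime this PR is about.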

3 participants