
Made Mutect2 isActive much better for low allele fractions #4832

Merged 3 commits on Jun 7, 2018

Conversation

davidbenjamin (Contributor)

@Takuto This is the change to improve high-depth sensitivity for mitochondria.

@meganshand @fleharty You are welcome to review as well if you'd like.

takutosato (Contributor) left a comment:

Nice model!

\end{equation}
where the approximation amounts to ignoring the possibility that ref reads are actually alt, or, equivalently, giving each ref read infinite quality. This is not necessary but it greatly speeds the computation implementation because, as we will see, we will only need to keep alt base qualities in memory.
Contributor:

computation implementation -> computation (maybe?)

Contributor (Author):

done
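A quick numerical sanity check of the approximation discussed in this thread (a stdlib Python sketch with hypothetical names, not code from this PR): giving each ref read infinite quality drops the f·ε term from its likelihood, so the ref contribution depends only on the allele fraction and only alt base qualities need to be kept.

```python
import math

def phred_to_error(q):
    # Phred-scaled base quality -> error probability
    return 10.0 ** (-q / 10.0)

def ref_read_log_lik_exact(f, eps):
    # A ref read is either a true ref base (prob 1 - f) read correctly,
    # or a true alt base (prob f) misread as ref with error eps.
    return math.log((1.0 - f) * (1.0 - eps) + f * eps)

def ref_read_log_lik_approx(f):
    # Approximation: ref reads are given infinite quality (eps -> 0),
    # so the contribution depends only on the allele fraction f.
    return math.log(1.0 - f)
```

For a low allele fraction and a high-quality base the two agree to about a part in a thousand, which is the sense in which the ignored terms are negligible.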

\end{equation}
The terms that signify observed ref reads that are actually alt reads with an error and vice versa are negligible\footnote{We can set an upper bound on the error in the log likelihood by Taylor-expanding to first order. The error turns out to be quite small.} Then we get
where we again assign infinite base quality to ref reads and have let $N_{\rm ref} = | \mathcal{R}|$.
Contributor:

have let -> let

Contributor (Author):

done

\begin{align}
{\rm LOD} \approx& \sum_{j \in \mathcal{A}} \left[ \log (1 - \epsilon_j) - \log \epsilon_j \right] + \log \frac{|\mathcal{R}|! |\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!} \\
\approx& -\sum_{j \in \mathcal{A}} \log \epsilon_j + \log \frac{|\mathcal{R}|! |\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!},
q(z_n) &\propto \left[ \epsilon_n \overline{\ln (1 - f)} \right]^{z_n} \left[ (1 - \epsilon_n) \overline{\ln f} \right]^{1 - z_n} \\
Contributor:

I think \overline{\ln (1 - f)} and \overline{\ln f} should be in the exponent, as in e^{\overline{\ln (1 - f)}} and e^{\overline{\ln f}}.

Contributor (Author):

You're right.
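The approximate LOD quoted above depends only on the alt base qualities and the two read counts; a minimal stdlib-only sketch (the function name is mine, not GATK's):

```python
import math

def log_likelihood_ratio(ref_count, alt_quals):
    """Approximate active-region LOD from the derivation above:
    -sum_j log eps_j + log( R! A! / (R + A + 1)! ),
    where eps_j is the Phred error probability of alt read j."""
    alt_count = len(alt_quals)
    # -log eps_j for Phred quality q: eps = 10^(-q/10), so -log eps = q ln(10) / 10
    alt_term = sum(q * math.log(10.0) / 10.0 for q in alt_quals)
    # log of the factorial ratio via log-gamma: log x! = lgamma(x + 1)
    norm = (math.lgamma(ref_count + 1) + math.lgamma(alt_count + 1)
            - math.lgamma(ref_count + alt_count + 2))
    return alt_term + norm
```

Using lgamma keeps the normalizer stable at the high depths this PR targets, where explicit factorials would overflow.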

\end{equation}
Then, Equation 10.3 of Bishop gives us the variational lower bound on $L(f)$:
\begin{align}
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] - {\rm entropy}[q(f)] - \sum_n {\rm entropy}[q(z_n)] \\
Contributor:

Entropy is the expectation of the negative log density, so maybe we should be adding entropies here?

Contributor:

i.e. I agree with what you have in the algorithm summary

Contributor (Author):

Fixed

final double betaEntropy = Beta.logBeta(alpha, beta) - (alpha - 1)*digammaAlpha - (beta-1)*digammaBeta + (alpha + beta - 2)*digammaAlphaPlusBeta;

// TODO: check if the stream is too expensive
final double result = -betaEntropy + rho * refCount + altQuals.stream().mapToDouble(qual -> {
Contributor:

As I said above, I think we want to add entropies here

Contributor (Author):

done
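For reference, the beta-distribution entropy term in the Java line above can be sketched in stdlib Python; the digamma implementation here is a standard recurrence-plus-asymptotic-series approximation, not a GATK or commons-math function:

```python
import math

def digamma(x):
    # psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x,
    # then the asymptotic series once x is large enough.
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    result += math.log(x) - 0.5 * inv
    inv2 = inv * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result

def beta_entropy(alpha, beta):
    # Differential entropy of Beta(alpha, beta), matching the Java expression:
    # log B(a, b) - (a-1) psi(a) - (b-1) psi(b) + (a+b-2) psi(a+b)
    log_beta = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (log_beta
            - (alpha - 1.0) * digamma(alpha)
            - (beta - 1.0) * digamma(beta)
            + (alpha + beta - 2.0) * digamma(alpha + beta))
```

As a sanity check, Beta(1, 1) is uniform on [0, 1], so its differential entropy is zero and every other (alpha, beta) gives a smaller value.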

@davidbenjamin davidbenjamin force-pushed the db_m2_active branch 2 times, most recently from 0c82734 to 44d7165 on June 5, 2018 16:41
codecov-io commented Jun 5, 2018

Codecov Report

Merging #4832 into master will increase coverage by 0.043%.
The diff coverage is 100%.

@@              Coverage Diff               @@
##             master     #4832       +/-   ##
==============================================
+ Coverage     80.35%   80.394%   +0.043%     
- Complexity    17712     17730       +18     
==============================================
  Files          1088      1088               
  Lines         63975     63989       +14     
  Branches      10313     10313               
==============================================
+ Hits          51404     51443       +39     
+ Misses         8558      8543       -15     
+ Partials       4013      4003       -10
| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...hellbender/tools/walkers/mutect/Mutect2Engine.java | 93.103% <100%> (+1.992%) | 53 <9> (+3) ⬆️ |
| ...rg/broadinstitute/hellbender/utils/io/IOUtils.java | 68.817% <0%> (-1.075%) | 49% <0%> (-1%) |
| ...adinstitute/hellbender/utils/R/RScriptLibrary.java | 100% <0%> (ø) | 6% <0%> (ø) ⬇️ |
| .../hellbender/utils/python/PythonScriptExecutor.java | 63.636% <0%> (ø) | 10% <0%> (ø) ⬇️ |
| ...lotypecaller/readthreading/ReadThreadingGraph.java | 88.608% <0%> (+0.253%) | 144% <0%> (+1%) ⬆️ |
| ...dinstitute/hellbender/utils/R/RScriptExecutor.java | 80.556% <0%> (+0.274%) | 17% <0%> (ø) ⬇️ |
| ...ellbender/tools/walkers/vqsr/CNNScoreVariants.java | 74.336% <0%> (+0.345%) | 41% <0%> (ø) ⬇️ |
| ...ols/walkers/haplotypecaller/AssemblyResultSet.java | 74.85% <0%> (+0.599%) | 45% <0%> (+2%) ⬆️ |
| ...nstitute/hellbender/utils/read/AlignmentUtils.java | 75.565% <0%> (+0.616%) | 149% <0%> (+2%) ⬆️ |
| ...tute/hellbender/engine/AssemblyRegionIterator.java | 87.342% <0%> (+1.266%) | 24% <0%> (+1%) ⬆️ |

... and 6 more

davidbenjamin (Contributor, Author):

Back to you @takutosato. You caught a couple of whoppers. Fortunately, the validations still look good after fixing them.

takutosato (Contributor) left a comment:

One minor comment but everything else looks great!

L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] - {\rm entropy}[q(f)] - \sum_n {\rm entropy}[q(z_n)] \\
&= -H(\alpha, \beta) + \rho N_{\rm ref} + \sum_n \left[ \gamma_n (\rho + \ln \epsilon_n) + (1 - \gamma_n)(\tau + \ln(1 - \epsilon_n) - H(\gamma_n) \right],
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] + {\rm entropy}[q(f)] + \sum_n {\rm entropy}[q(z_n)] \\
&= -H(\alpha, \beta) + N_{\rm ref} \ln \rho + \sum_n \left[ \gamma_n \ln \left( \rho \epsilon_n \right) + (1 - \gamma_n) \ln \left( \tau (1 - \epsilon_n) \right) - H(\gamma_n) \right],
Contributor:

entropies should have positive signs.

Contributor (Author):

fixed
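Putting the corrected q(z_n) to work, the per-read responsibilities can be sketched as below. This is my reading of the formula, not code from the PR: rho and tau stand for E[ln(1 - f)] and E[ln f] under the Beta variational posterior on f, and gamma_n = q(z_n = 1) is the probability that alt read n is a sequencing error.

```python
import math

def responsibility(eps, rho, tau):
    """gamma_n = q(z_n = 1), following the corrected update
    q(z_n) proportional to [eps * e^rho]^z_n * [(1 - eps) * e^tau]^(1 - z_n),
    where eps is the read's Phred error probability."""
    error_weight = eps * math.exp(rho)   # z_n = 1: read is an error
    real_weight = (1.0 - eps) * math.exp(tau)  # z_n = 0: read is a real alt
    return error_weight / (error_weight + real_weight)
```

When rho = tau the exponentials cancel and gamma_n reduces to eps; as tau becomes very negative (the posterior concentrates on tiny allele fractions), even high-quality alt reads are increasingly explained as errors, which is exactly the low-allele-fraction regime this PR is about.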

3 participants