Made Mutect2 isActive much better for low allele fractions #4832
Conversation
Nice model!
docs/mutect/mutect.tex
Outdated
\end{equation}
where the approximation amounts to ignoring the possibility that ref reads are actually alt, or, equivalently, giving each ref read infinite quality. This is not necessary but it greatly speeds the computation implementation because, as we will see, we will only need to keep alt base qualities in memory.
computation implementation -> computation (maybe?)
done
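To see why giving ref reads infinite quality is safe, here is a small numeric sketch (not GATK code; the allele fraction, error rate, and read count are hypothetical, and the per-read error model is a simple one chosen for illustration):

```java
// Sketch (not GATK code, hypothetical numbers): how much log likelihood is
// lost by giving ref reads infinite base quality, i.e. ignoring the chance
// that an observed ref read is really an alt read with a base error.
public class RefQualityApproximation {

    // Absolute error in the total ref-read log likelihood under a simple
    // error model: exact P(ref) = (1-f)(1-eps) + f*eps, approximate P(ref) = 1-f.
    static double logLikelihoodError(double f, double eps, int nRef) {
        double exact = (1 - f) * (1 - eps) + f * eps;
        double approx = 1 - f;
        return nRef * Math.abs(Math.log(exact) - Math.log(approx));
    }

    public static void main(String[] args) {
        // f = 1% allele fraction, Q30 bases (eps = 1e-3), 100 ref reads:
        // the error is on the order of 0.1 log units, negligible next to
        // typical LOD values.
        System.out.println(logLikelihoodError(0.01, 1e-3, 100));
    }
}
```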
docs/mutect/mutect.tex
Outdated
\end{equation}
The terms that signify observed ref reads that are actually alt reads with an error and vice versa are negligible.\footnote{We can set an upper bound on the error in the log likelihood by Taylor-expanding to first order. The error turns out to be quite small.} Then we get
where we again assign infinite base quality to ref reads and have let $N_{\rm ref} = |\mathcal{R}|$.
have let -> let
done
docs/mutect/mutect.tex
Outdated
\begin{align}
{\rm LOD} \approx& \sum_{j \in \mathcal{A}} \left[ \log (1 - \epsilon_j) - \log \epsilon_j \right] + \log \frac{|\mathcal{R}|! |\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!} \\
\approx& -\sum_{j \in \mathcal{A}} \log \epsilon_j + \log \frac{|\mathcal{R}|! |\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!},
q(z_n) &\propto \left[ \epsilon_n \overline{\ln (1 - f)} \right]^{z_n} \left[ (1 - \epsilon_n) \overline{\ln f} \right]^{1 - z_n} \\
I think \overline{\ln (1 - f)} and \overline{\ln f} should be in the exponent, as in e^{\overline{\ln (1 - f)}} and e^{\overline{\ln f}}.
You're right.
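As a quick sanity check on the approximate LOD formula above, here is a sketch (not GATK code; the read counts and error rates are hypothetical) that evaluates $-\sum_j \log \epsilon_j + \log \frac{|\mathcal{R}|!\,|\mathcal{A}|!}{(|\mathcal{R}|+|\mathcal{A}|+1)!}$ directly, computing the log factorials as sums of logs since read counts are small integers:

```java
// Sketch (hypothetical values): the approximate LOD from the discussion,
// LOD ~= -sum_j ln(eps_j) + ln( R! A! / (R + A + 1)! ).
public class LodSketch {

    // ln(n!) as a plain sum of logs; fine for read-count-sized n.
    static double logFactorial(int n) {
        double s = 0;
        for (int k = 2; k <= n; k++) {
            s += Math.log(k);
        }
        return s;
    }

    static double lod(int refCount, double[] altErrorRates) {
        double sum = 0;
        for (double eps : altErrorRates) {
            sum -= Math.log(eps);
        }
        int altCount = altErrorRates.length;
        return sum + logFactorial(refCount) + logFactorial(altCount)
                - logFactorial(refCount + altCount + 1);
    }

    public static void main(String[] args) {
        // 30 ref reads and 3 alt reads at base quality 30 (eps = 1e-3)
        // give a LOD of roughly 8.6.
        System.out.println(lod(30, new double[]{1e-3, 1e-3, 1e-3}));
    }
}
```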
docs/mutect/mutect.tex
Outdated
\end{equation}
Then, Equation 10.3 of Bishop gives us the variational lower bound on $L(f)$:
\begin{align}
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] - {\rm entropy}[q(f)] - \sum_n {\rm entropy}[q(z_n)] \\
Entropy is the expectation of the negative log density, so maybe we should be adding entropies here?
i.e. I agree with what you have in the algorithm summary
Fixed
final double betaEntropy = Beta.logBeta(alpha, beta) - (alpha - 1)*digammaAlpha - (beta-1)*digammaBeta + (alpha + beta - 2)*digammaAlphaPlusBeta;

// TODO: check if the stream is too expensive
final double result = -betaEntropy + rho * refCount + altQuals.stream().mapToDouble(qual -> {
As I said above, I think we want to add entropies here
done
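The betaEntropy term above is the differential entropy of the Beta variational posterior $q(f)$, $H[{\rm Beta}(\alpha,\beta)] = \ln B(\alpha,\beta) - (\alpha-1)\psi(\alpha) - (\beta-1)\psi(\beta) + (\alpha+\beta-2)\psi(\alpha+\beta)$. Here is a standalone sketch of that formula (not the GATK implementation, which uses Apache Commons Math); it assumes integer shape parameters so $\ln B(a,b)$ reduces to factorials, and uses a textbook recurrence-plus-asymptotic-series digamma rather than a library call:

```java
// Sketch (not the GATK implementation): entropy of a Beta(a, b)
// variational posterior q(f), matching the betaEntropy term in the
// snippet above. Integer shape parameters assumed so that
// log B(a, b) = ln (a-1)! + ln (b-1)! - ln (a+b-1)!.
public class BetaEntropySketch {

    static double logFactorial(int n) {
        double s = 0;
        for (int k = 2; k <= n; k++) {
            s += Math.log(k);
        }
        return s;
    }

    // digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an
    // asymptotic series for large x.
    static double digamma(double x) {
        double r = 0;
        while (x < 6) {
            r -= 1 / x;
            x += 1;
        }
        double inv = 1 / x, inv2 = inv * inv;
        return r + Math.log(x) - 0.5 * inv
                - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252));
    }

    // H[Beta(a,b)] = ln B(a,b) - (a-1)psi(a) - (b-1)psi(b) + (a+b-2)psi(a+b)
    static double betaEntropy(int a, int b) {
        double logBeta = logFactorial(a - 1) + logFactorial(b - 1)
                - logFactorial(a + b - 1);
        return logBeta - (a - 1) * digamma(a) - (b - 1) * digamma(b)
                + (a + b - 2) * digamma(a + b);
    }

    public static void main(String[] args) {
        // Beta(1,1) is uniform on [0,1], so its entropy is 0.
        System.out.println(betaEntropy(1, 1));
    }
}
```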
Force-pushed from 0c82734 to 44d7165
Codecov Report
@@ Coverage Diff @@
## master #4832 +/- ##
==============================================
+ Coverage 80.35% 80.394% +0.043%
- Complexity 17712 17730 +18
==============================================
Files 1088 1088
Lines 63975 63989 +14
Branches 10313 10313
==============================================
+ Hits 51404 51443 +39
+ Misses 8558 8543 -15
+ Partials 4013 4003 -10
Force-pushed from 83dcb75 to aeb961d
Force-pushed from aeb961d to ffe2be2
Back to you @takutosato. You caught a couple of whoppers. Fortunately, the validations still look good after fixing them.
One minor comment but everything else looks great!
docs/mutect/mutect.tex
Outdated
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] - {\rm entropy}[q(f)] - \sum_n {\rm entropy}[q(z_n)] \\
&= -H(\alpha, \beta) + \rho N_{\rm ref} + \sum_n \left[ \gamma_n (\rho + \ln \epsilon_n) + (1 - \gamma_n)(\tau + \ln(1 - \epsilon_n)) - H(\gamma_n) \right],
L(f) &\approx E_q \left[ \ln P({\rm reads}, f, \vz) \right] + {\rm entropy}[q(f)] + \sum_n {\rm entropy}[q(z_n)] \\
&= -H(\alpha, \beta) + N_{\rm ref} \ln \rho + \sum_n \left[ \gamma_n \ln \left( \rho \epsilon_n \right) + (1 - \gamma_n) \ln \left( \tau (1 - \epsilon_n) \right) - H(\gamma_n) \right],
entropies should have positive signs.
fixed
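For completeness, here is a sketch of the mean-field responsibility update for a single alt read that the $q(z_n)$ equation in the earlier hunk describes. It is not GATK code: it assumes $q(f) = {\rm Beta}(\alpha, \beta)$ with $\rho = e^{\overline{\ln(1-f)}} = e^{\psi(\beta) - \psi(\alpha+\beta)}$ and $\tau = e^{\overline{\ln f}} = e^{\psi(\alpha) - \psi(\alpha+\beta)}$, and takes $\gamma_n = q(z_n = 1)$ to be the probability that alt read $n$ is an error:

```java
// Sketch (not GATK code): mean-field update gamma_n = q(z_n = 1)
// = rho*eps_n / (rho*eps_n + tau*(1 - eps_n)), where rho and tau are
// exponentiated expected logs of (1 - f) and f under q(f) = Beta(a, b).
public class ResponsibilitySketch {

    // digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an
    // asymptotic series for large x.
    static double digamma(double x) {
        double r = 0;
        while (x < 6) {
            r -= 1 / x;
            x += 1;
        }
        double inv = 1 / x, inv2 = inv * inv;
        return r + Math.log(x) - 0.5 * inv
                - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252));
    }

    static double responsibility(double alpha, double beta, double eps) {
        double psiSum = digamma(alpha + beta);
        double tau = Math.exp(digamma(alpha) - psiSum); // exp E[ln f]
        double rho = Math.exp(digamma(beta) - psiSum);  // exp E[ln (1 - f)]
        double errorWeight = rho * eps;                 // read is an error
        double realAltWeight = tau * (1 - eps);         // read is a real alt
        return errorWeight / (errorWeight + realAltWeight);
    }

    public static void main(String[] args) {
        // With q(f) peaked near 0.5 and a Q30 base (eps = 1e-3), the read
        // is almost certainly a real alt, so gamma is near eps.
        System.out.println(responsibility(50, 50, 1e-3));
    }
}
```

When $q(f)$ concentrates near zero instead (e.g. $\alpha = 1$, $\beta = 1000$), the same base quality yields a $\gamma_n$ well above one half, which is the behavior that makes isActive conservative at very low allele fractions.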
@Takuto This is the change to improve high-depth sensitivity for mitochondria.
@meganshand @fleharty You are welcome to review as well if you'd like.