Determinant could be faster if it used LU #456
Yeah, I think that makes sense. See ChainRules.jl/src/rulesets/LinearAlgebra/dense.jl, lines 87 to 94 at commit 37a9e9b.
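The LU route discussed here can be sketched outside Julia. This is a hypothetical Python/NumPy illustration of the math (not the ChainRules.jl code): the determinant is the product of U's diagonal times the sign of the row permutation.

```python
import numpy as np
from scipy.linalg import lu_factor

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

# PA = LU; lu packs L (unit diagonal, implicit) and U into one matrix
lu, piv = lu_factor(A)

# det(A) = sign(P) * prod(diag(U)); each pivot with piv[i] != i is one
# row swap, contributing a factor of -1 to the sign
sign = (-1.0) ** np.sum(piv != np.arange(len(piv)))
det_lu = sign * np.prod(np.diag(lu))

print(np.allclose(det_lu, np.linalg.det(A)))
```

This is also essentially what a generic `det` does internally, which is why the thread turns to whether the factorisation can be *reused* by the rule rather than recomputed.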
It would be one of the cases where we can benefit from changing the primal computation.
Yes, that was my point. One difficulty (??): what if the LinearAlgebra implementation changes? Unlikely here, but maybe not in general. I guess ChainRules needs to keep track of such changes? But is it also a bit concerning that this strategy requires code duplication?
The math remains the same even if the implementation changes.
It's how it is. In theory, not having a rule can be better, if the optimizer can inline everything and see the common subexpressions (like computing …).
Thanks for the thoughts.
IIRC, we didn't use …
I think it is a little worse than that. By calling …
we can at least add the LU path for …
If you work out the rrule for computing the determinant from the LU decomposition and then compose it with the rrule for the LU decomposition, you end up with … From the LU decomposition, the matrix inverse can be computed quickly, cheaply, and in place using two applications of backward substitution, so the only remaining question is one of stability. The matrix inverse does not exist exactly when the determinant is exactly zero. We don't currently do any special-casing for the zero-determinant case, but perhaps we should. By the subgradient convention, the cotangent should then be the zero matrix.
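The pullback described above can be sketched as follows. This is my own hedged Python illustration, not the ChainRules.jl implementation: the forward pass computes det via LU, the pullback reuses the same factors (lu_solve performs the two triangular solves), the gradient uses the identity d det(A)/dA = det(A) * inv(A)^T, and the singular case returns the zero cotangent per the subgradient convention.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def det_and_pullback(A):
    """Forward det via LU; pullback reuses the same factorisation."""
    lu, piv = lu_factor(A)
    sign = (-1.0) ** np.sum(piv != np.arange(len(piv)))
    d = sign * np.prod(np.diag(lu))

    def pullback(ybar):
        if d == 0.0:
            # subgradient convention: zero cotangent at a singular point
            return np.zeros_like(A)
        # inv(A) from the stored LU factors: two triangular solves per column
        Ainv = lu_solve((lu, piv), np.eye(A.shape[0]))
        # d det(A)/dA = det(A) * inv(A)^T
        return ybar * d * Ainv.T

    return d, pullback

# finite-difference check of one gradient entry
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
d, pb = det_and_pullback(A)
g = pb(1.0)

eps = 1e-6
E = np.zeros_like(A); E[2, 3] = eps
fd = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)
print(np.isclose(g[2, 3], fd, rtol=1e-4, atol=1e-8))
```

Note the stability trade-off raised in the thread: the pullback goes through inv(A), which is exactly what becomes ill-conditioned as det(A) approaches zero.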
This would be safe to do. Since I think …
When I take … However, if …, when applying the pullback, would I not want to use the LU factorisation instead of the "collected" matrix?
I wasn't able to follow. What do these terms mean?
Forward pass: …
Backward pass: …
(It's perfectly possible I'm missing something or getting something wrong here … I only just started to think about AD at the implementation level.)
I'm sorry, it's still not entirely clear to me what your notation means; e.g. you seem to be using the prefix of …
I'm applying an adjoint operator; this application is denoted by …
So I'm trying to write down a concrete use case, and at least the ones I had in mind when posting this seem to be more relevant for … Re the …
and for the … But I appreciate the problem of a generic vs. specialised implementation. I'll write something else on …
Returning to … We have parameters …
with D = d/dp_j, A_i = A(x_i, p), f_i' = f'(det(A_i) - y_i), DA_i = DA(x_i, p), then …
I'm now struggling to re-order the operations to see the backpropagation. Is it simply this?
…? I.e., the backpropagation would be:
And would you agree that this indicates I should use the …? If what I've written is correct, then there is still the issue that "collecting" …
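For what it's worth, here is my reconstruction of the backpropagated derivative in the notation above, assuming the loss is L(p) = Σ_i f(det(A(x_i, p)) − y_i) and using the identity ∂det(A)/∂A = det(A) A^{-T} (so in the thread's shorthand, Σ_i f_i' det(A_i) tr(A_i^{-1} DA_i)):

```latex
\frac{\partial L}{\partial p_j}
  = \sum_i f'\!\left(\det(A_i) - y_i\right)\,
    \det(A_i)\,
    \operatorname{tr}\!\left(A_i^{-1}\,\frac{\partial A_i}{\partial p_j}\right),
\qquad A_i = A(x_i, p).
```

Since A_i^{-1} appears inside the trace, reusing the LU factorisation of each A_i for both det(A_i) and the solve is exactly where the saving would come from.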
I now think this issue is irrelevant; will close and reopen as a new issue.
I noticed the rrule for det does not use the LU factorisation. Is this intentional? Or is it implicit?
EDIT: implicitly, both rrule and frule do use the factorisation, but it could be reused for efficiency. For rrule, a question of numerical stability remains, but it is not clear to me yet whether it is resolvable.