Fixes gradient, Jacobian, Hessian, and vjp tests #2
Conversation
AbstractDifferentiation.jl: Pass 117 / Total 117

The issue within the jvp tests was that v = (rand(length(xvec)), rand(length(yvec))) was interpreted as one directional derivative, i.e.,

x = to_vec(xvec, yvec)
ẋ = to_vec(v)
fdm(ε -> f(x .+ ε .* ẋ), zero(eltype(x)))

The fix now just augments v accordingly.
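For reference, a runnable sketch of that flattened interpretation, using hypothetical stand-ins (f, xvec, and yvec below are not the actual test fixtures from the suite):

using FiniteDifferences

f(x, y) = x .* exp.(y)                        # hypothetical two-argument function
xvec, yvec = rand(3), rand(3)
v = (rand(length(xvec)), rand(length(yvec)))
fdm = central_fdm(5, 1)

# Flatten the inputs and the tangents into single vectors, as in the snippet above.
x, from_vec = to_vec((xvec, yvec))
ẋ, _ = to_vec(v)

# One combined directional derivative of the flattened function along ẋ:
jvp_combined = fdm(ε -> to_vec(f(from_vec(x .+ ε .* ẋ)...))[1], zero(eltype(x)))

This mirrors the snippet quoted above; the contrast with the per-argument interpretation is spelled out further down in this thread.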
return y * derivative(d.backend, d.f, d.xs...)
end

function Base.:*(d::LazyDerivative, y::Union{Number,Tuple})
I don't think we should support multiplication if the derivative returns a tuple. This part of the code can be simplified.
Don't we want to support functions like fder(x, y) = exp(y) * x + y * log(x) for the LazyOperators as well?
Hmm I guess we could.
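For illustration of what that would mean: the derivative of such a two-argument function is one partial derivative per argument, so a scalar times a LazyDerivative would have to act elementwise on a tuple. A minimal sketch with hand-written partials, independent of any backend:

fder(x, y) = exp(y) * x + y * log(x)
dfder(x, y) = (exp(y) + y / x, x * exp(y) + log(x))   # (∂f/∂x, ∂f/∂y)

3.0 .* dfder(1.5, 2.0)   # what 3.0 * LazyDerivative(backend, fder, (1.5, 2.0)) would have to return

Whether the lazy operators should handle this tuple case is exactly the question above.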
Base.:*(y, d::LazyGradient) = y * gradient(d.backend, d.f, d.xs...)

function Base.:*(d::LazyGradient, y::Union{Number,Tuple})
    if d.xs isa Tuple
Why is xs sometimes a tuple and sometimes not?
I wanted to support:

AD.LazyJacobian(fdm_backend, x->fjac(x, yvec), xvec)

where xs is then only a vector xvec and not (xvec,).
I see.
src/AbstractDifferentiation.jl (Outdated)
end
end

function Base.:*(d::LazyJacobian, ys::AbstractArray)
I don't think this does the right thing if ys is a matrix.
Should we throw an error in that case or fix that manually by putting a vec? I now combined these AbstractArray specializations with the general fallback. They were just differing by stuff like (y,).
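A rough sketch of what such a combined fallback could look like; this is only an illustration of the idea, assuming the LazyJacobian fields backend, f, and xs used above, not the exact method merged in this PR:

function Base.:*(d::LazyJacobian, ys)
    # Normalize a single tangent and a bare (non-tuple) xs, instead of keeping
    # separate AbstractArray/Number/Tuple specializations that only differ by (y,).
    ys_tuple = ys isa Tuple ? ys : (ys,)
    xs_tuple = d.xs isa Tuple ? d.xs : (d.xs,)
    return pushforward_function(d.backend, d.f, xs_tuple...)(ys_tuple)
end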
src/AbstractDifferentiation.jl (Outdated)
end
end

function Base.:*(d::LazyHessian, ys::AbstractArray)
how is this function different from the general fallback?
src/AbstractDifferentiation.jl (Outdated)
end
end

function Base.:*(ys::AbstractArray, d::LazyHessian)
can we combine this with the general fallback?
Good idea! See my comment above as well. When I started, I thought it made sense to dispatch them all separately because of these tiny differences between Arrays, Tuples, and Numbers as input, but it really leads to much more code and gets less readable.
@@ -350,6 +510,19 @@ function define_pushforward_function_and_friends(fdef)
pff(cols)
end
end
elseif eltype(identity_like) <: AbstractMatrix
Could you please comment this part? It's not clear to me what's happening here.
This was the fix I added for the computation of the Hessian. identity_like and cols look as follows in that case:

identity_like = ([1.0 0.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0 0.0; 0.0 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 0.0 1.0],)
identity_like[1] = [1.0 0.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0 0.0; 0.0 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 0.0 1.0]
cols = [1.0, 0.0, 0.0, 0.0, 0.0]
cols = [0.0, 1.0, 0.0, 0.0, 0.0]
cols = [0.0, 0.0, 1.0, 0.0, 0.0]
cols = [0.0, 0.0, 0.0, 1.0, 0.0]
cols = [0.0, 0.0, 0.0, 0.0, 1.0]

So it was mainly to fix the input/output for the additional pushforward that is used. I'll need to check in a bit more detail if one can simplify that function a bit more. IIRC, without that function I got dimension errors in the jvp of FiniteDifferences.jl because it would have pushed forward matrices like identity_like[1] instead of the columns.
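A self-contained sketch of that column-wise pushforward, using a hypothetical scalar function and point (the real logic lives inside define_pushforward_function_and_friends):

using FiniteDifferences, LinearAlgebra

f(x) = sum(abs2, x) / 2                       # hypothetical test function
x0 = rand(5)
fdm = central_fdm(5, 1)

# The tuple-wrapped identity shown above:
identity_like = (Matrix{Float64}(I, length(x0), length(x0)),)

# Push one basis column at a time through the gradient to get one Hessian column
# per pushforward, instead of pushing the whole matrix identity_like[1] forward
# at once (which is what triggered the dimension errors in the jvp).
g(x) = FiniteDifferences.grad(fdm, f, x)[1]
hess_cols = map(eachcol(identity_like[1])) do cols
    fdm(ε -> g(x0 .+ ε .* cols), 0.0)
end
H = reduce(hcat, hess_cols)                   # ≈ I for this particular f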
But I think you are right that this needs some additional care. I think the current version will break down once we'd like to compute a tuple of Hessians.
end

function test_fdm_jvp(fdm_backend)
    v = (rand(length(xvec)), rand(length(yvec)))
    pf1 = AD.pushforward_function(fdm_backend, fjac, xvec, yvec)(v)

    if fdm_backend isa FDMBackend2 # augmented version of v
Could you explain this part?
This is related to the comment above: #2 (comment). The resulting error traces back to the following: when we use v = (rand(length(xvec)), rand(length(yvec))) with our FDMBackend2 as the vector that we'd like to push forward through a function fvec(x, y) that takes two inputs x and y, we get an output that corresponds to the directional derivative with v[1] as the direction along x and v[2] as the direction along y. The other backends, however, interpret v as: change x along the direction v[1] while keeping y fixed, and change y along the direction v[2] while keeping x fixed.
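A sketch of the two interpretations side by side, again with hypothetical stand-ins for the actual test function and inputs:

using FiniteDifferences

fvec(x, y) = x .* exp.(y)                     # hypothetical two-argument function
xvec, yvec = rand(3), rand(3)
v = (rand(length(xvec)), rand(length(yvec)))
fdm = central_fdm(5, 1)

# Per-argument interpretation (the other backends): change x along v[1] with y
# kept fixed, and change y along v[2] with x kept fixed, giving two results.
jvp_x = fdm(ε -> fvec(xvec .+ ε .* v[1], yvec), 0.0)
jvp_y = fdm(ε -> fvec(xvec, yvec .+ ε .* v[2]), 0.0)

# FDMBackend2 (FiniteDifferences.jvp): one directional derivative along the
# combined tangent, which equals the sum of the two pieces above.
jvp_combined = FiniteDifferences.jvp(fdm, fvec, (xvec, v[1]), (yvec, v[2]))
# jvp_combined ≈ jvp_x .+ jvp_y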
Looks good overall. Just left a few comments because I couldn't immediately understand what's happening in a couple of places.
Current test status:
There are basically two fixes:

1. The wrong x was used to compute the gradients, so the value was computed wrongly.
2. When the input is an AbstractMatrix, it didn't loop over its columns but vectorized the entire matrix, which led to a dimension error.

It remains to fix the jvp case. For example, fdm_backend2 results in a single output vector instead of a tuple of two output vectors.