Fixes gradient, Jacobian, Hessian, and vjp tests #2

Merged: 15 commits into JuliaDiff:mt/interface on Aug 2, 2021

Conversation

frankschae
Contributor

Current test status:

Test Summary:              | Pass  Fail  Error  Total
AbstractDifferentiation.jl |  103     1      1    105
  FiniteDifferences        |  103     1      1    105
    Derivative             |   12                  12
    Gradient               |   18                  18
    Jacobian               |   18                  18
    Hessian                |   30                  30
    jvp                    |    7     1      1      9
    j′vp                   |   18                  18

There are basically two fixes:

  • FiniteDifferences mutates the input x while computing gradients, so the value was computed incorrectly (see the sketch below).
  • When the identity was an AbstractMatrix, the code didn't loop over its columns but vectorized the entire matrix, which led to a dimension error.
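
A minimal sketch of the first fix, assuming a FiniteDifferences-based value-and-gradient helper (the function name below is illustrative and not code from this PR): evaluate the primal before grad perturbs the input, and hand grad its own copy to mutate.

using FiniteDifferences

function value_and_gradient_fd(f, x::AbstractVector)
    fdm = central_fdm(5, 1)
    value = f(x)                     # primal on the untouched input
    g, = grad(fdm, f, deepcopy(x))   # grad returns a tuple, one entry per input
    return value, g
end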

It remains to fix the jvp case.

test_fdm_jvp(fdm_backend1) # passes
test_fdm_jvp(fdm_backend2) # fails
test_fdm_jvp(fdm_backend3) # fails

For example, in fdm_backend2

v = (rand(length(xvec)), rand(length(yvec)))
pf1 = AD.pushforward_function(fdm_backend2, fjac, xvec, yvec)(v) 

results in a single output vector instead of a tuple of two output vectors.

@frankschae
Contributor Author

Test Summary:              | Pass  Total
AbstractDifferentiation.jl |  117    117

The issue within the jvp tests was that

v = (rand(length(xvec)), rand(length(yvec)))

was interpreted as one directional derivative, i.e.,

x, _ = to_vec((xvec, yvec))
ẋ, _ = to_vec(v)
fdm(ε -> f(x .+ ε .* ẋ), zero(eltype(x)))

The fix now just augments v to ((v[1], zero(v[1])), (zero(v[2]), v[2])).
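
In code, the augmentation amounts to something like the following sketch (illustrative, not the literal merged code; the zero blocks are sized to match each input slot, which coincides with the tuple above when both inputs have the same length):

v_aug = ((v[1], zero(v[2])),   # perturb x along v[1], keep y fixed
         (zero(v[1]), v[2]))   # keep x fixed, perturb y along v[2]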

return y * derivative(d.backend, d.f, d.xs...)
end

function Base.:*(d::LazyDerivative, y::Union{Number,Tuple})
Member

I don't think we should support multiplication if the derivative returns a tuple. This part of the code can be simplified.

Contributor Author

Don't we want to support functions like fder(x, y) = exp(y) * x + y * log(x) for the LazyOperators as well?

Member

Hmm I guess we could.
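
For reference, a minimal sketch of why a tuple shows up here at all (the analytic derivatives are written out by hand for illustration; this is not code from the PR): the derivative of a two-argument function has one component per input, so a scalar factor has to be broadcast over that tuple.

fder(x, y) = exp(y) * x + y * log(x)

dfdx(x, y) = exp(y) + y / x          # ∂fder/∂x
dfdy(x, y) = exp(y) * x + log(x)     # ∂fder/∂y
ds = (dfdx(2.0, 3.0), dfdy(2.0, 3.0))   # one entry per input, like a tuple-valued derivative

scaled = 2.0 .* ds   # broadcasting over a Tuple works; a plain 2.0 * ds would not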

Base.:*(y, d::LazyGradient) = y * gradient(d.backend, d.f, d.xs...)

function Base.:*(d::LazyGradient, y::Union{Number,Tuple})
if d.xs isa Tuple
Member

Why is xs sometimes a tuple and sometimes not?

frankschae (Contributor Author), Jun 16, 2021

I wanted to support:
AD.LazyJacobian(fdm_backend, x -> fjac(x, yvec), xvec)
where xs is then only a vector xvec and not a tuple (xvec,).
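
A minimal sketch of the normalization this implies (illustrative helper, not the merged code): accept either a bare input or a tuple of inputs and turn it into a tuple, so downstream code can always splat with xs...

as_tuple(xs) = xs isa Tuple ? xs : (xs,)

as_tuple(rand(5))             # wraps a bare vector: ([...],)
as_tuple((rand(5), rand(3)))  # already a tuple: returned unchanged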

Member

I see.

end
end

function Base.:*(d::LazyJacobian, ys::AbstractArray)
Member

I don't think this does the right thing if ys is a matrix.

Contributor Author

Should we throw an error in that case or handle it manually by applying vec? I have now combined these AbstractArray specializations with the general fallback; they only differed in details like wrapping an argument as (y,).

end
end

function Base.:*(d::LazyHessian, ys::AbstractArray)
Member

how is this function different from the general fallback?

end
end

function Base.:*(ys::AbstractArray, d::LazyHessian)
Member

can we combine this with the general fallback?

Contributor Author

Good idea! See my comment above as well. When I started, I thought it made sense to dispatch on them all separately because of these tiny differences between Arrays, Tuples, and Numbers as inputs, but it really leads to much more code and becomes less readable.

@@ -350,6 +510,19 @@ function define_pushforward_function_and_friends(fdef)
pff(cols)
end
end
elseif eltype(identity_like) <: AbstractMatrix
Member

Could you please comment this part? It's not clear to me what's happening here.

Contributor Author

This was the fix I added for the computation of the Hessian. identity_like and cols look as follows in that case:

identity_like = ([1.0 0.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0 0.0; 0.0 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 0.0 1.0],)
identity_like[1] = [1.0 0.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0 0.0; 0.0 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 0.0 1.0]
cols = [1.0, 0.0, 0.0, 0.0, 0.0]
cols = [0.0, 1.0, 0.0, 0.0, 0.0]
cols = [0.0, 0.0, 1.0, 0.0, 0.0]
cols = [0.0, 0.0, 0.0, 1.0, 0.0]
cols = [0.0, 0.0, 0.0, 0.0, 1.0]

So this was mainly to fix the input/output of the additional pushforward that is used. I'll need to check in a bit more detail whether that function can be simplified further. IIRC, without it I got dimension errors in the jvp of FiniteDifferences.jl because it would have pushed forward matrices like identity_like[1] instead of their columns.
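
A hedged, standalone sketch of the branch in question (pff stands in for the backend's pushforward of a single tangent; the real code lives in define_pushforward_function_and_friends):

using LinearAlgebra

identity_like = (Matrix{Float64}(I, 5, 5),)   # same shape as in the dump above
pff(cols) = cols                              # placeholder pushforward

if eltype(identity_like) <: AbstractMatrix
    # Push forward one basis column at a time (e.g. [1, 0, 0, 0, 0]) instead of
    # vec'ing the whole matrix, which is what produced the dimension error.
    results = map(eachcol(identity_like[1])) do cols
        pff(cols)
    end
end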

Contributor Author

But I think you are right that this needs some additional care. I think the current version will break down once we'd like to compute a tuple of Hessians.

end

function test_fdm_jvp(fdm_backend)
v = (rand(length(xvec)), rand(length(yvec)))
pf1 = AD.pushforward_function(fdm_backend, fjac, xvec, yvec)(v)

if fdm_backend isa FDMBackend2 # augmented version of v
Member

Could you explain this part?

Contributor Author

This is related to the comment above: #2 (comment).

The resulting error traces back to the following: when we use v = (rand(length(xvec)), rand(length(yvec))) with our FDMBackend2 as the vector that we'd like to push forward through a function fvec(x, y) that takes two inputs x and y, we get an output that corresponds to the directional derivative with v[1] as the direction along x and v[2] as the direction along y. The other backends, however, interpret v as: change x along the direction v[1] while keeping y fixed, and change y along the direction v[2] while keeping x fixed.
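
A hedged sketch of the two readings, assuming FiniteDifferences' jvp (toy function and sizes, not the test suite's fjac):

using FiniteDifferences

f2(x, y) = x .* exp.(y)
xv, yv = rand(5), rand(5)
v = (rand(length(xv)), rand(length(yv)))
fdm = central_fdm(5, 1)

# FDMBackend2-style reading: one joint directional derivative along (v[1], v[2]).
joint = jvp(fdm, f2, (xv, v[1]), (yv, v[2]))

# Reading the tests expect: one pushforward per input, the other input held fixed.
along_x = jvp(fdm, x -> f2(x, yv), (xv, v[1]))
along_y = jvp(fdm, y -> f2(xv, y), (yv, v[2]))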

@mohamed82008
Member

Looks good overall. Just left a few comments because I couldn't immediately understand what's happening in a couple of places.

@mohamed82008 mohamed82008 merged commit 347d978 into JuliaDiff:mt/interface Aug 2, 2021