[WIP] Abstract differentiation interface #1
Conversation
src/AbstractDifferentiation.jl
Outdated
```julia
struct HigherOrderBackend{B} <: AbstractBackend
    backends::B
end
reduceorder(b::AbstractBackend) = b
function reduceorder(b::HigherOrderBackend)
    return HigherOrderBackend(reverse(Base.tail(reverse(b.backends))))
end
lowest(b::AbstractBackend) = b
lowest(b::HigherOrderBackend) = b.backends[end]
secondlowest(b::HigherOrderBackend) = lowest(reduceorder(b))
```
I don't get this part
It's an over-complicated way to get `b.backends[end-1]`. I was trying to be generic but I don't think generic helps here.
The lowest-level backend is `b.backends[end]`. The second lowest is `b.backends[end-1]`. In forward-over-reverse, the lowest is reverse and the second lowest is forward.
Please ignore this.
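For concreteness, this is how I read the intent of these helpers; the backend names below are hypothetical placeholders, not types defined in this PR:

```julia
# Hypothetical forward-over-reverse combination. ForwardBackend and
# ReverseBackend stand in for whatever concrete backends a package defines.
fwd_over_rev = HigherOrderBackend((ForwardBackend(), ReverseBackend()))

lowest(fwd_over_rev)        # ReverseBackend(), i.e. b.backends[end]
secondlowest(fwd_over_rev)  # ForwardBackend(), i.e. b.backends[end-1]
```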
I'm still unclear on why the primitive that everything is implemented in terms of is `jacobian`. It seems to me that the way ADs are going to wind up implementing this interface is by defining pushforwards/pullbacks. Is there a reason not to make those the primitives instead?
No. I will write macros that let you define any one of the three and get the other 2 for free.
@willtebbutt how would you define the Jacobian of a multi-input function using the jvp? What do you pushforward?
So it's not clear to me how to define the jacobian function from jvp or j'vp without committing to a representation for the differential. The best I can think of is to have the users define an
To be honest I don't even know how to define the Jacobian in this context, let alone how to construct it using a pushforward. Do you have thoughts on how this should be done?
So it seems that generically defining a
These assumptions are representation-agnostic. They just assume that some functions are defined. For specific AD packages that want to commit to a specific tangent or cotangent representation, they can define
@mohamed82008 it's still not clear to me that we've figured out how to define the Jacobian in the first place. Let's forget about
I take the gradient case as a reference. So if we return a tuple for the gradient of a scalar-valued function with multiple arguments, then a tuple of Jacobians makes sense for vector-valued functions. Similarly for single-input, multi-output functions, a tuple can be returned but it means something different. The complicated case is the multi-input, multi-output case because you need to consider all combinations. So it's not enough to define the differential of a struct; we need a type for the derivative of one struct wrt another. But even for a single-input, single-output function, do we pass in a vector of one-hot tangent vectors or an identity matrix to the pushforward? Ideally both should be supported, but I am afraid some packages or adjoint rules may only work with the vector-of-vectors case or the matrix case, and converting between representations is not something that I think belongs here, simply because the derivative representation problem isn't tackled here at all.
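To make the one-hot option above concrete, here is a rough sketch (my own illustration, not part of this PR) of building a Jacobian for a single vector-input, vector-output function by seeding a pushforward once per input dimension. It assumes `pushforward` maps an input tangent vector to the corresponding output tangent vector:

```julia
# Sketch: one Jacobian column per one-hot tangent vector.
function jacobian_from_pushforward(pushforward, x::AbstractVector)
    cols = map(eachindex(x)) do i
        t = zero(x)
        t[i] = one(eltype(x))  # i-th one-hot (basis) tangent
        pushforward(t)         # column i of the Jacobian
    end
    return reduce(hcat, cols)
end
```

The identity-matrix variant would instead hand the backend all basis tangents at once and rely on it to batch the pushforwards.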
Hmm this is tempting but the current implementation already works for multiple array-like inputs, single array-like output functions out of the box with mild assumptions. But I imagine there is little use to these functions anyways outside the context of AD implementation. Most people just need derivatives, gradients, jacobians and hessians.
I am not proposing any! I think this is an interesting problem to solve, perhaps in ChainRulesCore where differential types are defined. I suspect something like https://github.com/jonniedie/ComponentArrays.jl may come in handy.
As an aside, personally I think the best representation is a good old matrix! Let's agree to always vectorize all the inputs and all the outputs and have a decoder that decodes each element to the derivative it represents. Then you can query this special matrix in different ways and get different differential structs out of it.
This separates the representation problem from the AD problem. Both are interesting but mixing them is a nightmare.
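A minimal sketch of what such a decoupled representation could look like (the names here are hypothetical; nothing like this exists in the PR):

```julia
# One flat Jacobian matrix over vectorized inputs/outputs, plus a decoder that
# maps a matrix entry back to the structured derivative it represents.
struct FlatJacobian{M<:AbstractMatrix,D}
    J::M        # ∂vec(outputs) / ∂vec(inputs)
    decode::D   # decode(i, j) -> which structured derivative J[i, j] encodes
end
```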
I will go ahead and test the current implementation with the most common high level use cases for all the common AD packages. If tests pass, I think we can merge and release and then revisit later if we come up with a better design.
I think the package is useful enough even if it only supports number and array inputs and outputs (single output), which is like 90% of the AD use cases out there.
This makes sense to me. We know how to implement this in terms of the ADs we have using
I may have figured out a nice-ish solution. This got bumped down my priority list though, so I will get back to it some time next week.
I will have time to review this next week, hopefully.
I pushed what I have. It's not fully functional yet. Until later.
Thanks for this initiative! I was looking for a package providing a common AD API exactly like this. As far as … are concerned, a nice way to handle this might be an additional package ADLinearMaps.jl, based on both AbstractDifferentiation.jl (…).
Would it make sense for this to be part of ChainRulesCore?
We've talked about it. They are kind of at opposite ends of the abstraction stack. ChainRulesCore will soon get an abstraction (currently pencilled in as configurable rules) that will let it do things like call back into AD. Having one of these (beyond the default) gives the ability to do … Having those is also enough to be able to implement everything in this API.
Also, from what I understand, ForwardDiff at least will not adopt ChainRulesCore any time soon (if ever), right? But maybe it could support AbstractDifferentiation.jl?
Yes. The main users of AbstractDifferentiation will be users of AD. The main users of ChainRulesCore are developers of AD packages. So they are at two different levels of abstraction, as Lyndon said.
What about packages like FiniteDiff.jl and FiniteDifferences.jl? Strictly speaking that is numerical rather than automatic differentiation, but in contexts where AD is not possible (e.g. because one has to call external code) and the number of dimensions is not too high, it would be very useful to be able to use them via the AbstractDifferentiation.jl interface, right?
Right. All the tests in this PR so far are using finite differences, so they are definitely in scope.
Looks like this PR fell into the black hole of forgotten PRs. @frankschae has been secretly working on fixing the errors here in his fork though. We should see more activity here soon. Would be nice to get some attention from potential reviewers in the coming 1-2 weeks.
Co-authored-by: Mohamed Tarek <mohamed82008@gmail.com>
Fixes gradient, Jacobian, Hessian, and vjp tests
add ForwardDiff and Zygote
In this PR, I implement a high level API for differentiation. The idea is to unify the APIs of all the AD packages we have in the Julia ecosystem. This should enable AD users to write backend-agnostic code using only the API from `AbstractDifferentiation`.

In the current implementation, AD package authors would need to define at least the following:

- `PackageBackend`: a backend type for the package that subtypes `AbstractBackend`.
- `jacobian(ab::PackageBackend, f, xs...)`: returns the Jacobian of the output(s) of `f` wrt its inputs at `xs`.
- `primalvalue(x)` (not needed for finite difference or source-to-source): returns the primal value of `x`. `x` can be a dual number, vector of duals, tracked array, etc.

By defining the above, the following functions are all then automatically defined:

- `derivative(::AbstractBackend, f, xs...)`: returns the derivatives of the scalar-valued function `f` wrt its inputs at `xs`, where `xs` are all scalars.
- `gradient(ab::AbstractBackend, f, xs...)`: returns the gradient of the scalar-valued function `f` wrt its inputs at `xs`, where `xs` can be anything that the backend `ab` supports.
- `hessian(ab::AbstractBackend, f, xs...)`: returns the Hessian of the scalar-valued function `f` wrt its inputs at `xs`.
- `value_and_derivative(::AbstractBackend, f, xs...)`: returns the output value of the function `f` as well as its derivatives wrt its inputs at `xs`.
- `value_and_gradient(::AbstractBackend, f, xs...)`: returns the output value of the function `f` as well as its gradients wrt its inputs at `xs`.
- `value_and_jacobian(::AbstractBackend, f, xs...)`: returns the output value of the function `f` as well as its Jacobians wrt its inputs at `xs`.
- `value_and_hessian(ab::AbstractBackend, f, xs...)`: returns the output value of the function `f` as well as its Hessian wrt its inputs at `xs`.
- `value_gradient_and_hessian(ab::AbstractBackend, f, xs...)`: returns the output value of the function `f` as well as its gradients and Hessians wrt its inputs at `xs`.
- `pullback_function(::AbstractBackend, f, xs...)`: returns the pullback function of `f` at `xs`.
- `pushforward_function(::AbstractBackend, f, xs...)`: returns the pushforward function of `f` at `xs`.
- `value_and_pullback_function(::AbstractBackend, f, xs...)`: returns a function that takes as input the differential of `f` and returns the primal value of `f` at `xs` and the pullback of the differential.
- `value_and_pushforward_function(::AbstractBackend, f, xs...)`: returns a function that takes as input the tangents of the inputs `xs` and returns the primal value of `f` at `xs` and the pushforward of the tangents.

A package author can choose to define any of the above automatically defined functions for his/her package in the following cases:

- `pushforward`'s and `pullback`'s default implementations using `jacobian` incur some additional arithmetic required for the encoding of both of these functions as Jacobians. A few savings can be made by defining the methods for the backend directly.
- The `value_and_` versions of the functions use control flow to avoid querying the primal value more than once when the function is called multiple times, e.g. when calculating the gradient of a multivariate function with forward-mode in chunks.

I tried to keep the restrictions minimal in my implementation. Looking forward to your feedback!
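To illustrate the minimal backend contract described above, here is a rough sketch of a toy finite-difference backend. The module alias, the exact location of `AbstractBackend`, and the tuple-of-Jacobians return convention are my assumptions about this WIP API; only `jacobian` is needed here since `primalvalue` is not required for finite differences:

```julia
using AbstractDifferentiation
const AD = AbstractDifferentiation  # assumed module name from this PR

# Toy backend based on central finite differences, for illustration only.
struct FDBackend <: AD.AbstractBackend end

function AD.jacobian(::FDBackend, f, x::AbstractVector)
    h = cbrt(eps(eltype(x)))        # central-difference step size
    cols = map(eachindex(x)) do i
        xp = copy(x); xp[i] += h
        xm = copy(x); xm[i] -= h
        (f(xp) .- f(xm)) ./ (2h)    # i-th column of the Jacobian
    end
    return (reduce(hcat, cols),)    # one Jacobian per input argument
end

# With `jacobian` defined, the derived functions listed above are meant to
# come for free, e.g.:
# AD.gradient(FDBackend(), x -> sum(abs2, x), [1.0, 2.0, 3.0])
```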
The main remaining items to do here are: