
[WIP] Abstract differentiation interface #1

Merged: 28 commits merged into master from mt/interface on Aug 28, 2021

Conversation

@mohamed82008 (Member) commented Feb 8, 2021

In this PR, I implement a high level API for differentiation. The idea is to unify the APIs of all the AD packages we have in the Julia ecosystem. This should enable AD users to write backend-agnostic code using only the API from AbstractDifferentiation.

In the current implementation, AD package authors would need to define at least the following:

  1. A backend struct, e.g. PackageBackend, for the package that subtypes AbstractBackend.
  2. jacobian(ab::PackageBackend, f, xs...): returns the Jacobian(s) of the output(s) of f wrt its inputs at xs.
  3. primalvalue(x) (not needed for finite differences or source-to-source AD): returns the primal value of x, where x can be a dual number, a vector of duals, a tracked array, etc.

By defining the above, the following functions are then all automatically defined (a usage sketch follows the list below):

  1. derivative(::AbstractBackend, f, xs...): returns the derivatives of the scalar-valued function f wrt its inputs at xs where xs are all scalars.
  2. gradient(ab::AbstractBackend, f, xs...): returns the gradient of the scalar-valued function f wrt its inputs at xs where xs can be anything that the backend ab supports.
  3. hessian(ab::AbstractBackend, f, xs...): returns the Hessian of the scalar-valued function f wrt its inputs at xs.
  4. value_and_derivative(::AbstractBackend, f, xs...): returns the output value of the function f as well as its derivatives wrt its inputs at xs.
  5. value_and_gradient(::AbstractBackend, f, xs...): returns the output value of the function f as well as its gradients wrt its inputs at xs.
  6. value_and_jacobian(::AbstractBackend, f, xs...): returns the output value of the function f as well as its Jacobians wrt its inputs at xs.
  7. value_and_hessian(ab::AbstractBackend, f, xs...): returns the output value of the function f as well as its Hessian wrt its inputs at xs.
  8. value_gradient_and_hessian(ab::AbstractBackend, f, xs...): returns the output value of the function f as well as its gradients and Hessians wrt its inputs at xs.
  9. pullback_function(::AbstractBackend, f, xs...): returns the pullback function of f at xs.
  10. pushforward_function(::AbstractBackend, f, xs...): returns the pushforward function of f at xs.
  11. value_and_pullback_function(::AbstractBackend, f, xs...): returns a function that takes as input the differential of f and returns the primal value of f at xs and the pullback of the differential.
  12. value_and_pushforward_function(::AbstractBackend, f, xs...): returns a function that takes as input the tangents of the inputs xs and returns the primal value of f at xs and the pushforward of the tangents.
  13. Lazy Jacobian and Jacobian transpose vector/matrix multiplication.
  14. Lazy Hessian and Hessian transpose vector/matrix multiplication.
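
To make this concrete, here is a rough sketch of what implementing the interface could look like for a hypothetical finite-difference backend. The FDMBackend struct, its use of FiniteDifferences.jl, and the fully qualified function names are assumptions for the example, not code from this PR:

using AbstractDifferentiation, FiniteDifferences

# Hypothetical backend struct wrapping a finite-difference method.
struct FDMBackend{M} <: AbstractDifferentiation.AbstractBackend
    fdm::M
end
FDMBackend() = FDMBackend(central_fdm(5, 1))

# The one required primitive: the Jacobian(s) of f wrt each of its inputs at xs.
function AbstractDifferentiation.jacobian(ab::FDMBackend, f, xs...)
    return FiniteDifferences.jacobian(ab.fdm, f, xs...)
end
# primalvalue is not needed here: finite differencing never produces duals or tracked values.

# With only the above defined, the derived API should work:
ab = FDMBackend()
AbstractDifferentiation.gradient(ab, x -> sum(abs2, x), rand(3))
AbstractDifferentiation.value_and_jacobian(ab, (x, y) -> x .* y, rand(3), rand(3))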

A package author can additionally choose to define any of the above automatically defined functions for their package in the following cases:

  1. The default implementation is not efficient enough. For example, the default implementations of the pushforward and pullback in terms of jacobian incur some additional arithmetic required to encode both of these functions as Jacobians. Some savings can be made by defining the method for the backend directly (see the sketch after this list).
  2. To avoid control flow. The value_and_ versions of the functions use control flow to avoid querying the primal value more than once when the function is called multiple times, e.g. when calculating the gradient of a multivariate function with forward mode in chunks.
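
As a hypothetical sketch of case 1, a reverse-mode backend could bypass the Jacobian-based default and wire pullback_function straight to its native VJP (ReverseBackend and native_vjp below are placeholders, not real APIs):

using AbstractDifferentiation

# Purely illustrative reverse-mode backend and a stand-in for its native reverse rule.
struct ReverseBackend <: AbstractDifferentiation.AbstractBackend end
native_vjp(f, xs, ws) = error("stand-in for the package's own reverse rule")

# Case 1: skip materializing Jacobians and call the native VJP directly.
function AbstractDifferentiation.pullback_function(::ReverseBackend, f, xs...)
    return ws -> native_vjp(f, xs, ws)
end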

I tried to keep the restrictions minimal in my implementation. Looking forward to your feedback!

The main remaining items to do here are:

  • Test the hessian functions
  • Test the lazy operators
  • Write documentation

@mohamed82008 changed the title from "[WIP] Abstract interface implementation" to "[WIP] Abstract differentiation interface" on Feb 8, 2021
Comment on lines 14 to 23
struct HigherOrderBackend{B} <: AbstractBackend
    backends::B  # tuple of backends; the last entry is the lowest-level backend
end
# Drop the last (lowest-level) backend, keeping the rest in order.
reduceorder(b::AbstractBackend) = b
function reduceorder(b::HigherOrderBackend)
    return HigherOrderBackend(reverse(Base.tail(reverse(b.backends))))
end
lowest(b::AbstractBackend) = b
lowest(b::HigherOrderBackend) = b.backends[end]
# Equivalent to b.backends[end-1]; see the discussion below.
secondlowest(b::HigherOrderBackend) = lowest(reduceorder(b))
Member:
I don't get this part

Member Author:
It's an over-complicated way to get b.backends[end-1]. I was trying to be generic but I don't think generic helps here.

Member Author:
The lowest-level backend is b.backends[end]. The second lowest is b.backends[end-1]. In forward-over-reverse, the lowest is reverse and the second lowest is forward.
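
For example (a sketch with hypothetical ForwardBackend and ReverseBackend structs; per the code above, the last entry of the tuple is the lowest-level backend):

ab = HigherOrderBackend((ForwardBackend(), ReverseBackend()))

lowest(ab)        # ReverseBackend() -- the innermost differentiation
secondlowest(ab)  # ForwardBackend() -- applied on top, i.e. forward-over-reverse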

@willtebbutt (Member) commented Feb 9, 2021

This all looks great. Any chance we could go with pushforward instead of pushforward_function etc?

Please ignore this.

@willtebbutt (Member) commented Feb 9, 2021

I'm still unclear on why the primitive that everything is implemented in terms of is jacobian, rather than something involving pushforward / pullback.

It seems to me that the way ADs are going to wind up implementing this interface is by defining jacobian in terms of evaluations of their native pushforwards (forwards-mode) or pullbacks (reverse-mode). Then this package defines its version of pushforwards / pullbacks in terms of jacobian. Does this not seem backwards?

Is there a reason not to

  1. require that an AD implement some variant on pushforward / pullback, depending on its mode.
  2. implement jacobian in terms of those?

@mohamed82008 (Member Author)

Is there a reason not to

  1. require that an AD implement some variant on pushforward / pullback, depending on its mode.
  2. implement jacobian in terms of those?

No. I will write macros that let you define any one of the three and get the other 2 for free.

@mohamed82008 (Member Author)

@willtebbutt how would you define the Jacobian of a multi-input function using the jvp? What do you pushforward?

@mohamed82008 (Member Author)

So it's not clear to me how to define the jacobian function from the jvp or j'vp without committing to a representation for the differential. The best I can think of is to have users define an identity_like function for the arguments or outputs to pushforward or pullback.
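
A rough sketch of that identity_like idea for plain vectors (hypothetical helpers, committing to a vector-of-one-hot-vectors tangent representation):

# Hypothetical: the "basis" tangents of a vector input, one one-hot vector per entry.
function identity_like(x::AbstractVector)
    return [[i == j ? one(eltype(x)) : zero(eltype(x)) for i in eachindex(x)] for j in eachindex(x)]
end

# One Jacobian column per basis tangent, assuming pf maps a single input tangent
# to a single output tangent (a jvp).
function jacobian_from_pushforward(pf, x::AbstractVector)
    return reduce(hcat, [pf(t) for t in identity_like(x)])
end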

@willtebbutt (Member) commented Feb 9, 2021

@willtebbutt how would you define the Jacobian of a multi-input function using the jvp? What do you pushforward?

To be honest I don't even know how to define the Jacobian in this context, let alone how to construct it using a pushforward. Do you have thoughts on how this should be done?

@mohamed82008 (Member Author) commented Feb 9, 2021

So it seems that generically defining a jacobian using the pushforward_function or pullback_function is more difficult than the other way around. Essentially, you have to commit to a certain representation for the tangents or cotangents. Does your pushforward/pullback support multiple tangents/cotangents? Are the multiple tangents a vector of vectors or a matrix? These questions can have different answers in different packages, and I don't want to make one the default. So I am left with my initial design, in which the jacobian is the only primitive. The nice thing about this is that the pushforward and pullback now come for free so long as:

  1. dot is defined for the output of the function and its cotangent representation.
  2. + and * are defined for the input to the function and the tangent representation.

These assumptions are representation-agnostic. They just assume that some functions are defined.

For specific AD packages that want to commit to a specific tangent or cotangent representation, they can define pushforward_function or pullback_function as a primitive and then define jacobian in terms of that. But this belongs in the AD package not here imo.
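
For concreteness, a sketch of the "for free" direction in the simple array case, where those assumptions reduce to matrix products (illustrative names, not necessarily the exact code in this PR):

# One Jacobian per input; the JVP sums J_i * v_i over the inputs (only * and + needed).
function pushforward_from_jacobian(ab, f, xs...)
    return (vs...) -> begin
        Js = jacobian(ab, f, xs...)
        return sum(J * v for (J, v) in zip(Js, vs))
    end
end

# The VJP contracts the output cotangent w against each Jacobian.
function pullback_from_jacobian(ab, f, xs...)
    return w -> map(J -> adjoint(J) * w, jacobian(ab, f, xs...))
end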

@willtebbutt (Member) commented Feb 9, 2021

@mohamed82008 it's still not clear to me that we've figured out how to define the Jacobian in the first place.

Let's forget about jvps and vjps for the time being, how are you proposing to define the Jacobian of some function f: A -> B, where neither A nor B subtype Vector{<:Real}?

@mohamed82008 (Member Author)

Do you have thoughts on how this should be done?

I take the gradient case as a reference. So if we return a tuple for the gradient of a scalar-valued function with multiple arguments, then a tuple of Jacobians makes sense for vector-valued functions. Similarly, for single-input, multi-output functions, a tuple can be returned, but it means something different. The complicated case is the multi-input, multi-output case, because you need to consider all combinations. So it's not enough to define the differential of a struct; we need a type for the derivative of one struct wrt another.

But even for a single-input, single-output function, do we pass a vector of one-hot tangent vectors or an identity matrix to the pushforward? Ideally both should be supported, but I am afraid some packages or adjoint rules may only work with the vector-of-vectors case or the matrix case, and converting between representations is not something that I think belongs here, simply because the derivative representation problem isn't tackled here at all.

Let's forget about jvps and vjps for the time being

Hmm, this is tempting, but the current implementation already works out of the box for functions with multiple array-like inputs and a single array-like output, with mild assumptions. But I imagine there is little use for these functions anyway outside the context of AD implementation. Most people just need derivatives, gradients, Jacobians, and Hessians.

how are you proposing to define the Jacobian of some function f: A -> B, where neither A nor B subtype Vector{<:Real}?

I am not proposing any! I think this is an interesting problem to solve, perhaps in ChainRulesCore, where differential types are defined. I suspect something like https://github.com/jonniedie/ComponentArrays.jl may come in handy.

@mohamed82008 (Member Author)

As an aside, personally I think the best representation is a good old matrix! Let's agree to always vectorize all the inputs and all the outputs, and have a decoder that decodes each element to the derivative it represents. Then you can query this special matrix in different ways and get different differential structs out of it.
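
A toy sketch of that flatten-and-decode idea (purely illustrative, not part of this PR):

# Hypothetical: flatten a collection of array arguments into one long vector,
# together with a decoder that restores the original shapes.
function flatten(xs...)
    sizes = map(size, xs)
    v = vcat(map(vec, xs)...)
    decoder = function (w)
        parts = Any[]
        offset = 0
        for sz in sizes
            n = prod(sz)
            push!(parts, reshape(w[offset+1:offset+n], sz))
            offset += n
        end
        return Tuple(parts)
    end
    return v, decoder
end

v, decode = flatten(rand(2, 2), rand(3))
decode(v)  # back to a 2×2 matrix and a length-3 vector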

@mohamed82008 (Member Author)

This separates the representation problem from the AD problem. Both are interesting but mixing them is a nightmare.

@mohamed82008 (Member Author) commented Feb 9, 2021

I will go ahead and test the current implementation with the most common high level use cases for all the common AD packages. If tests pass, I think we can merge and release and then revisit later if we come up with a better design.

@mohamed82008 (Member Author)

I think the package is useful enough even if it only supports number and array inputs and outputs (single output) which is like 90% of the AD use cases out there.

@willtebbutt (Member)

I think the package is useful enough even if it only supports number and array inputs and outputs (single output)

This makes sense to me. We know how to implement this in terms of the ADs we have using jvps and vjps, and I agree that it's probably useful.

@mohamed82008 (Member Author)

I may have figured out a nice-ish solution. This got second-tiered on my priority list though, so I will get back to this some time next week.

@oxinabox (Member)

I will have time to review this next week, hopefully

@mohamed82008 (Member Author)

I pushed what I have. It's not fully functional yet. Until later.

@oschulz commented Mar 15, 2021

Thanks for this initiative! I was looking for a package providing a common AD API exactly like this.

As far as

Lazy Jacobian and Jacobian transpose vector/matrix multiplication.
Lazy Hessian and Hessian transpose vector/matrix multiplication.

are concerned, a nice way to handle this might be an additional package ADLinearMaps.jl, based on both AbstractDifferentiation.jl (pushforward_function and pullback_function) and LinearMaps.jl. We're currently using LinearMaps.jl in MGVI.jl as a common API for JVP and VJP; it feels very natural.
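
A rough sketch of the idea (assuming a single vector input and output, and that the backend's pushforward/pullback closures map one vector to one vector; this is not an existing package API):

using LinearMaps

# Hypothetical: expose the Jacobian of f at x as a lazy LinearMap, with forward
# application via the pushforward and adjoint application via the pullback.
function jacobian_linearmap(ab, f, x::AbstractVector)
    y = f(x)
    jvp = v -> pushforward_function(ab, f, x)(v)
    vjp = w -> pullback_function(ab, f, x)(w)
    return LinearMap(jvp, vjp, length(y), length(x))
end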

@CarloLucibello

Would it make sense for this to be part of ChainRulesCore?

@oxinabox (Member)

We've talked about it.
Not today. Maybe one day.
We're not blocking each other over it.

They are kind of opposite ends of the abstraction stack.

ChainRulesCore will soon get an abstraction (currently pencilled in as configurable rules) that will let it do things like call back into AD.
JuliaDiff/ChainRulesCore.jl#68
The AD system will need to provide ChainRules with one of those (or settle for the default, which is what we have now, and which can't call back into AD).

Having one of these (beyond the default) gives the ability to do value_and_directional_derivative (frule) and value_and_pullback_function (rrule).
Those are the things ChainRulesCore needs in order to be able to write rules for map etc.

Having those is also enough to be able to implement everything in this API.
(The converse is not quite true as ChainRules' configured rules also need to have traits about mutation support and some other things)

@oschulz commented Mar 15, 2021

They are kind of opposite ends of the abstraction stack.

Also, from what I understand, ForwardDiff at least will not adopt ChainRulesCore any time soon (if ever), right? But maybe it could support AbstractDifferentiation.jl?

@mohamed82008 (Member Author)

But maybe it could support AbstractDifferentiation.jl?

Yes. The main users of AbstractDifferentiation will be users of AD. The main users of ChainRulesCore are developers of AD packages. So they are at two different levels of abstraction, as Lyndon said.

@oschulz commented Mar 15, 2021

What about packages like FiniteDiff.jl and FiniteDifferences.jl? Strictly speaking, they do numerical rather than automatic differentiation, but in contexts where AD is not possible (e.g. because one has to call external code) and the number of dims is not too high, it would be very useful to be able to use them via the AbstractDifferentiation.jl interface, right?

@mohamed82008 (Member Author)

it would be very useful to be able to use them via the AbstractDifferentiation.jl interface, right?

Right. All the tests in this PR so far are using finite difference. So they are definitely in scope.
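
For example (reusing the hypothetical FDMBackend sketched after the PR description above):

# A black-box function we cannot differentiate through with AD,
# e.g. because it shells out to external code.
blackbox(x) = [sum(abs2, x), prod(x)]

ab = FDMBackend()  # the hypothetical finite-difference backend from the earlier sketch
AbstractDifferentiation.jacobian(ab, blackbox, rand(4))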

@mohamed82008 (Member Author)

Looks like this PR fell into the black hole of forgotten PRs. @frankschae has been secretly working on fixing the errors here though in his fork. We should see more activity here soon. Would be nice to get some attention from potential reviewers in the coming 1-2 weeks.

@mohamed82008 merged commit 5a21414 into master on Aug 28, 2021
@gdalle deleted the mt/interface branch on December 21, 2023