Can we get rid of `Thunk`? #18

oxinabox · 2019-08-07T14:31:19Z

@jrevels: @willtebbutt and I were going through the Differentials to make sure we actually know what they are for,
and started to wonder whether we need all of them.

Each one we get rid of simplifies things a lot,
especially when it comes to #16

I think we might be able to just have
Wirtinger, One, Zero, and DNE.

Wirtinger

I have only the barest understanding of what this is.
It effectively seems like a particularly convenient way to deal with
derivatives with respect to complex numbers (in contrast to handling them as structs, see #4).
Probably useful.

DNE

Does not exist. Obviously useful.

One, Zero

Useful identities that are evaluated lazily, and can thus be removed from the chain efficiently.

Casted

It is kind of the generalization of One, and Zero.
(in that One() could also be written Casted(true) etc).
It lets us lazily delay computing a broadcast,
so that it can be fused later.
But I think in the short term we can simplify the code
by replacing say
Rule((Δx, Δy) -> sum(Δx * cast(y)) + sum(cast(x) * Δy))
with
Rule((Δx, Δy) -> sum(Δx .* y) + sum(x .* Δy))
(from here)
which for that particular case would, I think, even be identical in performance,
since it does not end up returning any kind of lazy computation.
Later we can try to get back the lazy computation and broadcast fusion by returning broadcasted.
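A minimal sketch of what "returning broadcasted" could look like, assuming plain arrays with made-up values. It uses only `Base.Broadcast.broadcasted` and `Base.Broadcast.materialize`; how a rule would actually hand such an object back is not settled in this issue.

```julia
using Base.Broadcast: broadcasted, materialize

# Made-up inputs, for illustration only.
x  = [1.0, 2.0, 3.0]
y  = [4.0, 5.0, 6.0]
Δx = [0.1, 0.2, 0.3]
Δy = [1.0, 1.0, 1.0]

# Eager version: each `.*` allocates its result before `sum` runs.
eager = sum(Δx .* y) + sum(x .* Δy)

# Lazy version: `broadcasted` builds an unevaluated `Broadcasted` object.
lazy_term = broadcasted(*, Δx, y)

# Nested lazy terms are fused into a single loop when materialized.
fused = materialize(broadcasted(+, lazy_term, broadcasted(*, x, Δy)))
```

Here `sum(fused)` agrees with `eager` up to floating-point rounding, but the elementwise work happens in one pass.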

Getting rid of Casted would solve #10

Thunk

Thunk seemed really useful at first,
but I am not sure anymore that it actually does anything.

A thunk basically wraps a computation f(v) that returns a Differentiable in a () -> f(v),
so as not to have to compute it yet.
But any time you interact with it (e.g. via add or mul) it gets externed,
because if you don't do that you can get huge chains of thunks that call thunks,
and also because by the time you are called (e.g. by add) you probably do actually want the value: you're not going to skip it and only use the other part.

And using it inside a rule isn't actually making anything extra deferred until the backward pass, since rules themselves are deferred until the backward pass.

E.g. looking at this rule:
rather than

function rrule(::typeof(inv), x::AbstractArray)
    Ω = inv(x)
    m = @thunk(-Ω')
    return Ω, Rule(ΔΩ -> m * ΔΩ * Ω')
end

we could just do

function rrule(::typeof(inv), x::AbstractArray)
    Ω = inv(x)
    return Ω, Rule(ΔΩ -> -Ω' * ΔΩ * Ω')
end

Which boils down to the same thing, since when the rule is invoked it gets externed anyway, by * becoming mul.

Even in the case of derivatives for multiple things, where you would have multiple rules referencing the thunk, it still doesn't change anything, since thunks don't cache
(#7).
I recall @jrevels saying that they used to cache, so maybe still having them is a legacy of that time and we just didn't notice that they didn't do anything anymore.
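To make the non-caching point concrete, here is a small sketch. `Thunk` and `extern` below are simplified stand-ins, not the actual ChainRules definitions, and `rule2` is a hypothetical second rule added only to show sharing.

```julia
# Stand-in for a non-caching thunk (an assumption, not the ChainRules type).
struct Thunk{F}
    f::F
end
extern(t::Thunk) = t.f()

calls = Ref(0)                       # counts how often the thunk body runs
Ω = [2.0 0.0; 0.0 4.0]
m = Thunk(() -> (calls[] += 1; -Ω'))

# Two rules sharing `m`; each one re-runs the thunk body when invoked.
rule1 = ΔΩ -> extern(m) * ΔΩ * Ω'
rule2 = ΔΩ -> extern(m) * ΔΩ         # hypothetical second consumer

ΔΩ = [1.0 0.0; 0.0 1.0]
rule1(ΔΩ)
rule2(ΔΩ)
```

After both calls `calls[]` is 2: without caching, sharing the thunk saves no work compared to writing `-Ω'` inline in each rule.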


jrevels · 2019-08-07T18:14:29Z

#10 is already this issue for Casted. Seems like the OP here is really just asking if Thunk should go away? I'll change the title to reflect that.

Long story short, the feature that is implemented via Thunk is definitely useful; even if Thunk goes away, that feature still needs to be implemented somewhere. Let's dive into that feature; the example you posted isn't a very good example usage of @thunk.

Imagine a simple forward-mode AD using ChainRules. Let's say it is getting the frule for the function f(x₁, x₂), which we know (by construction) has the form:

Rule((Δx₁, Δx₂) -> ∂f_∂x₁() * Δx₁ + ∂f_∂x₂() * Δx₂)

Let's say our AD is only differentiating w.r.t. x₁; in that case, calling ∂f_∂x₂() is pointless and wasteful, since Δx₂ will be Zero anyway. This, then, is why Thunk exists: so that these kinds of opportunities can be exploited automatically without the AD tool needing to do clever stuff. In an ideal world, a compiler would be able to figure all this out, but alas, there are lots of cases in AD where we primitive authors happen to know some action is @thunkable even when the compiler can't easily (or possibly) prove it (e.g. when the function is "pure" mathematically, but its implementation isn't pure programmatically).
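A minimal sketch of that scenario. `Zero`, `Thunk`, `extern`, `mul` and `add` here are simplified stand-ins rather than the actual ChainRules definitions, and f(x₁, x₂) = x₁ * sin(x₂) is an arbitrary example function chosen for illustration.

```julia
# Simplified stand-ins (assumptions, not the ChainRules implementations).
struct Zero end
struct Thunk{F}
    f::F
end
extern(t::Thunk) = t.f()

mul(::Thunk, ::Zero) = Zero()        # never runs the thunk body
mul(a::Thunk, b) = extern(a) * b     # otherwise the value is needed
add(::Zero, ::Zero) = Zero()
add(::Zero, b) = b
add(a, ::Zero) = a
add(a, b) = a + b

evaluated = String[]                 # records which partials were computed
x₁, x₂ = 2.0, 0.5

# Partials of the example f(x₁, x₂) = x₁ * sin(x₂), wrapped as thunks.
∂f_∂x₁ = Thunk(() -> (push!(evaluated, "∂x₁"); sin(x₂)))
∂f_∂x₂ = Thunk(() -> (push!(evaluated, "∂x₂"); x₁ * cos(x₂)))

rule = (Δx₁, Δx₂) -> add(mul(∂f_∂x₁, Δx₁), mul(∂f_∂x₂, Δx₂))

# Differentiate w.r.t. x₁ only: the x₂ perturbation is Zero().
df = rule(1.0, Zero())
```

With `Δx₂ = Zero()`, only the first thunk body runs, so `evaluated` contains just `"∂x₁"` and the cosine is never computed.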

Note that this type of situation isn't that rare at the very entry point of the target code (when the user explicitly says "hey I only care about this partial derivative for my function"). More importantly, I'd say it's downright common to encounter this implicitly when you're actually computing intermediate derivatives within the target code being differentiated.

Note also that there is an exactly equivalent situation for reverse-mode (just w.r.t. outputs instead of inputs).

So here's the cool part: ideally, just given partial derivatives, ChainRules would be able to autogenerate rules for you where @thunk is already in the appropriate place. In fact, this functionality is already implemented for scalars via @scalar_rule (it adds the @thunks automatically for you). Why not also add a @tensor_rule to get the same benefit for array primitives? The answer is that to implement this correctly, you actually need extra indexing information; in fact, this was the point of that Ricci Calculus paper I kept bringing up. Really hope that kind of thing gets implemented here one day 🙂

Hopefully all that helps; the thunk stuff is one of the many things that never got properly documented, since I never got around to writing an actual manual as opposed to docstrings.

jrevels · 2019-08-07T18:23:11Z

> But Any time you interact with it (e.g. via add or mul) it gets externed,

(just to clarify, since this is the crux of my earlier example and I didn't make it explicit: mul(::Thunk, ::Zero) should just return Zero() and not extern the Thunk)

oxinabox · 2019-08-07T18:40:42Z

Right, thanks, that clears things up.

So the point of thunk is not to defer the calculation for the sake of deferring the calculation (since Rules already do that).
It is because we hope to be able to drop the calculation entirely, within the rule, because of mul with Zero
(and theoretically other differential interactions, but in practice probably only that one?).

oxinabox · 2019-08-07T18:44:13Z

So we need to audit all uses of thunk, and get rid of them when they are not in a position to be zeroed away.

simeonschaub · 2019-08-07T23:12:32Z

Related to this, I think there also needs to be a recursive version of extern, which would strip all Thunks and Casted and return either a scalar, an array, or a Wirtinger/array thereof.
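A rough sketch of what such a recursive extern might look like, covering only thunks. The name `deep_extern` is hypothetical, and `Thunk` and `extern` are simplified stand-ins rather than the ChainRules definitions; Casted and Wirtinger are left out.

```julia
# Simplified stand-ins (assumptions, not the ChainRules implementations).
struct Thunk{F}
    f::F
end
extern(t::Thunk) = t.f()             # single level: may return another Thunk
extern(x) = x

# Hypothetical recursive variant: keep unwrapping until no Thunk remains.
deep_extern(x) = x
deep_extern(t::Thunk) = deep_extern(extern(t))
deep_extern(xs::AbstractArray) = map(deep_extern, xs)

nested = Thunk(() -> Thunk(() -> 3.0))
```

Here `extern(nested)` still returns a `Thunk`, while `deep_extern(nested)` unwraps both layers.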

oxinabox · 2019-08-08T01:22:14Z

Can you give an example of code that uses only thunks (not Casted) and ends up in a state where you want recursive extern?

simeonschaub · 2019-08-10T09:10:50Z

Might have been in an earlier version, couldn't replicate it now

oxinabox · 2019-08-29T19:07:23Z

We definitely need thunk.

jrevels changed the title ~~Can we get rid of a bunch of subtypes of AbstractDifferentiable ?~~ Can we get rid Thunk? Aug 7, 2019

jrevels changed the title ~~Can we get rid Thunk?~~ Can we get rid of Thunk? Aug 7, 2019

oxinabox mentioned this issue Aug 7, 2019

extern(::Casted) seems ill-defined. (Remove Casted?) #10

Closed

oxinabox closed this as completed Aug 29, 2019

oxinabox mentioned this issue Sep 4, 2019

Overhaul Rules #30

Merged
