Can we get rid of Thunk? #18
Comments
#10 is already this issue for Long story short, the feature that is implemented via Imagine a simple forward-mode AD using ChainRules. Let's say it is getting the
Let's say our AD is only differentiating w.r.t. Note that this type of situation isn't that rare at the very entry point of the target code (when the user explicitly says "hey I only care about this partial derivative for my function"). More importantly, I'd say it's downright common to encounter this implicitly when you're actually computing intermediate derivatives within the target code being differentiated. Note also that there is an exactly equivalent situation for reverse-mode (just w.r.t. outputs instead of inputs). So here's the cool part: ideally, just given partial derivatives, ChainRules would be able to autogenerate rules for you where Hopefully all that helps; the thunk stuff is one of the many things that never got properly documented, since I never got around to writing an actual manual as opposed to docstrings. |
(just to clarify, since this is the crux of my earlier example and I didn't make it explicit: |
Right, thanks, that clears things up. So the point of thunk is not to defer the calculation for the sake of deferring the calculation (since |
So we need to audit all uses of thunk, and get rid of them when they are not in a position to be zeroed away.
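To make "in a position to be zeroed away" concrete, here is a minimal sketch. The types `Zero` and `Thunk` and the functions `extern` and `mul` below are hypothetical stand-ins written for this example, not the actual ChainRules definitions. It only illustrates the idea that a thunk multiplied by a zero seed is never evaluated.

```julia
# Hypothetical stand-ins, not the ChainRules definitions.
struct Zero end

struct Thunk{F}
    f::F
end

# Force the deferred computation; plain values pass through unchanged.
extern(t::Thunk) = t.f()
extern(x) = x

# Multiplying by Zero discards the other operand without ever looking at it.
mul(::Zero, ::Zero) = Zero()
mul(::Zero, _) = Zero()
mul(_, ::Zero) = Zero()
# Any other combination needs real values, so thunks are forced here.
mul(a, b) = extern(a) * extern(b)

# Counter that records how often the "expensive" partial is computed.
calls = Ref(0)
expensive_partial() = (calls[] += 1; 2.0)

# The seed for this input is Zero, so the partial is never computed.
zeroed = mul(Zero(), Thunk(expensive_partial))
calls_after_zero = calls[]

# With a nonzero seed the thunk is forced exactly once.
used = mul(3.0, Thunk(expensive_partial))
calls_after_use = calls[]
```

A thunk that can never meet a `Zero` this way gains nothing from the wrapping, which is what the audit would look for.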
Related to this, I think there also needs to be a recursive version of |
Can you give an example of code that uses only thunks (not casted) that ends up in that state where you want recursive extern?
Might have been in an earlier version; I couldn't replicate it now.
We definitely need thunk.
@jrevels: @willtebbutt and I were going through the Differentials to make sure we actually know what they are for.
And started to wonder if we need them.
Each one we get rid of simplifies things a lot,
especially when it comes to #16
I think we might be able to just have Wirtinger, One, Zero, and DNE.
Wirtinger
I have only the barest understanding of what this is.
It seems to be a particularly convenient way to deal with
derivatives with respect to complex numbers (in contrast to handling them as structs (#4)).
Probably useful.
DNE
Does not exist. Obviously useful.
One, Zero
Useful identities that are evaluated lazily, and can thus be removed from the chain efficiently.
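As a rough illustration of "removed from the chain efficiently", here is a sketch using dispatch on singleton types. The names `Zero`, `One`, `add` and `mul` here are hypothetical stand-ins defined for the example, assuming nothing about how ChainRules implements them.

```julia
# Hypothetical stand-ins for the identity differentials, not the ChainRules types.
struct Zero end
struct One end

# Additive identity: adding Zero returns the other operand untouched.
add(::Zero, ::Zero) = Zero()
add(::Zero, b) = b
add(a, ::Zero) = a
add(a, b) = a + b

# Multiplicative rules: Zero annihilates, One passes the other operand through.
# The identity-vs-identity methods resolve dispatch ambiguities.
mul(::Zero, ::Zero) = Zero()
mul(::Zero, ::One) = Zero()
mul(::One, ::Zero) = Zero()
mul(::One, ::One) = One()
mul(::Zero, b) = Zero()
mul(a, ::Zero) = Zero()
mul(::One, b) = b
mul(a, ::One) = a
mul(a, b) = a * b

x = [1.0, 2.0]
# One() * x + Zero() * [10, 20] collapses to x itself, with no array arithmetic.
chain = add(mul(One(), x), mul(Zero(), [10.0, 20.0]))
```

Because the identities are resolved by dispatch, `chain` is the very same array as `x`; no multiplication or addition over the elements is performed.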
Casted
It is kind of the generalization of One and Zero
(in that One() could also be written Casted(true), etc.).
It lets us lazily delay computing a broadcast,
so that it can be fused later.
But I think in the short term we can simplify the code
by replacing, say,
Rule((Δx, Δy) -> sum(Δx * cast(y)) + sum(cast(x) * Δy))
with
Rule((Δx, Δy) -> sum(Δx .* y) + sum(x .* Δy))
(from here),
which for that particular case would even be identical in performance, I think,
since it does not end up returning any kind of lazy computation.
Later we can try getting back the lazy computation and broadcast fusing by returning
broadcasted.
Getting rid of Casted would solve #10.
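For reference, a small sketch of the two styles using only Base. The input arrays are made-up example data. The eager line is the body of the simplified rule above, and the lazy part shows what returning broadcasted could look like, assuming the caller eventually calls materialize.

```julia
using Base.Broadcast: broadcasted, materialize

# Made-up example data.
x  = [1.0, 2.0, 3.0]
y  = [4.0, 5.0, 6.0]
Δx = [0.1, 0.2, 0.3]
Δy = [1.0, 1.0, 1.0]

# Eager version: each dotted product allocates a temporary array, then sum reduces it.
eager = sum(Δx .* y) + sum(x .* Δy)

# Lazy version: broadcasted builds an unevaluated Broadcasted object.
lazy_term = broadcasted(*, Δx, y)

# Nesting it inside another broadcasted call fuses both products and the
# addition into a single loop, which runs only when materialize is called.
fused = materialize(broadcasted(+, lazy_term, broadcasted(*, x, Δy)))
```

Since the rule reduces with sum straight away, nothing lazy escapes from it, which is why the eager form loses nothing in this particular case.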
Thunk
Thunk seemed really useful at first,
but I am not sure anymore that it actually does anything.
A thunk basically wraps a function call returning a Differentiable, f(v), in a ()->f(v)
so as not to have to compute it yet.
But any time you interact with it (e.g. via add or mul) it gets externed,
because if you don't do that you can get huge chains of thunks that call thunks,
and also because at the time something like add is called
you probably do actually want the value: you're not going to skip it and only use the other part.
And using it inside a rule isn't actually deferring anything extra until the backward pass, since rules themselves are deferred until the backward pass.
E.g. looking at this rule
Rather than
we could just do
Which boils down to the same thing, since when the rule is invoked it gets externed anyway, by * becoming mul.
Even in the case of derivatives with respect to multiple things, where multiple rules reference the thunk, it still doesn't change anything, since thunks don't cache (#7).
I recall @jrevels saying that they used to cache, so maybe still having them is a legacy of that time and we just didn't notice that they didn't do anything anymore.
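A minimal sketch of the no-caching point, with a counter to make it visible. `Thunk`, `extern` and `mul` below are hypothetical stand-ins that mimic the behaviour described in this issue (mul forces its operands, thunks do not cache); they are not the ChainRules implementation.

```julia
# Hypothetical stand-in for a non-caching thunk, not the ChainRules definition.
struct Thunk{F}
    f::F
end

# Forcing a thunk just calls the wrapped closure again; nothing is stored.
extern(t::Thunk) = t.f()
extern(x) = x

# Mirrors the behaviour described above: mul forces both operands.
mul(a, b) = extern(a) * extern(b)

# Counter that records how often the shared partial is computed.
evaluations = Ref(0)
shared_partial() = (evaluations[] += 1; 4.0)

shared = Thunk(shared_partial)

# Two "rules" that both reference the same thunk.
rule_a(Δ) = mul(Δ, shared)
rule_b(Δ) = mul(shared, Δ)

results = (rule_a(1.0), rule_b(2.0))
```

After both rules run, the counter is at 2: the shared partial was computed once per rule, exactly as if each rule had called shared_partial() directly.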