Overhaul Rules #30

oxinabox · 2019-08-27T18:21:19Z

This is a very mighty PR.

Adds an extra return value to all rrules (currently always DNE(), as none are for functors), representing the derivative w.r.t. the internals of closures/functors, and similarly demands an extra input argument at the start of a call to frule (ignored in all current cases, as none are functors). See: Differentiating with respect to a function #22
The propagators (pushforward/pullback) now always return a tuple, even if it has just 1 element. See: Rules for unary functions, alway return tuple? #31
frule/rrule now return a single propagator (pushforward/pullback) that returns a tuple of partials, rather than one propagator per partial. See: 1 AbstractRule Per Partial, vs 1 AbstractRule returning a tuple of Differentials (one per partial) #38
As a result, AbstractRule subtypes are no longer used anywhere. See: Remove Rule (or maybe all AbstractRules) and treat functions as Rules #39
@scalar_rule automatically names pullbacks/pushforwards. Below is what that looks like on Julia master (with the new, improved display for gensymed names):

julia> _, pushforward = frule(sin, 1)
(0.8414709848078965, ChainRules.var"##75#sin_pushforward#55"{Int64}(1))

It does not look quite as nice on Julia 1.0, but it is still useful:

julia> _, pushforward = frule(sin, 1)
(0.8414709848078965, getfield(ChainRules, Symbol("##75#sin_pushforward#55")){Int64}(1))
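The reverse-mode shape described in the list above can be sketched roughly as follows. This is a self-contained toy, not the package's code: `ToyDNE` and `toy_rrule` are hypothetical names standing in for `DNE()` and `rrule`, and only the "one named pullback returning a tuple" convention is taken from the description.

```julia
# Hypothetical stand-in for the "does not exist" differential (DNE() in this PR).
struct ToyDNE end

# A toy reverse rule for `sin`; an illustration, not the package's definition.
function toy_rrule(::typeof(sin), x)
    # A single named pullback returns a tuple: first the (nonexistent)
    # derivative w.r.t. the function object itself, then one partial per argument.
    sin_pullback(ȳ) = (ToyDNE(), ȳ * cos(x))
    return sin(x), sin_pullback
end

y, pullback = toy_rrule(sin, 0.0)
∂self, ∂x = pullback(1.0)   # always a tuple, even for a unary function
```

Because the pullback is a named local function, it also shows up with a readable name when displayed, which is the effect the `@scalar_rule` naming aims for.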

It has a corresponding PR to ChainRules.jl

JuliaDiff/ChainRules.jl#91

This is the main blocker for FluxML/Zygote.jl#291

oxinabox · 2019-08-28T18:17:00Z

All the accumulation stuff needs to be rewritten still.

oxinabox · 2019-09-02T18:02:42Z

I have not deleted the AbstractRules yet, as I have yet to work out the story for store!
and for 2-arg rules that know how to update!.

I guess that will block that PR, but I will work that through as I finish JuliaDiff/ChainRules.jl#91
and need them.
Which I also want done before merging this.

Good to review this now though;
it is a big PR and the vast majority of changes are in.

simeonschaub · 2019-09-02T18:40:15Z

@oxinabox I've created a test case for mixed Wirtinger derivatives here. This should help make sure that this works correctly.

oxinabox · 2019-09-02T18:51:25Z

Cool. I will add that to this PR tomorrow.
At least to check all the returned types are right.

simeonschaub

Just some small typos in the docstrings, that caught my eye

nickrobinson251

Nice job! A bunch of pretty small comments:

Looking forward to reviewing once the other changes are in!

nickrobinson251 · 2019-09-03T09:27:20Z

src/rule_definition_tools.jl

@@ -59,6 +137,9 @@ For examples, see ChainRulesCore' `rules` directory.
 See also: [`frule`](@ref), [`rrule`](@ref), [`AbstractRule`](@ref)
 """
 macro scalar_rule(call, maybe_setup, partials...)
+    ############################################################################
+    # Setup: normalizing input form etc
+


Can this be broken up into functions? I'd love for this to not be 100 lines long...

I still feel the same

But if it is not easy / does not make sense to you, @oxinabox , then that's fine by me too

Wahooo! Thanks 🎉

simeonschaub · 2019-09-03T11:09:08Z

src/differentials.jl

+
+####
+"""
+    differential(𝒟::Type, der)


Maybe this should take primal and conjugate as arguments, and depending on 𝒟 return either Wirtinger or their sum? I think that would make it more clear to rule authors, that when you create a Wirtinger, you usually also want this fall-through behavior.

It could also be useful to check, whether conjugate isa Zero here, and unwrap Wirtinger if that's the case.

Maybe this should take primal and conjugate as arguments, and depending on 𝒟 return either Wirtinger or their sum?

I'm not sure I understand the advantage of that?

With this we have
differential(𝒟, Wirtinger(primal, conjugate))
which seems fine.
What is the advantage of
differential(𝒟, primal, conjugate)
?

It could also be useful to check, whether conjugate isa Zero here, and unwrap Wirtinger if that's the case.

Maybe.
Maybe even iszero(conjugate) to get constants like 0

Maybe even for inputs that are scalar:
iszero(der) && return Zero()
and similar for One()

We should discuss that kind of thing in an issue and make a follow-up PR for it.

What is the advantage of
differential(𝒟, primal, conjugate)
?

I just don't know if differential is the best name for this function, since it takes a differential and returns another, sometimes different and more efficient, differential. I would expect a function called differential to work more like a constructor. Or do we maybe want to call this wirtinger and have it take 𝒟, primal, and conjugate, since this probably corresponds better to what it does right now? But I also wouldn't mind just leaving this for now, since I'm also struggling to find a better name for this function.

Maybe.
Maybe even iszero(conjugate) to get constants like 0

Maybe even for inputs that are scalar:
iszero(der) && return Zero()
and similar for One()

I'm not quite convinced we benefit from introducing dispatch based on value here; wouldn't this also cause problems on GPUs? But this is definitely an issue for another day.

How about we rename it to refine_differential ?
I think this actually also interacts with #8 since we will want to apply something recursively.
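To make the behaviour under discussion concrete, here is a minimal sketch of what such a refinement step might look like. Everything in it is hypothetical: `ToyZero`, `ToyWirtinger` and `toy_refine` are illustrative stand-ins, the choice to sum the two components for a real domain is an assumption based on the "Wirtinger or their sum" wording above, and the unwrapping check is done on the type (`isa ToyZero`), not on the value.

```julia
struct ToyZero end            # stand-in for a "hard zero" differential

struct ToyWirtinger{P,C}      # stand-in for a Wirtinger differential
    primal::P
    conjugate::C
end

# Takes a differential and returns a (possibly simpler) differential.
function toy_refine(::Type{D}, w::ToyWirtinger) where {D}
    # Type-based check: unwrap when the conjugate part is known to be zero.
    w.conjugate isa ToyZero && return w.primal
    # Assumed fall-through for real domains: collapse to the sum.
    D <: Real && return w.primal + w.conjugate
    return w
end

# Anything that is not a ToyWirtinger passes through unchanged.
toy_refine(::Type, der) = der

summed    = toy_refine(Float64, ToyWirtinger(1.0, 2.0))
unwrapped = toy_refine(ComplexF64, ToyWirtinger(1.0 + 0im, ToyZero()))
w         = ToyWirtinger(1.0 + 2.0im, 0.5im)
kept      = toy_refine(ComplexF64, w)
```

Checking `isa ToyZero` keeps the decision at the type level, which sidesteps the value-based dispatch concern raised above; an `iszero` check would instead depend on runtime values.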

tkf · 2019-09-04T06:10:20Z

frule/rrule now return a single propagator (pushforward/pullback) that returns a tuple of partials, rather than one propagator per partial. See: 1 AbstractRule Per Partial, vs 1 AbstractRule returning a tuple of Differentials (one per partial) #38

I have a concern about this decision. Does it mean that, for example, the reverse rule of * becomes something like (computationally equivalent to)

function rrule(::typeof(*), A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    return A * B, Rule(Ȳ -> (Ȳ * B', A' * Ȳ))
end

(which is like what Zygote.jl is doing at the moment) instead of the current definition

function rrule(::typeof(*), A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    return A * B, (Rule(Ȳ -> Ȳ * B'), Rule(Ȳ -> A' * Ȳ))
end

where you can compute the derivative w.r.t different argument separately?

Wouldn't it be a huge performance loss when large constant arrays participate in the computation of intermediate variables that depend on the variables ("trainable parameters") with respect to which the derivatives are taken? For example, in the Generative Adversarial Network (GAN) setting, I think it would be a big issue when taking the derivative (d/dp) D(G(p)) w.r.t. the parameter p of the generator G while treating the parameters of the discriminator D as constant. I also noted the concerns in other similar situations here FluxML/Zygote.jl#323 (comment).

I'm by no means an AD or ML specialist so I may be missing something. It would be great if you can clarify that my concern is invalid.

oxinabox · 2019-09-04T09:28:59Z

Short answer: don't worry we solve this with Thunks.

Full answer:

@tkf a very reasonable concern. And one I used to have
until I understood why we have the Thunk differential.
(#18)
We have an issue open about documenting that a bit more.
#46

... the current definition

function rrule(::typeof(*), A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    return A * B, (Rule(Ȳ -> Ȳ * B'), Rule(Ȳ -> A' * Ȳ))
end

becomes (in the partner PR, this is one of the ones I've already updated):

function rrule(::typeof(*), A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    return A * B, Ȳ -> (NO_FIELDS, @thunk(Ȳ * B'), @thunk(A' * Ȳ))
end

@thunk(Ȳ * B') is just a shorthand for Thunk(()->Ȳ * B')
Thunk source: definition, and math

What it basically is, is a differential that defers computation until it is used.
So if it is never used, then the wrapped computation is never computed.
For example:

Y, pullback = rrule(*, A, B)
_, dA_diff, dB_diff = pullback(One())
dB = extern(dB_diff)

then, since dA_diff was never externed, the Ȳ * B' is never evaluated.
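The deferral can be demonstrated with a small self-contained sketch. It does not use the package: `ToyThunk`, `toy_extern` and `toy_mul_rrule` are hypothetical stand-ins for `Thunk`, `extern` and the `rrule` for `*`, `nothing` stands in for `NO_FIELDS`, and the `calls` log exists only to record which closures actually ran.

```julia
using LinearAlgebra

# Hypothetical stand-in for Thunk: wraps a zero-argument closure.
struct ToyThunk{F}
    f::F
end

# Hypothetical stand-in for extern: forces the deferred computation.
toy_extern(t::ToyThunk) = t.f()

# Toy reverse rule for matrix multiplication; `calls` records which
# partials were actually computed.
function toy_mul_rrule(A, B, calls)
    function mul_pullback(Ȳ)
        dA = ToyThunk(() -> (push!(calls, :dA); Ȳ * B'))
        dB = ToyThunk(() -> (push!(calls, :dB); A' * Ȳ))
        return (nothing, dA, dB)   # `nothing` stands in for NO_FIELDS
    end
    return A * B, mul_pullback
end

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]
calls = Symbol[]

Y, pullback = toy_mul_rrule(A, B, calls)
_, dA_diff, dB_diff = pullback(ones(2, 2))
dB = toy_extern(dB_diff)   # only the dB closure runs; dA is never computed
```

After this runs, `calls` contains only `:dB`, so treating `A` as a constant costs nothing beyond constructing an unused closure.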

tkf · 2019-09-04T14:34:02Z

@oxinabox Thanks a lot! I appreciate the full explanation. I should have checked the partner PR.

oxinabox · 2019-09-04T16:11:40Z

@oxinabox Thanks a lot! I appreciate the full explanation. I should have checked the partner PR.

It is a really important question

make real scalar rules work. correct @scalarrule forward rule return Wirtinger scalar working work WirtingerRule test as a test of @scalar_rule Fix spelling Co-Authored-By: simeonschaub <simeondavidschaub99@gmail.com> Oxford Comma Co-Authored-By: simeonschaub <simeondavidschaub99@gmail.com> spelling Co-Authored-By: Nick Robinson <npr251@gmail.com> docstring for propagator_name spelling Co-Authored-By: Nick Robinson <npr251@gmail.com>

oxinabox · 2019-09-17T10:16:29Z

Rebased, and squashed some of them.
Going to try shuffling the commits to squash it down some more.

Normally I am hesitant to squash during PR review, but this has had a lot of review so far,
so making each commit distinct in purpose seems appropriate now.

oxinabox · 2019-09-17T10:23:50Z

All tests (except integration tests) are passing.

Shuffle rebasing is hard, not sure if it is worth it.
Might squash things into a single commit at the end?

simeonschaub

LGTM!

nickrobinson251

This LGTM

Good work!

I have left a handful of tiny comments :)

nickrobinson251 · 2019-09-17T13:08:08Z

src/differentials.jl

+- The expression wrapping something in a `struct`, such as `Adjoint(x)` or `Diagonal(x)`
+- The expression being a constant
+- The expression being itself a `thunk`
+- The expression being from another `rrule` or `frule` (it would be `@thunk`ed if required by the defining rule already)


This entire section is great

Shall we move it to a page in the docs?

nickrobinson251 · 2019-09-17T13:19:23Z

src/differentials.jl

+(Otherwise one can just use a normal `Thunk`).
+
+Most operations on an `InplaceableThunk` treat it just like a normal `Thunk`;
+and destroy its inplacability.


Suggested change

and destroy its inplacability.

and destroy its ability to work inplace.

nickrobinson251 · 2019-09-17T13:20:19Z

src/differentials.jl

-Base.conj(x::Thunk) = @thunk(conj(extern(x)))
+# The real reason we have this:
+accumulate!(Δ, ∂::InplaceableThunk) = ∂.add!(Δ)
+store!(Δ, ∂::InplaceableThunk) = ∂.add!((Δ.*=false))  # zero it, then add to it.


Suggested change

store!(Δ, ∂::InplaceableThunk) = ∂.add!((Δ.*=false)) # zero it, then add to it.

store!(Δ, ∂::InplaceableThunk) = ∂.add!((Δ .*= false)) # zero it, then add to it.
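For readers unfamiliar with the pattern in this diff, here is a rough self-contained sketch of the idea. The type and function names (`ToyInplaceableThunk`, `toy_accumulate!`, `toy_store!`) are hypothetical and the field layout is an assumption; only the two one-line definitions mirror the lines under review.

```julia
# Hypothetical stand-in for InplaceableThunk: pairs an out-of-place
# computation with a closure that adds the same value into a buffer.
struct ToyInplaceableThunk{V,F}
    val::V     # zero-argument closure computing the value
    add!::F    # one-argument closure adding the value into its argument
end

# Add the deferred value into Δ without allocating a temporary.
toy_accumulate!(Δ, ∂::ToyInplaceableThunk) = ∂.add!(Δ)

# Overwrite Δ: zero it in place, then add into it.
toy_store!(Δ, ∂::ToyInplaceableThunk) = ∂.add!((Δ .*= false))

x = [1.0, 2.0]
∂ = ToyInplaceableThunk(() -> 2 .* x, Δ -> (Δ .+= 2 .* x))

acc = [10.0, 10.0]
toy_accumulate!(acc, ∂)

buf = [NaN, 5.0]
toy_store!(buf, ∂)
```

Multiplying by `false` instead of `0` matters here: in Julia `false` acts as a strong zero, so even a `NaN` already sitting in the buffer is reset to zero before the addition.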

nickrobinson251 · 2019-09-17T13:21:27Z

src/operations.jl

+Similar to [`accumulate`](@ref), but attempts to compute `Δ + rule(args...)` in-place,
+storing the result in `Δ`.
+
+Note: this function may not actually store the result in `Δ` if `Δ` is immutable,


Suggested change

Note: this function may not actually store the result in `Δ` if `Δ` is immutable,

!!! note

    this function may not actually store the result in `Δ` if `Δ` is immutable,

nickrobinson251 · 2019-09-17T13:21:56Z

src/operations.jl

+"""
+    store!(Δ, ∂)
+
+Stores `∂`, in `Δ`, overwriting what ever was in `Δ` before.


Suggested change

Stores `∂`, in `Δ`, overwriting what ever was in `Δ` before.

Stores `∂` in `Δ` overwriting whatever was in `Δ` before.

nickrobinson251 · 2019-09-17T13:23:01Z

src/rule_definition_tools.jl

+Returns the expression for the propagation of
+the input gradient `Δs` though the partials `∂s`.
+
+𝒟 is an expression that when evaluated returns the type-of the input domain.


Suggested change

𝒟 is an expression that when evaluated returns the type-of the input domain.

𝒟 is an expression that when evaluated returns the type of the input domain.

nickrobinson251 · 2019-09-17T13:23:29Z

src/rule_definition_tools.jl

+function standard_propagation_expr(Δs, ∂s)
+    # This is basically Δs ⋅ ∂s
+
+    # Notice: the thunking of `∂s[i] (potentially) saves us some computation


Suggested change

# Notice: the thunking of `∂s[i] (potentially) saves us some computation

# Notice: the thunking of `∂s[i]` (potentially) saves us some computation

nickrobinson251 · 2019-09-17T13:24:09Z

src/rule_definition_tools.jl

+    # Notice: the thunking of `∂s[i] (potentially) saves us some computation
+    # if `Δs[i]` is a `AbstractDifferential` otherwise it is computed as soon
+    # as the pullback is evaluated
+    ∂_mul_Δs = [:(@thunk($(∂s[i])) * $(Δs[i])) for i in 1:length(∂s)]


Yes :) Is it worth opening an issue (to stare hard at this and figure out if all is well)?
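As a rough illustration of the "basically Δs ⋅ ∂s" comment in this diff, the expression building can be sketched as below. `toy_propagation_expr` is a hypothetical simplification: it omits the `@thunk` wrapping that the real code applies to each partial, and the symbols and partial expressions passed in are made up for the example.

```julia
# Build the expression ∂s[1] * Δs[1] + ∂s[2] * Δs[2] + ...
# Simplified sketch: no thunking of the partials.
function toy_propagation_expr(Δs, ∂s)
    terms = [:($(∂s[i]) * $(Δs[i])) for i in 1:length(∂s)]
    return length(terms) == 1 ? terms[1] : Expr(:call, :+, terms...)
end

# Partials of a made-up function of two arguments, as expressions.
ex = toy_propagation_expr([:Δx, :Δy], [:(cos(x)), :(2y)])

# Evaluate the generated expression with concrete values bound in a `let`.
result = eval(:(let x = 0.0, y = 3.0, Δx = 1.0, Δy = 0.5
    $ex
end))
```

In the real macro each partial is wrapped in `@thunk` before the multiplication, which is what raises the question above of when the work is actually deferred.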

src/rules.jl

Co-Authored-By: Nick Robinson <npr251@gmail.com>

oxinabox changed the title ~~WIP Derivative wrt function~~ WIP: Derivative wrt function Aug 27, 2019

oxinabox mentioned this pull request Aug 27, 2019

Overhaul Rules (partner PR) JuliaDiff/ChainRules.jl#91

Merged

oxinabox force-pushed the ox/wrtfunction branch from 061197f to e11c001 Compare August 28, 2019 17:32

oxinabox force-pushed the ox/wrtfunction branch from e238a8d to e061404 Compare August 30, 2019 15:50

oxinabox changed the title ~~WIP: Derivative wrt function~~ Overhaul Rules Sep 2, 2019

simeonschaub reviewed Sep 2, 2019

nickrobinson251 reviewed Sep 3, 2019

simeonschaub reviewed Sep 3, 2019

simeonschaub reviewed Sep 3, 2019

oxinabox assigned mattBrzezinski Sep 3, 2019

oxinabox mentioned this pull request Sep 3, 2019

Write docs on how to write rules to run different computations at different times #46

Open

omus reviewed Sep 3, 2019

willtebbutt unassigned mattBrzezinski Sep 4, 2019

tkf mentioned this pull request Sep 4, 2019

Slow backward pass when the forward pass touches a large array FluxML/Zygote.jl#323

Open

oxinabox requested a review from mattBrzezinski September 4, 2019 16:11

This was referenced Sep 5, 2019

Make extern recursive? #47

Closed

make extern(::Thunk) recursive #48

Merged

oxinabox and others added 17 commits September 17, 2019 11:10

Update test/rules.jl

eb3c292

error ratehr than Assert cleanup Update src/rule_definition_tools.jl Co-Authored-By: Nick Robinson <npr251@gmail.com> Add more complex Wirtinger Scalar Rule Test

Make accumulate apply to Differentials

f6979ac

update accumulate to work on differentials

Update src/rule_definition_tools.jl

55dcefe

Co-Authored-By: Curtis Vogt <curtis.vogt@gmail.com>

Make frule return a scalar

53d5f9e

add docs about when to thunk

5b6c0d8

fix up for recursive extern

5c3fdaa

correct accumulate to be broadcasting

f14e045

Get rid of all Rule types, add InplacableThunk

ef748c9

spelling is hard

overload store!

48a0391

zero the storage inplace

much AbstractThunk

752732e

rename InplaceableThunk InplaceThunk

765ecfc

remove reference to Rules from docstrings

62d3be3

update thunk docstring

3ae6449

fix indent in docstring

03cb994

Revert "rename InplaceableThunk InplaceThunk"

cb01743

This reverts commit 85b5bf9.

rename differential to refine_differential

3656389

oxinabox force-pushed the ox/wrtfunction branch from 5995297 to 3656389 Compare September 17, 2019 10:14

simeonschaub approved these changes Sep 17, 2019

nickrobinson251 approved these changes Sep 17, 2019

oxinabox mentioned this pull request Sep 17, 2019

Some thoughts on Wirtinger #40

Open

oxinabox and others added 2 commits September 17, 2019 16:55

Update src/rules.jl

1869be1

Co-Authored-By: Nick Robinson <npr251@gmail.com>

split up scalar_rule into a bunch of functions

e51ff80

oxinabox force-pushed the ox/wrtfunction branch from 2bd9b6c to e51ff80 Compare September 17, 2019 17:45

oxinabox merged commit a133468 into master Sep 17, 2019

nickrobinson251 mentioned this pull request Sep 17, 2019

Rename DNE -> DoesNotExist #42

Merged

YingboMa deleted the ox/wrtfunction branch December 24, 2019 19:26

nickrobinson251 mentioned this pull request Jan 4, 2020

Make tests independent of Turing TuringLang/DynamicPPL.jl#18

Closed
