Make frule return a closure which returns the derivative #102

Closed
wants to merge 1 commit into from

YingboMa
Member

No description provided.

Co-authored-by: "Yingbo Ma" <mayingbo5@gmail.com>
Co-authored-by: "Shashi Gowda" <gowda@mit.edu>
@oxinabox
Member

Why is this a good idea?

@YingboMa
Member Author

YingboMa commented Jan 16, 2020

#67 (comment)

@shashi shashi force-pushed the ys/pushforward branch 2 times, most recently from 2a468ca to 5b57190 Compare January 16, 2020 22:53
@oxinabox oxinabox mentioned this pull request Jan 16, 2020
@JuliaDiff JuliaDiff deleted a comment from YingboMa Jan 16, 2020
@JuliaDiff JuliaDiff deleted a comment from YingboMa Jan 16, 2020
@JuliaDiff JuliaDiff deleted a comment from YingboMa Jan 16, 2020
@oxinabox
Member

Should we close this now?
I think we have identified a clear blocker for this strategy: the original reason for fusing given in #74 (comment),
namely examples like the DE case where you actually want to solve for your primal and your partials in a single operation, still stands.
In fact it is stronger, since now you want to get your primal and all partials of all orders in a single DE solve.
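
As a rough illustration of the DE case (a minimal Python sketch, not ChainRules code; the function name `solve_with_sensitivity`, the test equation x' = -a*x, and the explicit Euler scheme are all assumptions made for the example), the primal and its partial with respect to the parameter can be advanced together in one loop:

```python
def solve_with_sensitivity(a, x0, h, n_steps):
    """Explicit Euler on x' = -a*x, augmented with s = dx/da, where s' = -x - a*s."""
    x, s = x0, 0.0  # x0 does not depend on a, so the sensitivity starts at zero
    for _ in range(n_steps):
        dx = -a * x      # primal right-hand side
        ds = -x - a * s  # tangent right-hand side, reusing x from the same step
        x, s = x + h * dx, s + h * ds
    return x, s


# One solve yields both the primal and the partial with respect to a.
x_final, dx_da = solve_with_sensitivity(a=0.5, x0=2.0, h=0.01, n_steps=100)
```

Splitting this into a primal solve followed by a separate pushforward call would mean stepping through the same trajectory twice.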

More broadly, differentiating the pushforward runs into the same problem that recursive forward mode runs into in the first place, and that Taylor mode wants to avoid:
the overlapping work that ideally should be shared between the nth, (n+1)th, etc. pushforward operations.
Even laying aside trivial cases like sin where the whole value gets reused,
one can look at Faà di Bruno's formula
and see that the nth-order derivative of f(g(x)) needs a bunch of different combinations of intermediate values that have already been computed when taking the (n-1)th derivative (or earlier).
So this approach robs us of that possible efficiency.

Here are the Faà di Bruno formulas for the derivatives of f(g(x)):

  • 0th: f(g(x))
  • 1st: g'(x) f'(g(x))
    - reuses g(x)
  • 2nd: g'(x)^2 f''(g(x)) + g''(x) f'(g(x))
    - reuses g(x), g'(x) and f'(g(x))
  • 3rd: f'''(g(x)) g'(x)^3 + 3 g'(x) g''(x) f''(g(x)) + g'''(x) f'(g(x))
    - reuses g(x), g'(x), g'(x)^2, g''(x), f''(g(x)) and f'(g(x))
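
The reuse in the list above can be sketched in a few lines of Python. This is only an illustration under assumed choices (f = sin, g(x) = x^2, and the hypothetical name `compose_derivatives`); it is not how ChainRules implements anything:

```python
import math


def compose_derivatives(x):
    """Derivatives of order 0..3 of sin(x**2), sharing intermediates across orders."""
    # Inner function g(x) = x**2 and its derivatives, each computed once.
    g0, g1, g2, g3 = x * x, 2.0 * x, 2.0, 0.0
    # Outer function f = sin and its derivatives at g(x): two transcendental calls in total.
    s, c = math.sin(g0), math.cos(g0)
    f1, f2, f3 = c, -s, -c
    g1_sq = g1 * g1  # reused by the 2nd- and 3rd-order terms
    d0 = s
    d1 = g1 * f1
    d2 = g1_sq * f2 + g2 * f1
    d3 = f3 * g1_sq * g1 + 3.0 * g1 * g2 * f2 + g3 * f1
    return d0, d1, d2, d3


d0, d1, d2, d3 = compose_derivatives(0.7)
```

Every order is assembled from the same handful of values, which is exactly the sharing that is lost if each order is produced by an independent pushforward.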

Basically, any function we care to write an frule for will itself call some functions.
As such, its frule will implicitly (or explicitly) be using the chain rule,
and so it will have intermediate values corresponding to the various g derivatives that we get to reuse.
Thus we can't just generate the next-level pushforwards via nested forward-mode AD in the first place, since they won't have access to all the intermediate values we want to reuse.
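
To make the overlapping work concrete, here is a small Python sketch of nested forward mode. The `Dual` class, `dsin`, `dcos`, and the `calls` counter are all hypothetical names for this illustration. Taking a second derivative of sin by nesting dual numbers evaluates the scalar sin and cos twice each, although one evaluation of each would be enough:

```python
import math

calls = {"sin": 0, "cos": 0}  # counts scalar transcendental evaluations


class Dual:
    """Minimal dual number; val and eps may themselves be Dual (nesting)."""

    def __init__(self, val, eps):
        self.val, self.eps = val, eps

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.eps)


def dsin(x):
    if isinstance(x, Dual):
        return Dual(dsin(x.val), dcos(x.val) * x.eps)
    calls["sin"] += 1
    return math.sin(x)


def dcos(x):
    if isinstance(x, Dual):
        return Dual(dcos(x.val), -(dsin(x.val) * x.eps))
    calls["cos"] += 1
    return math.cos(x)


x0 = 0.3
# Two nested perturbations of x0 give the second derivative in .eps.eps.
nested = dsin(Dual(Dual(x0, 1.0), Dual(1.0, 0.0)))
second_derivative = nested.eps.eps  # equals -sin(x0)
```

The outer level has no way to see that the inner level already computed sin(x0) and cos(x0), so it recomputes them.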

@shashi
Collaborator

shashi commented Jan 17, 2020

What's the alternative solution? Are you implying anyone defining an frule should also define the Nth order derivative and not just the first one??

With this, it would at least be possible to efficiently do symbolic nth derivatives (in @generated time) and do the required simplification and codegen?

@oxinabox
Member

oxinabox commented Jan 17, 2020

> What's the alternative solution? Are you implying anyone defining an frule should also define the Nth order derivative and not just the first one??

Yes, something like that.
I think this is probably core to doing Taylor mode in the first place.
We might want to call them trules or something. Not sure.
Then, if no such rule is found, we can fall back to polynomial numbers, which will work it out fairly efficiently (just like dual numbers do when there is no frule).
But that's for a discussion in #67.
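
For readers unfamiliar with the polynomial-number fallback, a minimal Python sketch follows. It assumes truncated Taylor polynomials with only addition and multiplication; the `Taylor` class is a hypothetical name, not an existing package API. Coefficient k holds the kth derivative divided by k!, and all orders come out of one pass:

```python
class Taylor:
    """Truncated Taylor polynomial: coeffs[k] is the k-th derivative divided by k!."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)

    @classmethod
    def variable(cls, x0, order):
        # The independent variable x0 + t, truncated after t**order (order >= 1).
        return cls([x0, 1.0] + [0.0] * (order - 1))

    def _lift(self, other):
        if isinstance(other, Taylor):
            return other
        return Taylor([other] + [0.0] * (len(self.coeffs) - 1))

    def __add__(self, other):
        other = self._lift(other)
        return Taylor(a + b for a, b in zip(self.coeffs, other.coeffs))

    __radd__ = __add__

    def __mul__(self, other):
        # Truncated Cauchy product of the two coefficient sequences.
        other = self._lift(other)
        n = len(self.coeffs)
        return Taylor(
            sum(self.coeffs[i] * other.coeffs[k - i] for i in range(k + 1))
            for k in range(n)
        )

    __rmul__ = __mul__


x = Taylor.variable(2.0, order=3)
result = x * x * x + 2.0 * x  # p(x) = x**3 + 2x expanded around x = 2
print(result.coeffs)  # [12.0, 14.0, 6.0, 1.0], i.e. p, p', p''/2!, p'''/3!
```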

> With this, it would at least be possible to efficiently do symbolic nth derivatives (in @generated time) and do the required simplification and codegen?

No more or less than it is to do it to frule.
Just like option 1 of #74 (comment),
which needs to somehow optimize out shared intermediates from the primal,
nested AD needs to optimize out shared intermediates from the (n-1)th nesting.
See the Faà di Bruno formula.

@shashi
Collaborator

shashi commented Jan 18, 2020

I strongly believe we shouldn't require frule to be defined for Nth-order derivatives; remember, frule is for elementary functions. (For example, it would be too redundant to require it for log.)

> No more or less than it is to do it to frule.

I'm not 100% sure about that yet.

@oxinabox
Member

oxinabox commented Jan 18, 2020

> I strongly believe we shouldn't require frule to be defined for Nth-order derivatives; remember, frule is for elementary functions. (For example, it would be too redundant to require it for log.)

Which is why I proposed a separate trule.
Since this discussion is not about this PR, let's take it to the issue (#67) so it's easy to find.

@willtebbutt
Member

@oxinabox @YingboMa can we close this please?

@oxinabox oxinabox deleted the ys/pushforward branch January 11, 2021 13:46