-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/issue 1062 ode speedup #1066
Conversation
…y having a copy of it on the outer stack
Bump. Once tests finish this can be reviewed and merged I think. |
@@ -72,7 +72,7 @@ struct coupled_ode_system<F, double, var> { | |||
: f_(f), | |||
y0_dbl_(y0), | |||
theta_(theta), | |||
theta_dbl_(value_of(theta)), | |||
theta_copy_(theta), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed with @wds15. This is not the right copy; we need to create a copy that is disconnected from theta
and is on our no_chain
stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test that there's nothing added to the stack on construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I am using the nonstack thing now (really ugly to get things there!). However, I am not sure how to test and why to test that nothing gets on construction onto the stack. Ok, I could use ChainableStack::instance().var_stack.size()
... but that is a not an exported API and is used nowhere in any test which I could find such that I am not sure that this can be tested given our API. If you think it is important to test, can you suggest a scheme to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind... I figured that we can test the stack size changes if we nest things.
... this is one of the annoyances of our nesting system - things are slightly inconsistent wrt to the non-nested stack; at least to me.
So the coupled_ode_system is now tested to not put stuff on the AD stack upon construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're in the guts of the autodiff stack and you're playing with things that live globally and within the nested stack. You should be testing that what you're doing is consistent with what you think should be happening. Don't just assume the code you've written works... that will get us into a lot of trouble.
Also, since we're now dealing with threading and mpi and gpu and all sorts of other complications that could be unsafe, it makes a lot of sense to make sure these unit tests actually trigger failures if the basic assumptions of validity of autodiff stack through nested operations falls apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I think it's the doc that needs to be updated. But I think the changes are pretty good.
@@ -45,7 +45,7 @@ struct coupled_ode_system<F, double, var> { | |||
const F& f_; | |||
const std::vector<double>& y0_dbl_; | |||
const std::vector<var>& theta_; | |||
const std::vector<double> theta_dbl_; | |||
std::vector<var> theta_copy_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please rename this to something descriptive? This is meant to be a copy of the theta
, but placed on the global stack, but essentially treated as temporaries where we can reset the adjoints. I don't know what a good name is, but just calling it theta_copy_
really isn't that great.
The old name was ok because it implies we've lost the connections between theta_
and theta_dbl_
by demoting it to a double
.
@@ -45,7 +45,7 @@ struct coupled_ode_system<F, double, var> { | |||
const F& f_; | |||
const std::vector<double>& y0_dbl_; | |||
const std::vector<var>& theta_; | |||
const std::vector<double> theta_dbl_; | |||
std::vector<var> theta_copy_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For the top-level comment... I can't comment through the PR interface on lines of code that aren't really affected.)
Can you document what this implementation does? I think it'd help anyone else trying to maintain the code.
- that the
theta
is copied and put on the stack, but are not meant to be autodiffed - that there's use of nested autodiff to compute the sensitivities
msgs_(msgs) {} | ||
msgs_(msgs) { | ||
for (auto& p : theta) | ||
theta_copy_.emplace_back(var(new vari(value_of(p), false))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we know p
is a stan::math::var
, why not use v.val()
directly? I think it'd be a little clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the value_of
is actually clearer as I don't need to recal the details of the var
class. Calling value_of
always gives me a double - no matter what. The val
method feels like an internal function whereas the value_of
is a clear API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to explain why I find this so much more confusing than if this was:
for (var&& p : theta)
theta_copy_.emplace_back(var(new vari(p.val(), false));
And this is coming from me having to read this code to figure it out.
value_of()
is a function that's designed to work ondouble
,var
,fvar<double>
,fvar<var>
, andfvar<fvar<var>>
. So I'm looking atvalue_of(p)
and trying to figure out... "what is p?"auto& p
instead ofvar&& p
doesn't help me at all. I know thatp
is one oftheta
, sotheta
is a collection, but what is it? Since I'm flipping between base template classes that can handle all autodiff types, it's not always easy to keep track that this is arev
only implementation. It'd really just help to have this bevar&& p
so the type is obvious.theta
... now I need to go and find where this is defined. C++ is a structured language, but it actually allows the declaration of struct and class member variables either at the top, below, or anywhere in between. And now I've got to search. I was looking at this using the GitHub interface, so the actual declaration was off of the interface; in order for me to go look at this, I had to go and check out the branch, open the file, all to find thattheta
is declared asstd::vector<var>
.
I think the thing that we're not seeing eye-to-eye on is the use of value_of
. I find it much easier to reason about something that's only valid for var
instead of having to work to figure out whether the object passed in is one of many different types. (It's just a little more mental work the reader has to do when it can all be made explicit as an alternative.)
(By the way, your statement is incorrect -- value_of
does not always give a double
. It'll give the type of the value
, which for fvar<var>
will be var
.)
Hopefully you've gotten some context on why this is harder to read for a reviewer or anyone else coming to the code later on. This is minor, but the idea still stands -- make it easier for the next person to read. It takes effort now, but it's so worth it later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... value_of
is defined differently in my head... I always forget about higher order things. With the correct definitions the val
call is better, yeah. Sorry, but I am not used to meaning of value_of
.
Honestly, I do expect the reviewer to look at the file and not just the diff from git. It is really hard to expect our changes to be self-explanatory on a single diff line. The auto vs actual type ... I read the opposite elsewhere. Your third point I do not quite understand. I am looping over theta which is the argument to the constructor.
EDIT: The compiler only accepts
for (const var& p : theta)
So I went with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The third point wasn't actually there... you're right. I should have looked at the constructor signature, not the actual type of the variable.
I can hop in and complete the review. I'll do a review of the code and make sure all of @syclik 's requests have been addressed. |
I get the basic idea based on the PR description but I'm not sure what the glitch was here. This makes me worry I'm missing important subtleties. @wds15 maybe we can discuss over a quick video call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the queries from the previous review have been addressed, with a minor exception, see line 133 of coupled_ode_system.hpp
. The unit tests look fine, in particular they check that the size of the stack does not change. See my additional comments below.
@charlesm93 The idea is to have the parameters on a stack which is disconnected to overall AD tree. Such a place is the nonstack AD tape which I never knew before talking to @syclik about it (this explains the comment you quote). I added some comments as you requested in the code. Maybe we shortly discuss this at the meeting later in case you have more questions? |
…tan-dev/math into feature/issue-1062-ode-speedup
Ok, this all looks good to me. The only minor point missing is an explanatory comment on line 133 of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charlesm93, thanks for following up on this!
It just dawned on me that we could have nest a couple times. I believe this PR will add a copy of the variables on the stack when we could easily just remove them when we leave the function. Your call on whether that makes sense. Thanks again. (and thanks, @wds15, for putting this together!)
…work to straighten the decouple operation
…gs/RELEASE_500/final)
Ok. So here is an approach which can work and has the best of both approaches. The idea is:
Right now I address the 2. thing by doing the
If that makes sense to you, I would refactor accordingly. (I still think that the approach to leave things behind on the no chain outer AD stack is cleanest and the best compromise... but you feel obvious strongly against about that while I am strongly against modifying the outer AD tree given we head into a parallel world) |
I'm not strongly opposed to the approach of leaving things behind on the AD stack. It is a design decision and I think that's probably the cleanest and we should go with that. I think that should be the safest thing to do given the circumstances. In general, we shouldn't just leave things behind. If we do, we should clearly state what's going on and why we've done it this way. That's what I feel strongly about, not the fact that we do it. By questioning whether we can remove it, we've actually come a long way in the design discussion that we wouldn't have. Are you ok with going back to putting things on the global stack on construction? We can't remove them on destruction. That means the constructor and the functor are the functions that need to be locked, but the AD stack can be modified outside of those times without a problem. |
Great. I much prefer to go back to the design which uses the no chain approach (but leave things behind). I will revert the code and add some doc around that as to what is safe / what not / what the side-effect is. At least in my head having the guarantee that starting a nested AD tree gives me an independent sub-tree is a very good one (I think @bob-carpenter said this once if I am not wrong). I would expect that this holds right now in Stan-math - and we should keep it like this. |
On Apr 1, 2019, at 9:15 AM, wds15 ***@***.***> wrote:
Great. I much prefer to go back to the design which uses the no chain approach (but leave things behind).
I will revert the code and add some doc around that as to what is safe / what not / what the side-effect is.
At least in my head having the guarantee that starting a nested AD tree gives me an independent sub-tree is a very good one (I think @bob-carpenter said this once if I am not wrong).
That was certainly the intention.
I would expect that this holds right now in Stan-math - and we should keep it like this.
I agree.
|
…tan-dev/math into feature/issue-1062-ode-speedup
…tan-dev/math into feature/issue-1062-ode-speedup
I reverted to the make-nochain-copy approach described above which we agreed on. I feel much safer with this! The only side-effect of this approach is that instances of the parameter vector theta will be left on the nochain AD stack (even after the I added a note to the respective specialization which do this trick which explains what is done and what side-effects are. So this is ready for final review. Tagging @syclik and @charlesm93 who are well familiar with the context here. |
Here we seem to have the same as issue as in #1072 such that I suggest the same here: Either we can commit to a timeline, look for other reviewers or we drops this PR. Tagging again @seantalts and @bob-carpenter for suggestions on this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have a couple of questions. It'd be great to address them just to have some answers, but the PR is great!
@@ -44,6 +44,13 @@ namespace math { | |||
* method</a> as implemented in Boost's <code> | |||
* boost::numeric::odeint::runge_kutta_dopri5</code> integrator. | |||
* | |||
* During ODE integration the global autodiff tape is continuously |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is meant by "the adjoints of the parameter vector are used for Jacobian calculations"? And how does that interact with concurrency?
@syclik : The confusing doc you found leading to your question
Is still a left-over from the old approach which used the outer AD tape. The approach which we implement now using the no chain stack is completely safe wrt to concurrency in the usual sense. I removed that comment there from the doc (which was also put into the cvodes_integrator file). Once passes tests this should then be fine to merge. Looks like there is once more the need for your approval. Sorry for that (I haven't actively dismissed your approval; probably that was automatic due to the change set). Thanks! (BTW, with the forthcoming independent AD in the parallel design docs we will be able to write this optimisation in a way such that nothing will be left behind, I think ... but that will still take some time until it lands, of course) |
Thanks for removing that comment -- I was scratching my head at that for a little bit, but I'm glad it was cleaned up. Yes, the way the approvals work now, it needs to be re-approved once a commit has been made. It's a pessimistic view of the world, but it's to prevent someone maliciously committing whatever they want after getting an approval and merging that. I think it's the right approach to take for an open-source project like ours. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
It makes sense to re-approve if there are changes...no objection to that. I just wasn't aware of it and always feel guilty to use peoples time if not actually needed. |
Whow! 6 month PR being around / 4 month under review or so / finally merged. Cool. I know this PR bends our usual conventions... but it is worth it and at least I learned a lot about our AD reverse mode things. Thanks @syclik for bearing with me on this one! |
@wds15 doesn't this imply that time-varying covariates (ie. thetas that change) won't work correctly? |
Ok, I found a way which I think is totally fine with our AD logic. We are still getting a nice 18% or so speedup on the sir example.
Summary
The intent of the PR is to speedup the ODE integration. The approach is to avoid repeated allocation of the parameters vector
theta_
on the nested AD stack. The nested AD is carried out when calling thecoupled_ode_system
functor. Since the parameters will never change during ODE integration we do not need to have these on the nested AD stack. The speedup of the SIR performance benchmark when running in 5 repetitions is 19%:Tests
Side Effects
Faster ODE integration.
Checklist
Math issue make
coupled_ode_system
more efficient #1062Copyright holder: Sebastian Weber
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested (no behavior as tested so far is changed)