Adds auxiliary functions needed for reduce_sum #1800

Merged: 13 commits, Mar 31, 2020

SteveBronder
Collaborator

Summary

Adds the following functions needed for reduce_sum

  1. accumulate_adjoints: Iterates through a parameter pack and writes the adjoint values of any var objects or containers of vars into a contiguous block of memory.
  2. count_vars: Iterates over a parameter pack, counting the var objects, including vars inside containers.
  3. deep_copy_vars: Iterates over a parameter pack, copying only the var objects and var containers.
  4. save_varis: Iterates over a parameter pack of var objects and containers, storing the vari pointers into a contiguous block of memory.
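
As a rough illustration of what item 1 describes, the sketch below walks a parameter pack and adds each adjoint into a caller-supplied buffer, returning the position just past the last slot written. The vari and var structs are toy stand-ins, not Stan Math's classes, and the overload set (one var, a std::vector of var, arithmetic types) and the use of += are simplifying assumptions rather than the signatures in this PR. The same shape would plausibly apply to save_varis with a vari** destination.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

namespace toy {

// Toy stand-ins for an autodiff node and its handle; NOT Stan Math's types.
struct vari {
  double val_;
  double adj_;
};
struct var {
  vari* vi_;
};

// Declare every overload first so each recursive call can see all of them.
inline double* accumulate_adjoints(double* dest);
template <typename... Pargs>
inline double* accumulate_adjoints(double* dest, const var& x,
                                   const Pargs&... args);
template <typename... Pargs>
inline double* accumulate_adjoints(double* dest, const std::vector<var>& x,
                                   const Pargs&... args);
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0,
          typename... Pargs>
inline double* accumulate_adjoints(double* dest, const T& x,
                                   const Pargs&... args);

// End of the recursion: nothing left, hand back the current write position.
inline double* accumulate_adjoints(double* dest) { return dest; }

// A single var contributes one adjoint, then the rest of the pack is handled.
template <typename... Pargs>
inline double* accumulate_adjoints(double* dest, const var& x,
                                   const Pargs&... args) {
  *dest += x.vi_->adj_;
  return accumulate_adjoints(dest + 1, args...);
}

// A container of vars contributes one adjoint per element.
template <typename... Pargs>
inline double* accumulate_adjoints(double* dest, const std::vector<var>& x,
                                   const Pargs&... args) {
  for (const var& v : x) {
    *dest += v.vi_->adj_;
    ++dest;
  }
  return accumulate_adjoints(dest, args...);
}

// Arithmetic arguments carry no adjoint and are skipped.
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type,
          typename... Pargs>
inline double* accumulate_adjoints(double* dest, const T&,
                                   const Pargs&... args) {
  return accumulate_adjoints(dest, args...);
}

// Usage: three adjoints land in the first three slots; 1.5 and 7 are skipped.
inline void example() {
  vari a{1.0, 2.0}, b{1.0, 3.0}, c{1.0, 4.0};
  var x{&a};
  std::vector<var> xs{var{&b}, var{&c}};
  std::vector<double> storage(5, 0.0);
  double* end = accumulate_adjoints(storage.data(), x, 1.5, xs, 7);
  assert(end == storage.data() + 3);
  assert(storage[0] == 2.0 && storage[1] == 3.0 && storage[2] == 4.0);
  assert(storage[3] == 0.0 && storage[4] == 0.0);
}

}  // namespace toy
```

Returning the advanced pointer is what lets a caller check how many slots were filled, which is the kind of check the tests in this PR make against storage.data().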

Tests

Tests can be run with

./runTests.py test/unit/math/rev/core/accumulate_adjoints_test.cpp \
test/unit/math/rev/core/count_vars_test.cpp \
test/unit/math/rev/core/deep_copy_vars_test.cpp \
test/unit/math/rev/core/save_varis_test.cpp

Side Effects

None

Release notes

Checklist

  • Math issue [WIP] Parallel Prototype #1616

  • Copyright holder: Steve Bronder, Ben Bales, Sebastian Weber

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@bbbales2
Member

Ooops, missed this. Thanks for the ping! I'll get on it tonight!

@bbbales2
Member

Oh wait I just wrote this code and the tests lol. @wds15 can you do the review?

@wds15
Contributor

wds15 commented Mar 25, 2020

Ok...I should be able to review.

@bbbales2
Member

@wds15 I should probably be the one to finish up the reduce_sum tests, Steve can manage the pull, and you can review?

I'll try to get that done before the Stan meeting tomorrow. I didn't see any sign of a Math meeting.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.84 | 4.83 | 1.0 | 0.18% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -3.76% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.97 | -3.56% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.03 | 2.5% faster |
| irt_2pl/irt_2pl.stan | 6.47 | 6.48 | 1.0 | -0.21% slower |
| performance.compilation | 89.26 | 86.44 | 1.03 | 3.17% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.53 | 7.53 | 1.0 | -0.09% slower |
| pkpd/one_comp_mm_elim_abs.stan | 21.98 | 21.16 | 1.04 | 3.7% faster |
| sir/sir.stan | 90.86 | 92.45 | 0.98 | -1.75% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.04 | 3.46% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.99 | 0.99 | -1.2% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.32 | 0.31 | 1.04 | 4.14% faster |
| arK/arK.stan | 1.73 | 1.74 | 0.99 | -0.95% slower |
| arma/arma.stan | 0.66 | 0.66 | 0.99 | -1.02% slower |
| garch/garch.stan | 0.51 | 0.51 | 1.0 | -0.41% slower |
Mean result: 1.0034243219

Jenkins Console Log
Blue Ocean
Commit hash: 6dedcf9


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.8 | 4.88 | 0.98 | -1.7% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.02 | 1.63% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.01 | 1.39% faster |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 0.99 | -0.55% slower |
| irt_2pl/irt_2pl.stan | 6.43 | 6.43 | 1.0 | -0.0% slower |
| performance.compilation | 87.99 | 86.62 | 1.02 | 1.56% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.57 | 7.54 | 1.0 | 0.38% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.95 | 21.4 | 0.98 | -2.12% slower |
| sir/sir.stan | 90.9 | 95.61 | 0.95 | -5.18% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 0.95 | -5.4% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.95 | 2.95 | 1.0 | -0.07% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.31 | 1.0 | -0.07% slower |
| arK/arK.stan | 1.74 | 1.74 | 1.0 | -0.17% slower |
| arma/arma.stan | 0.67 | 0.66 | 1.02 | 2.01% faster |
| garch/garch.stan | 0.52 | 0.52 | 1.0 | -0.33% slower |
Mean result: 0.994726669338

Jenkins Console Log
Blue Ocean
Commit hash: 6dedcf9


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Contributor

@wds15 wds15 left a comment

Needs some revision according to comments... lemme know if you need more details. Quite amazing code, which will be super-useful for more "packed" functions.

storage.setZero();
double* ptr = stan::math::accumulate_adjoints(storage.data(), arg);

size_t num_vars = stan::math::count_vars(arg);
Contributor

In the sense of a unit test, I would prefer it to not use count_vars here.

Contributor

same comment to the other instances.

size_t num_vars = stan::math::count_vars(arg);

for (int i = 0; i < num_vars; ++i)
EXPECT_FLOAT_EQ(storage(i), i + 1.0);
Contributor

The expectation should just be 1 here, no?

Collaborator Author

Yep! In this loop i only ever equals zero, so I fixed that up here and in the other tests where that happens.

EXPECT_FLOAT_EQ(storage(i), 0.0);

EXPECT_EQ(ptr, storage.data() + num_vars);
}
Contributor

I think it would be good to finish every test with recover_memory() in order to wipe the AD tape. This will be important if we later do some big testing binaries with all tests in one file.

EXPECT_FLOAT_EQ(storage(i), 0.0);

EXPECT_EQ(ptr, storage.data() + num_vars);
}
Contributor

Can u add a test for the accumulate_adjoints function which terminates the recursion we use for the parameter packs? Thanks.

Collaborator Author

Do you just mean checking the end of the recursion, like this?

TEST(AgradRev_accumulate_adjoints, zero_args) {
  Eigen::VectorXd storage = Eigen::VectorXd::Zero(1000);
  double* ptr = stan::math::accumulate_adjoints(storage.data());

  for (int i = 0; i < storage.size(); ++i) {
    EXPECT_FLOAT_EQ(storage(i), 0.0);
  }

  EXPECT_EQ(ptr, storage.data());
}

Member

That looks good to me. We want to make sure it returns the pointer it got, and at least hope it didn't do anything to the data there.

Contributor

Correct.

decltype(stan::math::deep_copy_vars(arg)) out
= stan::math::deep_copy_vars(arg);

for (int i = 0; i < arg.size(); ++i)
Contributor

These tests are not sufficient to show that we get deep copies in the way we need them. The point of a deep copy is that the copied vars are not linked to the originals. To test this, I would suggest a test that follows how we actually use the function: create some vars, start a nested AD region, make a deep copy there, call grad, check that the grads are fine (adjoints are non-zero), tear the nested region down, and then check that the adjoints of the outer vars are still zero. That is why we have this function, and that is what we should test it for, I think. Does that make sense to you?
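
The test shape described in this comment can be sketched with toy types. Everything here is a stand-in: vari, var, arena, deep_copy and grad_of_sum are hypothetical, the arena only mimics a nested AD region by owning the copies and destroying them at the end of its scope, and the gradient is hard-coded for f(xs) = sum(xs). A real test would use Stan Math's nested autodiff and deep_copy_vars instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

namespace toy {

// Toy stand-ins for an autodiff node and its handle; NOT Stan Math's types.
struct vari {
  double val_;
  double adj_;
};
struct var {
  vari* vi_;
};

// Owns the nodes created in one region; destroying it tears the region down.
struct arena {
  std::vector<std::unique_ptr<vari>> nodes_;
  var make(double val) {
    nodes_.push_back(std::unique_ptr<vari>(new vari{val, 0.0}));
    return var{nodes_.back().get()};
  }
};

// Deep copy: same value, brand-new node, so nothing links back to the input.
inline var deep_copy(arena& mem, const var& x) { return mem.make(x.vi_->val_); }

inline std::vector<var> deep_copy(arena& mem, const std::vector<var>& xs) {
  std::vector<var> out;
  out.reserve(xs.size());
  for (const var& x : xs) {
    out.push_back(deep_copy(mem, x));
  }
  return out;
}

// Toy "grad" of f(xs) = sum(xs): every input receives an adjoint of 1.
inline void grad_of_sum(const std::vector<var>& xs) {
  for (const var& x : xs) {
    x.vi_->adj_ += 1.0;
  }
}

inline void example() {
  arena outer;
  std::vector<var> xs{outer.make(1.0), outer.make(2.0)};
  {
    arena nested;  // stands in for starting a nested AD region
    std::vector<var> copies = deep_copy(nested, xs);
    grad_of_sum(copies);
    for (std::size_t i = 0; i < xs.size(); ++i) {
      assert(copies[i].vi_ != xs[i].vi_);            // distinct nodes
      assert(copies[i].vi_->val_ == xs[i].vi_->val_);  // same values
      assert(copies[i].vi_->adj_ == 1.0);            // copies got gradients
    }
  }  // nested region torn down here
  for (const var& x : xs) {
    assert(x.vi_->adj_ == 0.0);  // outer adjoints were never touched
  }
}

}  // namespace toy
```

The final loop is the important assertion: if the copy were shallow, the gradient pass inside the nested scope would have leaked into the outer adjoints.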

namespace stan {
namespace math {

template <typename... Pargs>
Contributor

Can we add somewhere a hint as to how the "..." processing works? I am not sure if we need to repeat this for all utilities, but I would appreciate a short explanation of the "peel off" logic for the ... argument pack. I don't know where to put this, as this is something which is not specific to a given function... maybe to the function which ends the recursion?
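
One way to picture the peel-off logic asked about here: each overload names the first argument explicitly, handles it, and forwards the remaining pack to the same overload set, until an overload with no arguments left ends the recursion. The sketch below is illustrative only; peel and its int and std::vector<int> overloads are hypothetical and unrelated to the PR's functions. It returns a string that records which overload handled each argument, in order.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace toy {

// Base case: an empty pack ends the recursion.
inline std::string peel() { return "end"; }

// Declare the recursive overloads up front so each can call the others.
template <typename... Rest>
std::string peel(int head, const Rest&... rest);
template <typename... Rest>
std::string peel(const std::vector<int>& head, const Rest&... rest);

// Peel off a leading int, then recurse on whatever is left.
template <typename... Rest>
std::string peel(int head, const Rest&... rest) {
  return "int(" + std::to_string(head) + ") -> " + peel(rest...);
}

// Peel off a leading vector, then recurse on whatever is left.
template <typename... Rest>
std::string peel(const std::vector<int>& head, const Rest&... rest) {
  return "vector[" + std::to_string(head.size()) + "] -> " + peel(rest...);
}

// Usage: the trace shows the arguments being consumed left to right.
inline void example() {
  assert(peel(1, std::vector<int>{2, 3}, 4)
         == "int(1) -> vector[2] -> int(4) -> end");
}

}  // namespace toy
```

The forward declarations matter: when the remaining pack holds only built-in types, argument-dependent lookup cannot find overloads declared later, so every overload has to be visible before the first recursive call is written.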

Collaborator Author

We could include a comment on the last one with a link to an article explaining variadic parameter packs. How do you feel about that? We could also have a module for reduce_sum, so that the site has a section explaining the more complex parts of the reduce_sum implementation.

Collaborator Author

By module I mean a section like we have for the requires stuff

http://mc-stan.org/math/d1/db9/group__require__meta.html

Contributor

Sounds great to me to have such a module.

Eigen::RowVectorXd arg = Eigen::RowVectorXd::Ones(5);

decltype(stan::math::deep_copy_vars(arg)) out
= stan::math::deep_copy_vars(arg);
Contributor

These tests don't look right for save_varis... maybe some mix-up happened?

1 + 5 + 3 + 4 + 15 + 2 * 5 + 2 * 3 + 2 * 4 + 30,
count_vars(arg1, arg18, arg17, arg2, arg16, arg3, arg15, arg4, arg14,
arg5, arg13, arg12, arg6, arg11, arg7, arg10, arg8, arg9));
}
Contributor

Here as well, we should test the case of no arguments returning 0, so count_vars() == 0, right?

@SteveBronder
Collaborator Author

@bbbales2 are you handling the tests here or should I go at the above?

@bbbales2
Member

@SteveBronder I have a plan for deep copy.

The other things, have at 'em. There should be a save_varis test in the reduce_sum WIP pull.

@SteveBronder
Collaborator Author

@bbbales2 do you think it makes sense to do accumulate_adjoints_impl like we do for count_vars, and then just have one top-level function, accumulate_adjoints, exposed?

@bbbales2
Member

do you think it makes sense to do accumulate_adjoints_impl

I guess I have no opinion really. Maybe it makes docs clearer cause then there's only one entry point? Either way is good.

@bbbales2
Member

I made the changes to the deep_copy_vars tests. Hopefully they're good now.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.82 | 4.83 | 1.0 | -0.25% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.01 | 0.99% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.99 | -0.87% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.0 | 0.19% faster |
| irt_2pl/irt_2pl.stan | 6.45 | 6.44 | 1.0 | 0.2% faster |
| performance.compilation | 87.58 | 86.66 | 1.01 | 1.05% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.57 | 7.54 | 1.0 | 0.3% faster |
| pkpd/one_comp_mm_elim_abs.stan | 20.22 | 21.19 | 0.95 | -4.78% slower |
| sir/sir.stan | 90.73 | 91.04 | 1.0 | -0.34% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 0.97 | -3.56% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.96 | 1.0 | -0.18% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.31 | 1.0 | -0.39% slower |
| arK/arK.stan | 1.73 | 1.75 | 0.99 | -0.9% slower |
| arma/arma.stan | 0.65 | 0.66 | 0.99 | -1.09% slower |
| garch/garch.stan | 0.51 | 0.51 | 1.0 | 0.0% slower |
Mean result: 0.993848814474

Jenkins Console Log
Blue Ocean
Commit hash: 32ef30f


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder SteveBronder requested review from wds15 and removed request for bbbales2 March 28, 2020 17:43
@SteveBronder
Collaborator Author

@wds15 this is ready for review!

Contributor

@wds15 wds15 left a comment

Just some minor stuff found... this is close to being ready. Thanks.

stan/math/rev/core/deep_copy_vars.hpp (resolved)
namespace stan {
namespace math {

template <typename... Pargs>
Contributor

Sounds great to me to have such a module.

using stan::math::var;
using stan::math::vari;

TEST(AgradRev_save_varis, int_arg) {
Contributor

I think the recursion ending function is not tested for save_varis. Can u add that please?


template <typename VecVar, require_std_vector_vt<is_var, VecVar>* = nullptr,
typename... Pargs>
inline size_t count_vars_impl(size_t count, VecVar&& x, Pargs&&... args);
Contributor

Sorry for bringing this up somewhat late.... but shouldn't we place the "*_impl" stuff into an "internal" namespace? This applies to the other tools as well which we are writing with an impl naming.

Contributor

Oh... count_vars is the only one with an "impl" function, and for good reason, since there you have to initialise the count to zero. So moving the "*_impl" functions into an internal namespace makes sense to me. Right?
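
A minimal sketch of the layout under discussion, with toy types rather than Stan Math's: the recursive overloads carry a running count and live in an internal namespace, while a single public count_vars seeds that count with zero. The names mirror the thread, but the overload set and signatures are simplified assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace toy {

// Toy stand-ins for an autodiff node and its handle; NOT Stan Math's types.
struct vari {
  double val_;
  double adj_;
};
struct var {
  vari* vi_;
};

namespace internal {

// End of the recursion: the running count is the answer.
inline std::size_t count_vars_impl(std::size_t count) { return count; }

// Forward declarations so each overload can recurse into any other.
template <typename... Pargs>
inline std::size_t count_vars_impl(std::size_t count, const var& x,
                                   const Pargs&... args);
template <typename... Pargs>
inline std::size_t count_vars_impl(std::size_t count,
                                   const std::vector<var>& x,
                                   const Pargs&... args);
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0,
          typename... Pargs>
inline std::size_t count_vars_impl(std::size_t count, const T& x,
                                   const Pargs&... args);

// One var adds one to the running count.
template <typename... Pargs>
inline std::size_t count_vars_impl(std::size_t count, const var&,
                                   const Pargs&... args) {
  return count_vars_impl(count + 1, args...);
}

// A container adds its size to the running count.
template <typename... Pargs>
inline std::size_t count_vars_impl(std::size_t count,
                                   const std::vector<var>& x,
                                   const Pargs&... args) {
  return count_vars_impl(count + x.size(), args...);
}

// Arithmetic arguments leave the count unchanged.
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type,
          typename... Pargs>
inline std::size_t count_vars_impl(std::size_t count, const T&,
                                   const Pargs&... args) {
  return count_vars_impl(count, args...);
}

}  // namespace internal

// The only public entry point: callers never supply the initial count.
template <typename... Pargs>
inline std::size_t count_vars(const Pargs&... args) {
  return internal::count_vars_impl(std::size_t{0}, args...);
}

// Usage: one var plus a two-element container gives three.
inline void example() {
  vari a{1.0, 0.0}, b{2.0, 0.0}, c{3.0, 0.0};
  var x{&a};
  std::vector<var> xs{var{&b}, var{&c}};
  assert(count_vars(x, 1.5, xs, 7) == 3);
  assert(count_vars() == 0);
}

}  // namespace toy
```

Keeping the seeded overloads out of the public namespace means a caller cannot accidentally pass a non-zero starting count, and the generated docs show one entry point.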

@bbbales2
Member

Gonna make these changes real quick

Moved count_vars_impl to internal namespace

Added empty tuple test for apply (for reduce_sum, design-doc #17)
@bbbales2
Member

@wds15 Ready to check again (if tests pass)

Also what was "Sounds great to me to have such a module." pointing at?

@wds15
Contributor

wds15 commented Mar 30, 2020

Steve suggested adding some documentation on the tuples argument processing magic to the modules section of the doxygen docs.

@wds15
Contributor

wds15 commented Mar 31, 2020

Looks like we have an issue with pow? The Stan unit tests are failing with:

In file included from src/test/unit/services/sample/hmc_static_unit_e_adapt_test.cpp:4:
./test/test-models/good/optimization/rosenbrock.hpp:177:31: error: call to 'pow' is ambiguous
            lp_accum__.add(-((pow((1 - x), 2) + (100 * pow((y - pow(x, 2)), 2)))));
                              ^~~
src/stan/services/util/initialize.hpp:127:33: note: in instantiation of function template specialization 'rosenbrock_model_namespace::rosenbrock_model::log_prob<false, true, double>' requested here
      log_prob = model.template log_prob<false, Jacobian>(unconstrained,
                                ^
src/stan/services/sample/hmc_static_unit_e_adapt.hpp:62:43: note: in instantiation of function template specialization 'stan::services::util::initialize<true, rosenbrock_model_namespace::rosenbrock_model, boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> > >' requested here
  std::vector<double> cont_vector = util::initialize(
                                          ^

@wds15
Contributor

wds15 commented Mar 31, 2020

Same issue here: #1811

Maybe something odd was merged to develop or there are some issues with the Jenkins pipelines... no idea. @rok-cesnovar do you have a hint? Not sure who else to ping.

@rok-cesnovar
Member

Yes, this is the same issue that happened in #1804. Try restarting tests and hope the Linux machine does not get picked.

See discussion #1804 (comment)

Ben is on it, I was hoping #1810 would fix it, but it apparently needs some more work. My opinion is that we give Ben some more time to fix it, otherwise we revert.

@wds15
Contributor

wds15 commented Mar 31, 2020

Thanks. I wasn't aware of this mess. Looks ugly, but let's see. Is the complex stuff what is causing the hiccup here?

@rok-cesnovar
Member

Yes, it's from the complex PR. It fails on a specific Linux AWS EC2 machine. If that test runs elsewhere it's fine. It's an issue with overloads and standard libraries, I believe.

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 4.88 | 4.93 | 0.99 | -0.9% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -4.13% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.96 | -4.04% slower |
| gp_regr/gp_regr.stan | 0.22 | 0.22 | 1.01 | 0.64% faster |
| irt_2pl/irt_2pl.stan | 6.47 | 6.44 | 1.01 | 0.51% faster |
| performance.compilation | 87.92 | 87.04 | 1.01 | 1.01% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.52 | 7.53 | 1.0 | -0.06% slower |
| pkpd/one_comp_mm_elim_abs.stan | 20.83 | 21.47 | 0.97 | -3.07% slower |
| sir/sir.stan | 93.71 | 93.75 | 1.0 | -0.04% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.05 | 1.0 | 0.11% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.95 | 1.0 | 0.37% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.31 | 0.32 | 0.94 | -5.82% slower |
| arK/arK.stan | 1.9 | 1.72 | 1.1 | 9.46% faster |
| arma/arma.stan | 0.66 | 0.67 | 0.98 | -1.77% slower |
| garch/garch.stan | 0.52 | 0.51 | 1.01 | 1.28% faster |
Mean result: 0.996908354863

Jenkins Console Log
Blue Ocean
Commit hash: 58a7626


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
