-
-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorized unbounded continuous random number generators (Issue 45) #622
Vectorized unbounded continuous random number generators (Issue 45) #622
Conversation
…3 parameter rngs (Issue 45) Vectorized all unbounded continuous random number generators and added tests (Issue 45) Changed how skew_normal random numbers are generated (following https://arxiv.org/pdf/0911.2342.pdf) (Issue 527)
Poor ol' Travis. Package manager failed for some reason it seems. |
This is failing in unit testing because a new signed comparison warning cropped up in the test code. Easy fix to change an |
Bah, I use clang++ for the nice errors, but I should probably just switch to g++ cause it seems to be more careful about these things. Tnx for the heads up. |
I think this is ready to review. Jenkins failed on some upstream tests for some mystery reason (edit: but the Math tests passed):
Could easily ask for a retest, but that'll burn another pile of hours on Jenkins for nothing. (no way this gets through without revisions anyway) |
Sorry, the linux node has been tricksy so typically we'll just retest when this happens. I'm working on getting it to be more friendly. Bob, are you reviewing this or should someone else (me?)? |
I just made it through the Hobbit again. Fantastic book. I'm totally into a Golem rebranding of Jenkins. Maybe if we wanted a rebuild we'd haves to answers one of precious's riddles? |
Jenkins retest this please |
1 similar comment
Jenkins retest this please |
…torized_rngs_unbounded_continuous
Nice! Tests passed with the gcc-4.9 patch in place. This is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Overall, this looks really good but I'm asking for a bunch of code cleanup. See inline comments.
I marked which changes were optional but do ask if it's not clear.
double sigma, | ||
template <typename T_loc, typename T_scale, class RNG> | ||
inline typename VectorBuilder<true, double, T_loc, T_scale>::type | ||
cauchy_rng(const T_loc &mu, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cpplint thing isn't very strict. We've been putting&
after type, not before variable, as in T_loc& mu
. I'd rather keep our code base consistent. Also, put as many arguments on a line as will fit and generally try to conserve vertical space as much as possible.
variate_generator<RNG&, uniform_01<> > rng_unit_01(rng, uniform_01<>()); | ||
double a = 0; | ||
double laplaceRN = rng_unit_01(); | ||
if (0.5 - laplaceRN > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, you want to declare variables before you use them and only define them once if possible. So this should be just
double a = rng_unit_01() < 0.5 ? 1 : -1;
Then you could just replace a
with the above. And you'd want to move the declaration and definition of rng_unit_01
out of the loop for efficiency
Having said all that, I think it'll be easier to just write a function that does bernoulli_rng
without all the checks. This is too much logic in this function to generate random bernoulli variates.
VectorBuilder<true, double, T_loc, T_scale, T_inv_scale> output(N); | ||
|
||
for (size_t n = 0; n < N; n++) { | ||
output[n] = normal_rng(mu_vec[n], sigma_vec[n], rng) + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't going to be too much of an efficiency issue, but generally you don't want to call your public-facing functions internally because they're going to be doing redundant argument checks. It'll wind up creating more objects, too. Overally, though, this is probably OK like this and not worth optimizing now.
VectorBuilder<true, double, T_loc, T_scale> output(N); | ||
|
||
for (size_t n = 0; n < N; n++) { | ||
variate_generator<RNG&, uniform_01<> > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull the rng out of the loop for efficiency.
VectorBuilder<true, double, T_loc, T_scale> output(N); | ||
|
||
for (size_t n = 0; n < N; n++) { | ||
variate_generator<RNG&, exponential_distribution<> > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rng out of loop. you should be doing this for every rng that doesn't depend on the loop parameters. it saves a lot of object construction and destruction and still uses the same RNG so the result will the be same
#include <limits> | ||
#include <vector> | ||
|
||
using Eigen::Dynamic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't want to have using statements up at file scope---keep them limited to tests. This is absolutely forbidden in non-test code and we're trying to clean up our tests, too. It's too easy to pollute the namespaces this way.
std::vector<double> build_quantiles(int N, double mu, double sigma, | ||
double unused) { | ||
std::vector<double> quantiles; | ||
int K = boost::math::round(2 * std::pow(N, 0.4)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K should be defined as a double to begin with (round
returns double
here); then you don't need to cast anything in i / K
.
@@ -21,6 +21,9 @@ | |||
template <typename T_param> | |||
void assign_parameter_values(T_param& params, | |||
const std::vector<double>& values) { | |||
if(values.size() == 0) | |||
return; | |||
|
|||
// Static cast the size here because sometimes it is unsigned (if T_param | |||
// is a std::vector) and sometimes it is signed (when it is an Eigen type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a general way to do this with a metaprogram to pull the index out.
You should only ever cast as a very last resort. Again, this is just test code, so I'm not going to be too picky, but this isn't how we'd do this in actual Stan code.
@@ -36,6 +39,9 @@ void assign_parameter_values(T_param& params, | |||
*/ | |||
template <> | |||
void assign_parameter_values(double& param, const std::vector<double>& values) { | |||
if(values.size() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space before parens in conditionals --- this isn't R!
check_dist_throws<RowVectorXd, VectorXd, RowVectorXd> | ||
(generate_samples, good_p1, bad_p1, good_p2, bad_p2, good_p3, bad_p3); | ||
check_dist_throws<RowVectorXd, RowVectorXd, RowVectorXd> | ||
(generate_samples, good_p1, bad_p1, good_p2, bad_p2, good_p3, bad_p3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever you see this much redundant code, you want to ask yourself if it's really necessary.
My first inclination here would be to wrap up the arguments (generate_samples, good_p1, bad_p1, good_p2, bad_p2, good_p3, bad_p3)
into a structure. Then it's just check_dist_throws<T1, T2, T3>(s)
where s
is the holding structure. That also simplifies the argument declaration for check_dist_throws
. I think you should make this change.
I think you also want to be checking int
---it can sometimes cause problems you don't see with double
with argument ambiguity resolution. I also think you should make this change.
My second inclination would be to use typelists to iterate over all the types. It's a template programming technique worth learning and a little recursion will deal with all the permutations in a way that's probably easier to validate is correct. The problem with these big lists is that it's very hard to check that they're complete. And they're hard to maintain when you add a new type. This change is optional, but I'd encourage you to at least check out the template technique and start looking over your shoulder every time you think about creating long lists like this.
…torized_rngs_unbounded_continuous
…hing) 1. Switched to typelists for generating all combinations of tests 2. Random number generators are now tested by building test rigs (that inherit from an abstract base class) and passing those to test functions 3. Wasn't able to easily add tests for ints This is all Issue 45
Hopefully the tests pass. I changed stuff around quite a bit to address the comments.
I changed how _rngs are tested. Now in each test you inherit from an abstract base class and build a special little object that has the right test hooks (see: https://github.com/bbbales2/math/blob/275889c5ba9185ea8cf6ec70b516013e4f63cb6f/test/unit/math/prim/mat/prob/normal_test.cpp). It's not actually a totally correct abstract base class cause the templated member function can't be virtual, but I thought this makes the test code look nice.
Hopefully done. There were a bunch of options. I went with a recursive templating solution. There was some variadic templating stuff but it was beyond me. It was tricky to get the right combination of functors/templating so that all the pieces fit together, but it's not so bad now.
I did not add integer tests (or "vector_rng_test_helper.hpp" is starting to get pretty verbose with all its little helpers and functions in it. Should I be breaking this out into pieces? If so, any suggestions on what needs broken out? I imagine this stuff is going to keep growing as the rest of the distribution things get added, so maybe it's worth waiting to see if any other big structural changes come along before we go through the process of breaking things out. |
On Oct 1, 2017, at 6:29 PM, Ben Bales ***@***.***> wrote:
Hopefully the tests pass. I changed stuff around quite a bit to address the comments.
My first inclination here would be to wrap up the arguments (generate_samples, good_p1, bad_p1, good_p2, bad_p2, good_p3, bad_p3) into a structure.
I changed how _rngs are tested. Now in each test you inherit from an abstract base class and build a special little object that has the right test hooks (see: https://github.com/bbbales2/math/blob/275889c5ba9185ea8cf6ec70b516013e4f63cb6f/test/unit/math/prim/mat/prob/normal_test.cpp). It's not actually a totally correct abstract base class cause the templated member function can't be virtual, but I thought this makes the test code look nice.
I just brought up this same issue w.r.t. the "base class" we want for whole models. Not
sure what the right answer is without being able to have virtual template methods.
My second inclination would be to use typelists to iterate over all the types.
Hopefully done. There were a bunch of options. I went with a recursive templating solution. There was some variadic templating stuff but it was beyond me. It was tricky to get the right combination of functors/templating so that all the pieces fit together, but it's not so bad now.
It's always a pain to do this.
I did not add integer tests (or std::vector tests). I guess it's important and I should do some thinking, but I might need to add extra lists of good/bad integer values. Rounding the good/bad double values won't always work (cause rounding a good value might give you an invalid value).
Right---it needs an extra list of good/bad integer values. And yes, this is important in that not having integers compile properly has bit us many times in the past. We'll have to make sure that the tests in Stan itself at least instantiate the integer cases.
"vector_rng_test_helper.hpp" is starting to get pretty verbose with all its little helpers and functions in it. Should I be breaking this out into pieces? If so, any suggestions on what needs broken out?
Always break into pieces if you can.
I imagine this stuff is going to keep growing as the rest of the distribution things get added, so maybe it's worth waiting to see if any other big structural changes come along before we go through the process of breaking things out.
Everything will keep growing, but that shouldn't freeze us in the meantime.
|
Looks like the tests are passing, if it finishes, this is ready to review. Questions:
It's really fragile. For instance the parameters 2, 1, -2.5 (in that order) cause some values of the quantiles to return -infs which screw up the tests. This is true for the non-vectorized non-integer tests as well. I can either try to improve the quantile computation, or I can just cherry pick some values that Boost can handle (currently what I've done).
If this looks like something that'd be useful for other tests, and it's obvious what necessary modifications it'd need, we can finish breaking it out (make it match the apply semantics + support other numbers of template arguments somehow). Otherwise, I'll just leave it as is (kinda special purpose for the vectorization stuff). |
For (1), Cherry pick results for now. File an issue with Boost if
you have examples that fail. (2) sounds good---we have a lot of
test harness stuff that could stand to be consolidated.
|
This is ready to be checked |
@bob-carpenter, I think @bbbales2 has made all the requested changes. Want to make sure and approve after you're done? |
Not everything got fixed. I don't have time to finish the next review now---I'm only a couple of my comments in but I should be able to get to it over the next few days. |
Cool beans, whenever is good. It'll be good to get this rolling again. |
I'll take another pass as soon as I can. Sorry for the delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these changes are optional other than doc to indicate that it's really scalar/standard vector/Eigen vector types.
This was super helpful in understanding some of these distributions---I hadn't looked closely enough at the code before.
|
||
variate_generator<RNG&, uniform_01<> > rng_unit_01(rng, uniform_01<>()); | ||
for (size_t n = 0; n < N; n++) { | ||
double laplaceRN = rng_unit_01(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional fix]
What I was suggesting in the original comment was something like this
double z = rng_unit_01();
double sign = bernoulli(0.5) ? 1 : -1;
output[n] = mu_vec[n] - sign * sigma_vec[n] * log(z)
A slightly more efficient version would be
double z = uniform(-1, 1);
output[n] = mu_vec[n] - sign(z) * sigma_vec[n] * log(abs(z))
For sign()
, see: http://www.boost.org/doc/libs/1_62_0/libs/math/doc/html/math_toolkit/sign_functions.html
For uniform(-1, 1)
, see: http://www.boost.org/doc/libs/1_60_0/boost/random/uniform_real_distribution.hpp
For bernoulli(0.5)
, see: http://www.boost.org/doc/libs/1_60_0/boost/random/bernoulli_distribution.hpp
I got there through the version here with algebra, but it turns out to be pretty close to the Wikipedia version, only using uniform(-1, 1)
and the recognition that 1 - uniform(-1, 1)
has the same distribution as uniform(-1, 1)
and the same sign randomization.
double sigma, | ||
double lambda, | ||
/** | ||
* Return a pseudorandom Exponentially modified normal variate for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional fix]
I'd rather not capitalize anything other than proper names and names of distributions (not named after people) don't count.
* given location, scale, and inverse scale using the specified random | ||
* number generator. | ||
* | ||
* mu, sigma, and lambda can each be either scalars or vectors. All vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't they be scalars, standard vectors, or Eigen vectors?
* @return Exponentially modified normal random variate | ||
* @throw std::domain_error if mu is infinite, sigma is nonpositive, | ||
* or lambda is nonpositive | ||
* @throw std::invalid_argument if vector arguments are not the same length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this should say "container arguments" where you have "vector arguments".
+ exponential_rng(lambda, rng); | ||
} | ||
size_t N = max_size(mu, sigma, lambda); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional fix]
I'd remove all the vertical space that's not absolutely necessary for readability. And that includes braces around for loops iwth single statements. Vertical space is precious!
@@ -31,12 +42,25 @@ namespace stan { | |||
|
|||
check_finite(function, "Location parameter", mu); | |||
check_positive_finite(function, "Scale parameter", sigma); | |||
check_consistent_sizes(function, | |||
"Location parameter", mu, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
same comment here about packing arguments as much as possible
VectorBuilder<true, double, T_loc, T_scale, T_shape> output(N); | ||
|
||
variate_generator<RNG&, normal_distribution<> > | ||
norm_rng(rng, normal_distribution<>(0.0, 1.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
In general, we prefer 0
and 1
to 0.0
and 1.0
(yet another thing in Google style not caught by cpplint)
norm_rng(rng, normal_distribution<>(0.0, 1.0)); | ||
for (size_t n = 0; n < N; n++) { | ||
double r1 = norm_rng(), | ||
r2 = norm_rng(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
I'd prefer to have one declaration per line. And of course, tighten up vertical spaces!
* Return a pseudorandom student-t variate for the given degrees of freedom, | ||
* location, and scale using the specified random number generator. | ||
* | ||
* nu, mu, and sigma can each be either scalars or vectors. All vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same fix here as everywhere about scalar/standard vector/Eigen vector.
std::vector<double> good_p2, | ||
std::vector<int> good_p2_int, | ||
std::vector<double> bad_p2, | ||
std::vector<int> bad_p2_int) : N_(N), M_(M), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
Put the :
on a new line. You also don't want to waste horizontal space like this.
How about I change "vector" to "vector type" (like in https://github.com/stan-dev/stan/blob/develop/src/stan/lang/function_signatures.h#L10). This is how I did it in the normal_rng (must have just not copied it to the other _rngs yet): I'd like an easy way to refer to vector-like objects in a vague MATLAB/R this-is-vectorized kind of way.
Sure thing, this should all be pretty easy. I'll go through and see how much I can compress things. |
How about "standard vector, Eigen vector, and Eigen row vector"? That's explicit and not too long.
The reason to avoid reference to other classes is to preserve encapsulation---those other classes may change without this one changing.
… On Oct 25, 2017, at 3:08 PM, Ben Bales ***@***.***> wrote:
Can't they be scalars, standard vectors, or Eigen vectors?
So I think this should say "container arguments" where you have "vector arguments".
How about I change "vector" to "vector type" (like in https://github.com/stan-dev/stan/blob/develop/src/stan/lang/function_signatures.h#L10). This is how I did it in the normal_rng (must have just not copied it to the other _rngs yet):
mu and sigma can each be either a scalar or vector type (the ones that scalar_seq_view knows about). Any vector inputs must be the same length.
I'd like an easy way to refer to vector-like objects in a vague MATLAB/R this-is-vectorized kind of way.
same comment here about packing arguments as much as possible
Sure thing, this should all be pretty easy. I'll go through and see how much I can compress things.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
…torized_rngs_unbounded_continuous
…ing locally) (Issue 45)
I gotta ask for help on this one. I'm not sure what's going on, but it feels fishy. @seantalts or @syclik -- you guys might recognize something here -- it doesn't really have much to do with the actual pull. The first problem I had was the scalar tests for double_exponential_rng were failing when the vector ones weren't. I found two ways to fix it (and make the scalar double_exponential tests pass).
The effect I was seeing without this is that log(abs(z)) was going to -Inf, regardless of what z was. It felt like double was being cast to int, but I don't know if that should happen. My guess was that it was only seeing the log definition here: https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/fun/log.hpp ? Which is only for ints? And for some reason wasn't seeing the usual std::log definition? But this doesn't explain why the vector tests worked.
Both of these feel like wrong sorts of fixes. Any idea what could be going on? |
I took a 5 minute look and have only one possible suggestion - the old code had a |
@seantalts Oh wow, thanks! Problem solved. I didn't notice that. Yeah I was wondering how abs was working but |
…Removed reference to stan::math::sign (Issue 45)
We don't need to do it for this pull request, but we really need to start cleaning up our includes and namespaces. Developers keep trying to sneak in global using statements, which should be caught by cpplint but isn't. That's where leaks come from. When you have
The using statement just the function So wht happens when you have |
I should've added that if you're in |
Both of those should work. I'd messed up and removed the using std::log in one of my commits (this is what Sean pointed out). I switched to being explicit about std::log and std::abs. On a related note, stan::math::log in prim doesn't seem to support doubles. stan::math::log in rev does (so the autodiff works). Is this considered a bug? If prim doesn't support what rev does? |
This should be good to go. |
(Please ignore the 3rd failing check - still working out the kinks) |
…torized_rngs_unbounded_continuous
Fixed the merge conflicts with the string stuff, this is good to go again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great. The only change needed now is to put all the tests for illegal arguments to come before any work gets started allocating, etc. After that, it'll be fast to merge. All the math and container wrangling looks just fine.
scalar_seq_view<T_scale> sigma_vec(sigma); | ||
size_t N = max_size(mu, sigma); | ||
VectorBuilder<true, double, T_loc, T_scale> output(N); | ||
|
||
check_finite(function, "Location parameter", mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks should go first before any work. In general, try to put exceptional/odd/edge-case branches first so that when you're by them, you can scan the main branch of the program together. Would you mind fixing this on all of these? Sorry for not catchign that earlier.
Well you didn't see it before cause in the last commit when I was cleaning up whitespace n' such I just moved all the variable declarations to be in one blob :P. I've moved them below the argument checks now. Should be good to go if the tests pass! |
Thanks for moving up the argument validation tests. I approved (don't know why I'm saying that since it's obvious in the interface). |
Submission Checklist
./runTests.py test/unit
make cpplint
Summary:
This implements vectorized versions of all the unbounded continuous random number generators (with tests! Hooray!)
Details
I had to make some changes to test/unit/math/prim/mat/prob/vector_rng_test_helper.hpp to allow for anywhere from one to three parameters to be tested.
I also had to update the way the Skew normal random numbers were getting generated to avoid infs (this is part of Issue #527)
I also fixed some indents in test/unit/math/prim/scal/prob/util.hpp and added an extra Google test check there (that will throw if the code tries to do something that leads to segfaults in the tests).
@seantalts Hope you were looking forward to checking some repetitive code
How to Verify:
And the scalar tests (though I didn't change these)
Side Effects:
Hopefully none
Documentation:
It's all inline
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): University of California, Santa Barbara
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: