-
-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
turn back on allowing immediately assigned to decls not be NaN initialized #1029
turn back on allowing immediately assigned to decls not be NaN initialized #1029
Conversation
This might be a bit of a big ask, but I think we should try to check that this optimization is actually doing anything meaningful at the assembly code level after the C++ is compiled. I’m just spitballing, but my first instinct is that gcc probably does this kind of thing already and it would be good to look into it |
So the result is kind of interesting! When I read your comment I was like "Oh that's probably correct", but both compilers seem to miss that the NA mem is tossed and it doesn't need allocated actually. Below is the godbolt of comparing (at least a toy example) of how we currently do this vs. the new version. https://godbolt.org/z/3GW1ET461 On the LHS is our current and in the middle pane is this PR with a diff in the ASM on the RHS. Starting at |
Interesting! It definitely makes a difference for Eigen types. For scalars it seems like the asm is identical, and for std::vector it is a negligible difference, but it seems like the Eigen differences will make this definitely worthwhile. I wonder if this is something the Eigen folks themselves have looked at, because it seems like something the compiler definitely tries to do for other types. We should really revisit #549 soon. I'm not sure if the --O convention makes sense for us of if we should do it case by case, something like |
Oh also, do you know if this kind of analysis has been applied to any of the other optimizations? I know stuff like lazy code motion exists in gcc/clang, though obviously some of our partial eval wishes (like automatically creating log1m calls) is Stan-specific |
Oh neat! Can you share the godbolt link for that?
Yeah I'd really like that! Tbh I wouldn't hate #955 being at
tmk no that never happened. When I read the C++ for some of these I'm like hmmm, maybe? The lazy code motion one always looks a bit odd to me. But yeah it would be nice to have things like partial evaluation safely on as well. |
Scalars: https://godbolt.org/z/WefEajnox Vector: ASM is different but only does one allocation in both as far as I can tell |
I'm a strong believer in optimization only after profiling (or, in a weaker sense, at least looking at the assembly with a rough 'less assembly is probably faster than more assembly' metric). Especially with how clever modern compilers can be, it's not difficult to end up in a situation where the code you've written feels like it should be better, but the compiler isn't able to optimize it as much so it's actually worse. I think this and #955 are both good examples of things we know we can do better than gcc on, because of what we know about how we're using the matrices and the results of compiling our existing code |
I agree with the former but the latter I would be careful of "less assembly == better"! For instance, you can have one instance which has a generic asm loop doing multiplies, and another instance which branches off of the packet size to call a loop of x86, SSE, or AVX multiply instructions. The second one will be longer but much faster since it's using vectorized instructions. Another example is when the compiler will replace an expensive mul instruction with a few add and bitshift instructions. Mul is very expensive while a few adds and a bitshift is very cheap |
Yes exactly, you can’t just count lines. But, if you see fewer At any rate, I don’t think we’ve currently done much of either |
Oh also slightly modifying the https://godbolt.org/z/h3WGb6ovb The scalar version won't double allocate because it's a static size of 1 |
@serban-nicusor-toptal do you know where the little button went on jenkins/Blue Ocean that let you restart the tests? |
That is weird. Restarted them, Nic will know more on why you are not seeing it. |
We've recently made some changes to the ACL of Jenkins, I've fixed your permissions and you should see it now. |
Thank you! |
@WardBrian would you mind taking a look at this? |
I really don't know if I'm familiar enough with the optimization code to review - I would be relying pretty heavily on the test output rather than reviewing the actual changes |
imo I think that's fine if you don't mind. The optim here really doesn't use any of the optimization library stuff. We just traverse the MIR looking for a |
Might again be something more for #549, but lets say hypothetically this introduces a bug - can the user disable it easily? If it's enabled by default/for |
Oh your right, actually I do like the idea of first doing #549 and then doing this PR |
I think the primary thing that PR needs is doc? Unless we want to mirror what I did in #1058 and allow individual configuration on top of the O levels |
Umm, I think we just want the |
@WardBrian would you mind giving this a look? Would be a v nice optim to have at O1! |
Eigen::Matrix<double, -1, -1> X_tp2 = | ||
Eigen::Matrix<double, -1, -1>::Constant(10, 10, | ||
std::numeric_limits<double>::quiet_NaN()); | ||
stan::model::assign(lcm_sym9__, stan::math::exp(X_d), | ||
"assigning variable lcm_sym9__"); | ||
stan::model::assign(X_tp2, lcm_sym9__, "assigning variable X_tp2"); | ||
Eigen::Matrix<double, -1, -1> X_tp3; | ||
Eigen::Matrix<double, -1, -1> X_tp3 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These also seem like they're wrong re-introductions of the initializer. Do we have no way around this temporary here? The original model doesn't feature control flow here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so at first as well but if you look at the next lines
stan::model::assign(lcm_sym10__, stan::math::exp(lcm_sym9__),
"assigning variable lcm_sym10__");
stan::model::assign(X_tp3, lcm_sym10__, "assigning variable X_tp3");
The lazy code motion assigns lcm_sym10__
right below this and then assigns X_tp3
. That fails the check so NA's get filled here. We could run this before lazy code motion but I'm not sure if things can go from needing to be initalized to not initialized and vice versa during the different propogations and lcm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So previously it was running earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No before it had different logic. The previous scheme was, once we hit a decl, to iterate though the remaining statements and as long as we see a full assignment before it was used then we would allow it to be not initialized. But the logic is simpler now that we just look at the statement right below the decl. If it is not an assignment then we force it to fill with NA values. It's lazier but it's much safer. imo I like this simpler approach because it hits the bigger misses right now. For instance it stops the params from being filled with NA values (since they are always filled through the deserializer)
Summary
This turns back on allowing the
allow_uninitialized_decls
as a default optimization. The previous version was trying to be too clever with figuring out if a decl initialization was done inside of if/while/block statements.This also adds the integration stan program
unenforce-initialize.stan
to the compiler optimizations test set which shows the generated C++ from some odd examples of inits we need to catch and set to NA such as using a variable on the lhs and rhs during an decl and assignment likereal x = x;
Submission Checklist
Release notes
Turn back on allowing immediately assigned to decls not be NaN initialized
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)