Generalize matrix function signatures #1470

Closed
5 of 6 tasks
t4c1 opened this issue Nov 29, 2019 · 43 comments

@t4c1
Contributor

t4c1 commented Nov 29, 2019

Description

Matrix functions can be generalized to work with Eigen expressions, not just matrices. By propagating matrix expressions around we can avoid creating temporary matrices, improving performance. This could also make adding support for sparse matrices easier.

There was some discussion about this on Discourse, as well as an abandoned PR #1309. Contrary to that PR, I intend to split this into smaller chunks:

  • element-wise functions (ones that use apply_scalar_unary). The prim implementations of these can in many cases use Eigen implementations, which can be faster than std.
  • operator-like functions (add, subtract, multiply, divide, elt_multiply, elt_divide)
  • size and view functions (col, row, sub_col, sub_row, head, tail, diagonal, transpose, cols, rows, num_elements, dims)
  • remainder of */fun
  • */prob
  • allow functions to return expressions

Example

Instead of:

Eigen::MatrixXd f(const Eigen::MatrixXd& x)

We can do:

template<typename T>
auto f(const T& x)
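
For instance (a sketch with a made-up function, not an actual Math signature), a generic signature lets a call like f(a + b) consume the sum as an expression instead of materializing it first:

#include <Eigen/Dense>

// Hypothetical element-wise function templated on any dense Eigen type.
template <typename T>
auto twice_exp(const T& x) {
  // x is read element by element; the input expression is not evaluated
  // into a temporary matrix first.
  return (2 * x.array().exp()).matrix().eval();
}

int main() {
  Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3);
  Eigen::MatrixXd b = Eigen::MatrixXd::Random(3, 3);
  // a + b is passed as an expression, not as a temporary MatrixXd.
  Eigen::MatrixXd c = twice_exp(a + b);
  return 0;
}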

Expected Output

More general and possibly faster code.

Current Version:

v3.0.0

@bob-carpenter
Contributor

+1 for this. We also need to do this to allow support for sparse matrices. @SteveBronder has been working on that along with perfect forwarding so we can reduce copying and exploit move semantics wherever possible.

@andrjohns
Collaborator

We'd need to start coding a bit more defensively after implementing this, otherwise there's the risk of accidentally evaluating the expression multiple times. Take log_sum_exp for example:

template <typename T>
auto log_sum_exp(T& x) {
  if (x.size() == 0) {
    return -std::numeric_limits<double>::infinity();
  }

  const double max = x.maxCoeff();
  if (!std::isfinite(max)) {
    return max;
  }
  return max + std::log((x.array() - max).exp().sum());
}

If log_sum_exp is passed an expression, something like log_sum_exp(m1.array().exp().matrix() * m2), then that expression will get evaluated three times: x.size(), x.maxCoeff(), and the final return statement. So we'd need to start evaluating the expression at the beginning of the function, and probably also update the documentation so that other/new developers know why.
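
For example, the defensive version might look something like this (just a sketch using Eigen's eval(); there may be nicer options):

template <typename T>
double log_sum_exp(const T& x) {
  // Evaluate the expression once up front; everything below reads from the
  // result. For an input that is already a plain matrix, eval() just
  // returns a const reference, so nothing extra is copied.
  const auto& x_eval = x.eval();
  if (x_eval.size() == 0) {
    return -std::numeric_limits<double>::infinity();
  }
  const double max = x_eval.maxCoeff();
  if (!std::isfinite(max)) {
    return max;
  }
  return max + std::log((x_eval.array() - max).exp().sum());
}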

@t4c1
Contributor Author

t4c1 commented Nov 30, 2019

Right. I plan to do that in each function as I generalize its signature. At that time I can also make sure an expression is not used twice. Although I don't think just taking the size of an expression evaluates it.

I also agree with the need to document this. Any idea where the doc should be put?

@bob-carpenter
Contributor

bob-carpenter commented Dec 1, 2019 via email

@bob-carpenter
Contributor

bob-carpenter commented Dec 1, 2019 via email

@SteveBronder
Collaborator

I have a little info on adding new docs to the stan site here

https://github.com/stan-dev/math/blob/develop/doxygen/howto_add_page.md

I wonder if for some functions we use so much info about the matrices that it would be better to leave their template arguments as Matrix<T, R, C> just to force evaluation.

@t4c1
Contributor Author

t4c1 commented Dec 2, 2019

If a function needs to evaluate an argument and the template params for that argument are known (not templated), we can just leave that argument as is - const Matrix<T, R, C>&. However, that does not work if we need template deduction at the same time. In that case I intend to use what Bob suggested.
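
A quick sketch of what I mean (hypothetical function names):

// Argument type fully known: an expression passed here is implicitly
// evaluated into the temporary the reference binds to, so the body can
// safely read x more than once.
inline double sum_twice(const Eigen::MatrixXd& x) { return x.sum() + x.sum(); }

// sum_twice(a + b);  // a + b is evaluated into a MatrixXd before the call

// This does not work if the scalar type has to be deduced: template argument
// deduction fails when an expression is passed as const Matrix<T, -1, -1>&,
// so such functions need a different approach.
// template <typename T>
// T sum_twice(const Eigen::Matrix<T, -1, -1>& x);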

@andrjohns
Collaborator

If a function needs to evaluate an argument and the template params for that argument are known (not templated), we can just leave that argument as is - const Matrix<T, R, C>&

You could also use the Ref<> functionality to maintain generality of input types, which (AFAIK) will take expressions as inputs and evaluate them into temporaries as needed. Using an example from Eigen's doc:

// read-only const argument:
void foo2(const Ref<const VectorXf>& x);

MatrixXf A;
VectorXf a;

foo2(A.row());              // Compilation error because A.row() is a 1xN object while foo2 is expecting a Nx1 object
foo2(A.row().transpose());  // The row is copied into a contiguous temporary
foo2(2*a);                  // The expression is evaluated into a temporary
foo2(A.col().segment(2,4)); // No temporary

Although some of the copies might be too expensive to be worth it.

@bob-carpenter
Contributor

First, I thought Ref didn't work with slicing, segmenting, etc., but it seems to work OK. Second, I thought it was type, not size, that determined passing a VectorXd, but the following compiles and runs.

What are the rules for evaluating into temporaries? I wouldn't have thought any of those template expressions would cause an intermediate.

void foo2(const Eigen::Ref<const Eigen::VectorXd>& x) {
  std::cout << x << std::endl;
}

TEST(foo, bar) {
  Eigen::MatrixXd A(2, 2);
  A << 1, 2, 3, 4;
  foo2(A.row(0).transpose());

  Eigen::VectorXd a(2);
  a << 10, 100;
  foo2(2 * a);
  foo2(A.col(0).segment(0, 1));

  Eigen::MatrixXd B(2, 1);
  foo2(B);
}

@t4c1
Contributor Author

t4c1 commented Dec 4, 2019

This page says:

By default, a Ref can reference any dense vector expression of float having a contiguous memory layout. Likewise, a Ref can reference any column-major dense matrix expression of float whose column's elements are contiguously stored with the possibility to have a constant space in-between each column, i.e. the inner stride must be equal to 1, but the outer stride (or leading dimension) can be greater than the number of rows.

In the const case, if the input expression does not match the above requirement, then it is evaluated into a temporary before being passed to the function.

@bob-carpenter
Contributor

bob-carpenter commented Dec 5, 2019 via email

@t4c1
Contributor Author

t4c1 commented Feb 19, 2020

I am running out of ideas for how to group the remaining functions into PRs, so I will just go in alphabetical order.

@SteveBronder
Collaborator

I am running out of ideas for how to group the remaining functions into PRs, so I will just go in alphabetical order.

Can we make a list of the remaining ones as an issue? I think you put one up somewhere a while ago though I couldn't find it. Would be good so we could divvy up work if anyone else is interested. I started picking apart some of the stuff for read_corr/cov_L etc.

@t4c1
Contributor Author

t4c1 commented Feb 19, 2020

No, I never made such a list. If you have a list of all functions in Stan Math I am happy to check off the ones I have already completed.

@rok-cesnovar
Member

I think Steve is referring to the list of functions we made for the kernel generator.

@SteveBronder
Collaborator

Ah yeah nvm wrong one. Yeah I can look through and slam an issue up. We can use it as the issue for the remaining ones

@t4c1
Contributor Author

t4c1 commented Feb 20, 2020

I don't think we need another issue for this, as we have this one. Feel free to either post here or edit the top-level comment once you make the list.

@t4c1
Contributor Author

t4c1 commented Nov 10, 2020

We are very close to having all functions accept expressions, so I am thinking about the details of returning expressions. Whenever a function returns an expression referencing a local variable it needs to use holder (see #1775).

When a function returns an expression referencing a function argument, it gets a bit less clear whether a holder is necessary. In the majority of use cases (including all possible uses from the Stan language) everything should work just fine without a holder. However, in C++ a function can be used in a way that the returned expression is not evaluated before its rvalue argument(s) get destroyed. For example, this happens if the resulting expression is stored in an auto variable. Example (assuming holder is not used in softmax):

Eigen::MatrixXd a = ...
Eigen::MatrixXd b = ...
auto c = softmax(a+b); // after this statement the sum expression is destructed and the softmax expression is left with a dangling reference
Eigen::MatrixXd d = c+1; // now the softmax expression c is evaluated

So the question is, should we use holder in such cases (to make the code safer) or not (avoiding the overhead of creating a holder and simplifying the code)? @SteveBronder @bbbales2 @andrjohns

@bbbales2
Member

My first reaction is holders cause it's less to worry about then.

How does this code compare to something written with Eigen? Like:

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::MatrixXd a = Eigen::MatrixXd::Random(2, 2);
  Eigen::MatrixXd b = Eigen::MatrixXd::Random(2, 2);
  auto c = (a + b).array().abs().sqrt();

  std::cout << c + 1.0 << std::endl;

  return 0;
}

Is Eigen having to allocate holder things along the way?

@t4c1
Contributor Author

t4c1 commented Nov 10, 2020

This example would work with Eigen. However, if you replaced the addition with a function that returns a plain matrix (not an expression), it would have the same issue.

@t4c1
Contributor Author

t4c1 commented Nov 10, 2020

Also, Eigen does not use anything like holder. In Eigen functions where such dangers could arise, they usually evaluate the result of the function instead of returning an expression.

@bbbales2
Member

Oh interesting. Adding an eval causes a memory problem here (had to turn on the sanitizer to get it). Weird: https://godbolt.org/z/M6sdGc

And what is the overhead of allocating a holder? Is it a malloc and a free, basically? Or is it something more?

@t4c1
Contributor Author

t4c1 commented Nov 10, 2020

Oh, I overlooked eval in your example. Yeah that is why it fails too.

About overhead: more or less just malloc and free.

@bbbales2
Member

Nah nah, I added the eval after you said the plain type would cause it to fail. Tricky!

@bbbales2
Member

should we use holder for such cases (to make code safer)

So using a holder in this case means our softmax would allocate a holder for its argument? Or where would the holder appear in the code?

@t4c1
Contributor Author

t4c1 commented Nov 10, 2020

Yes, softmax would allocate it (if the argument is an rvalue). You can look at the implementation of value_of to see how holder is used.
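
Roughly, the usage pattern looks like this (a sketch with a made-up function, not the actual value_of source):

#include <stan/math/prim.hpp>
#include <utility>

// make_holder calls the lambda on the forwarded argument and, if that
// argument is an rvalue, moves it into the returned holder object, so the
// wrapped expression never references a destroyed temporary.
template <typename T>
auto scale_by_two(T&& x) {
  return stan::math::make_holder(
      [](const auto& v) { return (2 * v.array()).matrix(); },
      std::forward<T>(x));
}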

@t4c1
Contributor Author

t4c1 commented Nov 11, 2020

I've been thinking about this some more. If we use holder for arguments, we need to do perfect forwarding on these arguments, as make_holder() needs to know whether it is dealing with lvalues or rvalues. But for these functions to actually know whether they have an lvalue or an rvalue, we need to forward arguments to them. So we need to use perfect forwarding in all functions that can call any function using holder. In other words, in almost all functions.
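
Something like this (hypothetical functions) is what I mean:

// The rvalue-ness of the argument has to be preserved through every call in
// the chain, otherwise make_holder cannot take ownership of a temporary.
template <typename T>
auto inner(T&& x) {
  return make_holder([](const auto& v) { return v.array().exp().matrix(); },
                     std::forward<T>(x));
}

template <typename T>
auto outer(T&& x) {
  // If this were inner(x), x would always be passed on as an lvalue and a
  // temporary argument would not be moved into the holder.
  return inner(std::forward<T>(x));
}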

On the other hand, avoiding auto is not that hard, so I am leaning towards not using holder for arguments.

@SteveBronder
Collaborator

I've been thinking about this some more. If we use holder for arguments, we need to do perfect forwarding on these arguments, as make_holder() needs to know whether it is dealing with lvalues or rvalues.

My cargo culting was not all wrong! 😆

If we really wanted to go this route I think I could come up with a regex that searches over function signatures and replaces const (TemplateName)& with (TemplateName)&&. Though then we'd have to start dealing with C++'s memory semantics, which adds a non-trivial layer of complexity. The overhead of that could probably be mitigated a bit by good documentation. Though then we would use make_holder(), adding one more level of complexity. We'd need like a Kurzgesagt-style YouTube infographic video to explain all of that.

Or alternatively, could we have a make_holder that takes in a type, like make_holder<T_A>(A), and for plain types does nothing but for expressions does the alloc/free? My thought here is that when we see something like an Addition<Mat, Mat> come into a function it's almost always going to be an rvalue. Though then there are Qs about Block<> types and dealing with things like Block<Addition<Mat1, Mat2>>.

It feels like kind of a bummer to ban auto. What happens with something like softmax(softmax(a + b))? Would a+b's expr still get erased here?

@t4c1
Contributor Author

t4c1 commented Nov 11, 2020

I am still trying to avoid littering all the code with calls to std::forward and make_holder, which, even ignoring the work to get there, would make the code less readable.

That second paragraph is more or less how Eigen expressions work. That works in most cases, but fails as soon as it gets an rvalue plain type.

It feels like kind of a bummer to ban auto

It would not get completely banned. Just banned unless you know what you are doing. :D

What happens with something like softmax(softmax(a + b))? Would a+b's expr still get erased here?

No. Temporaries are destructed after everything in a line of code gets executed. If an expression referencing temporaries is evaluated in the same line as those temporaries are created, everything works fine.
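
For example (still assuming softmax does not use holder for its argument):

Eigen::MatrixXd a = ...
Eigen::MatrixXd b = ...
// OK: the nested expression is evaluated in the same statement that creates
// the a + b temporary, so nothing is left dangling.
Eigen::MatrixXd d = softmax(softmax(a + b));
// Not OK: c outlives the a + b temporary it references.
auto c = softmax(a + b);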

@bbbales2
Member

It would not get completely banned.

Could we have our eval function handle this?

auto x = eval(...);

Like if you're using stan::math and you're assigning to an auto variable, eval. That's how I understand Eigen lol.

I guess that means the difference between us and Eigen then is that we can't have expressions as l-values and Eigen can?

@t4c1
Contributor Author

t4c1 commented Nov 11, 2020

Could we have our eval function handle this?

It can be used like this if you want.

I guess that means the difference between us and Eigen then is that we can't have expressions as l-values and Eigen can?

Nope. We are using Eigen expressions anyway.

@bbbales2
Member

Well Eigen can do:

auto c = (a + b).array().sqrt(); // somehow eigen's sum survives
auto d = c + 1.0;

And we cannot do:

auto c = softmax(a + b); // our sum gets destructed
auto d = c + 1.0;

@SteveBronder
Collaborator

@t4c1 can we start removing the code like

    return ret_type(ret);

In functions that use reverse_pass_callback()?

https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/elt_multiply.hpp#L42

If it's safe to pass around arena matrices now, I can also put up a lil PR in stanc3 that tests making stuff in the parameters block into arena matrices.

@SteveBronder
Collaborator

SteveBronder commented Nov 11, 2020

Actually wrt stanc3 here's a lil brutal version that uses arena_t as the matrix type when the scalar type of the matrix is a var

stan-dev/stanc3@master...SteveBronder:feature/arena-var-matrix

We could take off the ret_type() yada from the stuff that uses reverse_pass_callback rn and get an eyeball on how much we save there. Though imo the bigger thing is going to be doing some form of arena_t<> with the data.

@bbbales2
Member

can we start removing the code like

return ret_type(ret);

I'm still a bit nervous about this unless something has happened and we're talking about var<mat> and not mat<var>.

Check this code out:

Eigen::Matrix<stan::math::var, -1, -1> myfunc1() {
  Eigen::MatrixXd res_val(1, 1);
  res_val << 1.0;

  stan::arena_t<Eigen::Matrix<stan::math::var, -1, -1>> res = res_val;
  stan::math::reverse_pass_callback([res]() mutable {
      std::cout << "myfunc1" << std::endl;
      std::cout << res.val() << std::endl;
    });

  return res;
}

auto myfunc2() {
  Eigen::MatrixXd res_val(1, 1);
  res_val << 1.0;

  stan::arena_t<Eigen::Matrix<stan::math::var, -1, -1>> res = res_val;
  stan::math::reverse_pass_callback([res]() mutable {
      std::cout << "myfunc2" << std::endl;
      std::cout << res.val() << std::endl;
    });

  return res;
}

TEST(AgradRevMatrix, test_arena_output) {
  auto a = myfunc1();
  a(0) = 2.0;
  auto b = myfunc2();
  b(0) = 3.0;

  stan::math::grad();
}

In both cases the function sets the result value to 1, but this is the output:

myfunc2
3
myfunc1
1

The arena_t<Matrix<var, -1, -1>> is playing the same role as the vari** in the old varis, and if we don't make a copy of that on the output we're in danger of overwriting things.

@SteveBronder
Collaborator

Oof, yeah, now that I think about this, any assign() statement could cause this. The only thing I can think of to make it work would be to do the same tricky stuff I'm working on right now for var<mat> assign, where we save the previous slice and re-apply it as a callback in the reverse pass.

@t4c1
Contributor Author

t4c1 commented Nov 12, 2020

Well Eigen can do: And we cannot do:

The difference is that softmax needs to create a local matrix, which is then referenced by the returned expression, while the sqrt in your example only needs to reference the given expression.

@t4c1 can we start removing the code like

Yeah, Ben is right. We can do it for var<mat>. For mat<var> we need the copy. But mat<var> will eventually be gone anyway.

@t4c1
Contributor Author

t4c1 commented Nov 12, 2020

Also I am assuming you are ok with not using holder for arguments.

@bbbales2
Member

The difference is that softmax needs to create a local matrix

Yeah and is that not true for all our functions?

I think what we'd write to document this is:

If a Stan expression is going to be used as an lvalue, it shouldn't depend on any rvalue temporaries. Practically this means that an expression saved as an lvalue should only use one function call, because the intermediates allocated in a sequence of function calls will get destructed.

Now that I write that it doesn't sound that clear, but that's what I'm thinking.

I feel like Eigen must have something special going on that we don't, because its expressions merge into bigger and bigger expressions that don't necessarily go away. The reason I'm making the direct comparison to Eigen is it would be cool if we could say "Stan expressions have the same limitations as Eigen expressions", which I don't think is the case because our functions don't build giant expression types in the same way.

But mat will eventually be gone anyway.

The var<mat> design doc needs an update. Assigning to a sub-matrix needs to be talked about, and this seems like a big change of plans too.

@bbbales2
Member

Also I am assuming you are ok with not using holder for arguments.

Yeah, I'm cool with this. It's the same as with any oddity. It's fine for the language since we can hide it, and it needs to be documented for Math so someone doing C++ doesn't get confused and mad.

@t4c1
Contributor Author

t4c1 commented Nov 12, 2020

Yeah and is that not true for all our functions?

Not for all, but for many it is true. What I am trying to say is that it is not that we are doing things in a different way in our functions - we are doing different things.

it would be cool if we could say "Stan expressions have the same limitations as Eigen expressions"

Actually we are better - we have fewer limitations! In cases such as softmax that need internal matrices, we will use holder, so that will not be an issue anymore. Eigen's way of doing it would be to simply give up on expressions and eval the result.

I guess softmax was not the best example to start this discussion with, as it will need holder anyway.

@bbbales2
Member

In cases such as softmax that need internal matrices, we will use holder, so that will not be an issue anymore
I guess softmax was not the best example to start this discussion with, as it will need holder anyway.

Oh okay so we are going to add some holders, just not holders everywhere. Yeah @ me when you put up this pull so I can stare at it a bit.

@t4c1
Contributor Author

t4c1 commented Nov 12, 2020

It will be up in 5 min :)
