FR: Matrix inputs to multi_normal_* #937

mike-lawrence · 2021-07-27T13:20:18Z

It would be really nice to have signatures for the multi_normal_* family that accept matrix inputs for at least the data and means arguments, possibly with a new argument that designates whether they should be treated as row-wise or column-wise manner. For example, see below for a centered-parameterized hierarchical model (of the same core structure as SUG1.13), where I currently have to do two format conversions, one to convert a matrix to an array of row-vectors for input to multi_normal_cholesky, then another conversion of the first argument to a matrix for use with rows_dot_product. The latter is made easier by #931, but it would be nice to skip it altogether.

data{

	// nXg: number of cols in the group-level predictor matrix
	int<lower=2> nXg ;

	// rXg: number of rows in the group-level predictor matrix
	int<lower=nXg> rXg ;

	// Xg: group-level predictor matrix
	matrix[rXg,nXg] Xg ;

	// nI: number of individuals
	int<lower=nXg> nI ;

	// iXg: which group each individual is associated with
	array[nI] int<lower=1,upper=rXg> iXg ;

	// nXq: number of cols in the condition-level predictor matrix
	int<lower=2> nXq ;

	// rXq: number of rows in the condition-level predictor matrix
	int<lower=nXq> rXq ;

	// Xq: condition-level predictor matrix
	matrix[rXq,nXq] Xq ;

	// iXq: which individual is associated with each row in Xq
	array[rXq] int<lower=1,upper=nI> iXq ;

	// nY: num entries in the observation vectors
	int<lower=nXg*nXq*nI> nY ;

	// Y: observations
	vector[nY] Y ;

	// yXq: which row in Xq is associated with each observation in Y
	array[nY] int<lower=1,upper=rXq> yXq ;

}
parameters{

	// Y_sd: magnitude of observation-level variability
	real<lower=0> Y_sd ;

	// Z: coefficients associated with predictors (group-level, condition-level, & interactions)
	matrix[nXg,nXq] Z ;

	// iZq_sd: magnitude of variability among individuals within a group
	vector<lower=0>[nXq] iZq_sd ;

	// iZq_cholcorr: cholesky-factor of correlation structure associated with variability among individuals on influence of within-individual predictors
	cholesky_factor_corr[nXq] iZq_cholcorr ;

	// iZq: by-individual coefficients (centered parameterization)
	array[nI] row_vector[nXq] iZq ;

}
model{

	////
	// group-level structure
	////

	// standard-normal priors on all group-level coefficients
	to_vector(Z) ~ std_normal() ;

	// using the group predictors and coefficients, compute condition coefficients for each group
	// NOT SURE THIS IS THE MOST EFFICIENT WAY TO DO THIS (ESP. GIVEN NECESSARY LATER CONVERSION)
	matrix[rXg,nXq] gZq ;
	for(this_nXq in 1:nXq){
		gZq[,this_nXq]= rows_dot_product(
			rep_matrix(to_row_vector(Z[,this_nXq]),rXg)
			, Xg
		) ;
	}

	//convert gZq from matrix to array of row-vectors
	array[rXg] row_vector[nXq] gZq_arr ;
	for(this_rXg in 1:rXg){
		gZq_arr[this_rXg] = gZq[this_rXg] ;
	}


	////
	// individual-level structure
	////

	// positive-standard-normal priors on magnitude of variability among individuals within a group
	iZq_sd ~ std_normal() ;

	// flat prior on correlations
	iZq_cholcorr ~ lkj_corr_cholesky(1) ;

	// multi-normal structure for iZq
	iZq ~ multi_normal_cholesky(
		gZq_arr[iXg]
		, diag_pre_multiply(iZq_sd, iZq_cholcorr)
	) ;

	//convert iZq from array of row-vectors to matrix
	// (hopefully replaceable by `to_matrix(iZ)` soon,
	// see: https://github.com/stan-dev/cmdstan/issues/1015 )
	matrix[nI,nXq] iZq_mat ;
	for(this_nI in 1:nI){
		iZq_mat[this_nI] = iZq[this_nI] ;
	}

	// using the indivividual condition coefficients and predictors, compute
	// values for each individual
	vector[nXq] iZq_dot_Xq = rows_dot_product( iZq_mat[iXq] , Xq ) ;

	////
	// observation-level structure
	////

	// prior peaked around .8 for magnitude of observation-level variability
	Y_sd ~ weibull(2,1) ;

	Y ~ normal(
		iZq_dot_Xq[yXq]
		, Y_sd
	) ;

}

SteveBronder · 2021-07-28T20:47:39Z

Yes I agree! I have an issue in Stan math about this as well and @adamhaber is currently working on this

stan-dev/math#2532

bob-carpenter · 2022-05-03T21:45:10Z

@mike-lawrence: I totally feel your pain here. Part of this (array of vector vs. matrix) is due to an original bad design decision I made to give those different memory representations. Given that, I think the easiest thing to do now is write transform functions then the use of them won't get in the way of reading the program as much as new variable decls and loops.

Here's a version I refactored to use functions (and also remove the comments and reformat so I could more easily read). I didn't test, so use with caution!

functions {
  matrix to_mat(array[] row_vector x) {
    matrix[x.size(), rows(x[1])] y;
    for (i in 1:x.size())
      y[i, ] = x[i];
    return y;
  }
  array[] row_vector to_arr_rowvec(matrix x) {
    array[x.rows()] row_vector[x.cols()] y;
    for (i in 1:y.size())
      y[i] = x.row(i);
    return y;
  }
}
data {
  int<lower=2> nXg;
  int<lower=nXg> rXg;
  matrix[rXg, nXg] Xg;
  int<lower=nXg> nI;
  array[nI] int<lower=1, upper=rXg> iXg;
  int<lower=2> nXq;
  int<lower=nXq> rXq;
  matrix[rXq,nXq] Xq;
  array[rXq] int<lower=1, upper=nI> iXq;
  int<lower=nXg * nXq * nI> nY;
  vector[nY] Y;
  array[nY] int<lower=1, upper=rXq> yXq;
}
parameters {
  real<lower=0> Y_sd;
  matrix[nXg, nXq] Z;
  vector<lower=0>[nXq] iZq_sd;
  cholesky_factor_corr[nXq] iZq_cholcorr;
  array[nI] row_vector[nXq] iZq;
}
model {
  to_vector(Z) ~ std_normal();
  matrix[rXg,nXq] gZq;
  for (this_nXq in 1:nXq)
    gZq[ , this_nXq]
      = rows_dot_product(rep_matrix(to_row_vector(Z[ , this_nXq]),
                                    rXg),
                         Xg);
  iZq_sd ~ std_normal();
  iZq_cholcorr ~ lkj_corr_cholesky(1);
  iZq ~ multi_normal_cholesky(to_arr_rowvec(gZq)[iXg],
                              diag_pre_multiply(iZq_sd, iZq_cholcorr));
  Y_sd ~ weibull(2,1);
  Y ~ normal(rows_dot_product(to_mat(iZq)[iXq], Xq)[yXq], Y_sd);
}

The most efficient way to do this is going to depend a lot on what all those indexing arrays of integers look like (iXq, yXq, this_nXq, iXg). If there's a lot of duplicate entries you want to do one thing (exploit sufficient stats and turn all that sampling into a multiplication), whereas if there are sparse entries, then you don't need to convert the whole big matrix.

A quick improvement would be to make Z an array of row vectors to avoid copying within the inner loop. Also, using rep_matrix in order to use rows_dot_product is probably less efficient than just looping and using dot product.

I'm tempted to define gZq as an array of row vectors to start with, but then the assign would need a to_array wrapper but then the conversion function later wouldn't be needed. Either way, it's super awkward.

mike-lawrence mentioned this issue May 3, 2022

Allow multivariate distributions to take in matrices stan-dev/math#2532

Open

WardBrian added the feature New feature or request label Nov 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FR: Matrix inputs to multi_normal_* #937

FR: Matrix inputs to multi_normal_* #937

mike-lawrence commented Jul 27, 2021 •

edited

Loading

SteveBronder commented Jul 28, 2021

bob-carpenter commented May 3, 2022

FR: Matrix inputs to multi_normal_* #937

FR: Matrix inputs to multi_normal_* #937

Comments

mike-lawrence commented Jul 27, 2021 • edited Loading

SteveBronder commented Jul 28, 2021

bob-carpenter commented May 3, 2022

mike-lawrence commented Jul 27, 2021 •

edited

Loading