zero_sum_vector type #1111
Let's get this to work with multiplier and offset too. As an aside, if you leave off the intercept and declare the full size of the parameter, the interpretation of the parameters is relative to the mean value of the parameter vector.
I’ve let #971 fall out of date a little since it didn’t seem like there was a consensus on merging it, but composing offset/multiplier with other transforms is definitely still a possibility.
I think multipliers would make sense here, but I don't see how offset would help with the sum-to-zero constraint.
I'm looking at https://mc-stan.org/docs/2_28/stan-users-guide/parameterizing-centered-vectors.html, middle of the page.
I ran some more experiments to compare the two parameterizations. Here's Stan code to evaluate the pin-the-first-element-to-zero approach:

```stan
parameters {
  vector[1] alpha1;
  vector[3] beta1;
  vector[7] gamma1;
  vector[15] delta1;
}
transformed parameters {
  vector[2] alpha0 = append_row(0, alpha1);
  vector[4] beta0 = append_row(0, beta1);
  vector[8] gamma0 = append_row(0, gamma1);
  vector[16] delta0 = append_row(0, delta1);
}
model {
  alpha0 ~ normal(0, 10);
  beta0 ~ normal(0, 10);
  gamma0 ~ normal(0, 10);
  delta0 ~ normal(0, 10);
}
generated quantities {
  vector[2] alpha = softmax(alpha0);
  vector[4] beta = softmax(beta0);
  vector[8] gamma = softmax(gamma0);
  vector[16] delta = softmax(delta0);
}
```

Note that the prior is over the transformed parameters (e.g., `alpha0` rather than the raw `alpha1`).
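As a side check of why a constraint is needed at all: `softmax` is invariant to adding a constant to every element, which is exactly the non-identifiability that pinning one element (or fixing the sum) removes. A quick NumPy sketch, illustrative only:

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# adding any constant to the logits leaves the simplex point unchanged,
# so only K - 1 of the K logits are identified
print(np.allclose(softmax(x), softmax(x + 2.5)))
```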
Here's a model to test the sum-to-zero approach. Note again that the prior is on the transformed sum-to-zero parameter here; if you only put a prior on the raw parameters, the implied prior on the constrained vector is different.

```stan
parameters {
  vector[1] alpha1;
  vector[3] beta1;
  vector[7] gamma1;
  vector[15] delta1;
}
transformed parameters {
  vector[2] alpha0 = append_row(alpha1, -sum(alpha1));
  vector[4] beta0 = append_row(beta1, -sum(beta1));
  vector[8] gamma0 = append_row(gamma1, -sum(gamma1));
  vector[16] delta0 = append_row(delta1, -sum(delta1));
}
model {
  alpha0 ~ normal(0, 10);
  beta0 ~ normal(0, 10);
  gamma0 ~ normal(0, 10);
  delta0 ~ normal(0, 10);
}
generated quantities {
  vector[2] alpha = softmax(alpha0);
  vector[4] beta = softmax(beta0);
  vector[8] gamma = softmax(gamma0);
  vector[16] delta = softmax(delta0);
}
```
and here's the posterior, which is now symmetric (up to sampling error; if you run a lot longer, the residual asymmetry washes out). [posterior summary omitted]

In the sum-to-zero case, if you look at the constrained parameters, you get symmetric values that have slightly lower standard deviation than the scale of 10 specified in the prior. [posterior summary omitted]
This is because we have effectively added multiple constraints. If you look at the binary case, it's easy to see that the prior `alpha0 ~ normal(0, 10)` is equivalent to

```stan
alpha0[1] ~ normal(0, 10);
alpha0[2] ~ normal(0, 10);
```

and then, because `alpha0[1] = alpha1[1]`, `alpha0[2] = -alpha1[1]`, and a normal with location 0 is symmetric, this works out to

```stan
alpha1[1] ~ normal(0, 10);
alpha1[1] ~ normal(0, 10);
```

so we get a double prior on `alpha1[1]`, which is equivalent to a single `normal(0, 10 / sqrt(2))` prior.
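Applying a `normal(0, 10)` prior twice to the same scalar is equivalent to a single `normal(0, 10 / sqrt(2))` prior, since the two log densities differ only by an additive constant. A quick NumPy check (a sketch, not Stan code):

```python
import numpy as np

def norm_logpdf(x, sigma):
    # log density of normal(0, sigma)
    return -0.5 * (x / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

xs = np.linspace(-30.0, 30.0, 13)
double = 2 * norm_logpdf(xs, 10.0)             # the prior applied twice
single = norm_logpdf(xs, 10.0 / np.sqrt(2.0))  # one prior at the reduced scale
diff = double - single
print(np.allclose(diff, diff[0]))  # constant in x, so the densities are proportional
```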
@spinkney: I think we need to remove that translation bit as I don't think it makes sense here. I hadn't realized someone had worked out the reduction in scale as `sqrt(1 - 1/N)`.
I tested this using the SVD approach and it appears to work as well.

```r
U <- list()
d <- list()
j <- 0
for (i in c(2, 4, 8, 16)) {
  j <- j + 1
  Sigma <- contr.sum(i) %*% diag(i - 1) %*% t(contr.sum(i))
  s <- svd(Sigma)  # Guarantee symmetry
  U[[j]] <- s$u
  d[[j]] <- abs(zapsmall(s$d))
}

mod_test <- cmdstan_model("sum_to_zero_test.stan")
stan_data <- list(U_alpha = U[[1]], d_alpha = d[[1]],
                  U_beta  = U[[2]], d_beta  = d[[2]],
                  U_gamma = U[[3]], d_gamma = d[[3]],
                  U_delta = U[[4]], d_delta = d[[4]])
test_out <- mod_test$sample(
  data = stan_data,
  parallel_chains = 4
)
summary_out <- test_out$summary(c("alpha", "beta", "gamma", "delta"))
```
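The construction can be sanity-checked outside R. Here's a NumPy sketch (it replicates R's `contr.sum` layout, which is an assumption worth flagging: identity rows followed by a row of -1s) confirming that the resulting linear map always produces vectors that sum to zero:

```python
import numpy as np

def contr_sum(n):
    # R's contr.sum(n): identity on the first n-1 rows, last row all -1s
    return np.vstack([np.eye(n - 1), -np.ones(n - 1)])

rng = np.random.default_rng(1)
max_abs_sum = 0.0
for n in (2, 4, 8, 16):
    C = contr_sum(n)
    Sigma = C @ C.T                  # contr.sum(n) %*% diag(n-1) %*% t(contr.sum(n))
    U, d, _ = np.linalg.svd(Sigma)   # Sigma is symmetric PSD, so U holds eigenvectors
    d = np.where(d < 1e-10, 0.0, d)  # zap the numerically-zero singular value
    x = rng.normal(size=n - 1)
    y = (U * d) @ np.append(x, 0.0)  # diag_post_multiply(U, d) * append_row(x, 0)
    max_abs_sum = max(max_abs_sum, abs(y.sum()))
print(max_abs_sum < 1e-8)
```

The sum-to-zero property follows because the all-ones vector is in the null space of `Sigma`, so every column of `U` with a nonzero singular value is orthogonal to it.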
Here's the Stan model (`sum_to_zero_test.stan`):

```stan
data {
  matrix[2, 2] U_alpha;
  vector[2] d_alpha;
  matrix[4, 4] U_beta;
  vector[4] d_beta;
  matrix[8, 8] U_gamma;
  vector[8] d_gamma;
  matrix[16, 16] U_delta;
  vector[16] d_delta;
}
parameters {
  vector[1] alpha1;
  vector[3] beta1;
  vector[7] gamma1;
  vector[15] delta1;
}
transformed parameters {
  vector[2] alpha0 = diag_post_multiply(U_alpha, d_alpha) * append_row(alpha1, 0);
  vector[4] beta0 = diag_post_multiply(U_beta, d_beta) * append_row(beta1, 0);
  vector[8] gamma0 = diag_post_multiply(U_gamma, d_gamma) * append_row(gamma1, 0);
  vector[16] delta0 = diag_post_multiply(U_delta, d_delta) * append_row(delta1, 0);
}
model {
  alpha0 ~ normal(0, inv(sqrt(1 - inv(2))));
  beta0 ~ normal(0, inv(sqrt(1 - inv(4))));
  gamma0 ~ normal(0, inv(sqrt(1 - inv(8))));
  delta0 ~ normal(0, inv(sqrt(1 - inv(16))));
}
generated quantities {
  vector[2] alpha = softmax(alpha0);
  vector[4] beta = softmax(beta0);
  vector[8] gamma = softmax(gamma0);
  vector[16] delta = softmax(delta0);
}
```

where a `normal(0, 1)` prior is implied on each element of the constrained vectors.
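The `inv(sqrt(1 - inv(N)))` scales compensate for the variance that the sum-to-zero constraint removes. For an isotropic normal, conditioning on a zero sum is equivalent to subtracting the mean, so the compensation can be checked with a small Monte Carlo sketch (NumPy, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
scale = 1.0 / np.sqrt(1.0 - 1.0 / N)  # Stan's inv(sqrt(1 - inv(8)))

# for an isotropic normal, conditioning on sum(z) == 0 has the same
# distribution as orthogonally projecting onto the zero-sum subspace
z = rng.normal(scale=scale, size=(200_000, N))
y = z - z.mean(axis=1, keepdims=True)  # each row now sums to zero

marginal_sd = y.std(axis=0)
print(np.allclose(marginal_sd, 1.0, atol=0.02))  # each element has sd close to 1
```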
I think the general problem with the SVD-based and QR-based approaches is that they imply doing calculations with data/transformed data. This would be painful to do with our current architecture, where the constraining/unconstraining transforms are functions that don't take in auxiliary information. @WardBrian said it wouldn't be impossible, but it would be a lot of work and would complicate the parser.
Is your feature request related to a problem? Please describe.
Suppose we have a logistic regression with, say, 5 age groups, 6 income levels, and 2 sexes. If I want to write a regression like this:

then I run into identifiability issues. There are (at least) two ways to deal with this:

(a) setting `beta[1] = 0`, `gamma[1] = 0`, and `delta[1] = 0`, or
(b) setting `sum(beta) = 0`, `sum(gamma) = 0`, and `sum(delta) = 0`.

Approach (a) is more traditional in frequentist applications, but we like (b) more for Bayes because the priors are symmetric rather than defined relative to the first element.
Describe alternatives you've considered
This can be coded in Stan currently as follows.

Note that `delta = [delta[1], -delta[1]]'`, which is how we'd like to handle binary predictors like this.

Describe the solution you'd like
I'd like to be able to declare the above as

and have Stan do the transform under the hood. There is no Jacobian adjustment needed because the transform is a constant linear function, so the C++ for the transform and its inverse are trivial.
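For illustration, a minimal sketch of what the constraining/unconstraining pair might look like (a Python stand-in for the C++; the function names are hypothetical, not part of the proposal):

```python
import numpy as np

def sum_to_zero_constrain(x_free):
    # map K-1 free values to a K-vector whose elements sum to zero
    return np.append(x_free, -np.sum(x_free))

def sum_to_zero_unconstrain(y):
    # inverse: drop the redundant last element
    return y[:-1]

x = np.array([0.3, -1.2, 2.0])
y = sum_to_zero_constrain(x)
print(abs(y.sum()) < 1e-12)                        # constraint holds
print(np.allclose(sum_to_zero_unconstrain(y), x))  # round trip recovers the free values
```

Because the map is linear with a constant Jacobian, the constrain step contributes no log-density adjustment term.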
Considerations for priors
Either way we do this (pinning the first value to 0, or setting the last value to the negative sum of the others), the meaning of the transformed parameters is a bit subtle. The Stan program

is equivalent to unfolding the model as

which, because `delta[2] = -delta[1]` and a normal with location parameter 0 is symmetric, works out to

which works out to be equivalent to using

If we instead use `delta ~ normal(0, sqrt(2) * sigma_delta)`, then we get the equivalent of `delta[1] ~ normal(0, sigma_delta)`, which means `sigma_delta` is the scale of half the difference between `delta[1]` and `delta[2]`. The same effect can be derived more efficiently by just writing this out directly as

If you want to put a prior on the scale of the difference, that can also be done directly without the need for a Jacobian adjustment as

or equivalently,
The problem with the traditional approach of setting `alpha[1] = 0` is that we can only put priors on `alpha[2:K]`, the scale of which is determined relative to having set `alpha[1] = 0`. That means the priors are implicitly about differences between `alpha[1]` and `alpha[k]`.