---
title: "Roaches cross-validation demo"
author: "[Aki Vehtari](https://users.aalto.fi/~ave/)"
date: "First version 2017-01-10. Last modified `r format(Sys.Date())`."
output:
  html_document:
    fig_caption: yes
    toc: TRUE
    toc_depth: 2
    number_sections: TRUE
    toc_float:
      smooth_scroll: FALSE
bibliography: modelsel.bib
csl: harvard-cite-them-right.csl
link-citations: yes
---
# Setup {.unnumbered}
```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=FALSE, message=FALSE, error=FALSE, warning=FALSE, comment=NA, out.width='95%')
```
**Load packages**
```{r}
library(rstanarm)
library(brms)
options(mc.cores = parallel::detectCores())
library(loo)
library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default(base_family = "sans"))
```
# Introduction
This notebook demonstrates cross-validation of a simple misspecified
model. In this case, cross-validation is useful for detecting the
misspecification.
The example comes from Chapter 8.3 of [Gelman and Hill (2007)](http://www.stat.columbia.edu/~gelman/arm/) and the introduction text for the data is from [Estimating Generalized Linear Models for Count Data with rstanarm](https://cran.r-project.org/web/packages/rstanarm/vignettes/count.html) by Jonah Gabry and Ben Goodrich.
We want to make inferences about the efficacy of a certain pest management system at reducing the number of roaches in urban apartments. Here is how Gelman and Hill describe the experiment (pg. 161):
> the treatment and control were applied to 160 and 104 apartments, respectively, and the outcome measurement $y_i$ in each apartment $i$ was the number of roaches caught in a set of traps. Different apartments had traps for different numbers of days
In addition to an intercept, the regression predictors for the model are the pre-treatment number of roaches `roach1`, the treatment indicator `treatment`, and a variable indicating whether the apartment is in a building restricted to elderly residents `senior`. Because the number of days for which the roach traps were used is not the same for all apartments in the sample, we include it as an exposure: we add $\ln(u_i)$ (the log of `exposure2`) to the linear predictor $\eta_i$, which can be specified using the `offset` argument to `stan_glm`.
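With the log-exposure offset, the Poisson mean factorizes as $\mathbb{E}[y_i] = \exp(\eta_i + \ln(u_i)) = u_i \exp(\eta_i)$, so the regression coefficients describe the roach rate per trap-day rather than the raw count.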
# Poisson model
Load data
```{r}
data(roaches)
# roach1 is very skewed, so we take the square root
roaches$sqrt_roach1 <- sqrt(roaches$roach1)
```
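To see the skewness that motivates the square-root transformation, we can compare quantiles before and after the transformation (a small added check, not part of the original demo):
```{r}
# roach1 ranges over several orders of magnitude; sqrt compresses the tail
rbind(roach1 = quantile(roaches$roach1),
      sqrt_roach1 = quantile(roaches$sqrt_roach1))
```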
Fit with `stan_glm`
```{r, results='hide'}
stan_glmp <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
                      data = roaches, family = poisson,
                      prior = normal(0,2.5), prior_intercept = normal(0,5),
                      chains = 4, cores = 1, seed = 170400963, refresh=0)
```
## Analyse posterior
Plot posterior
```{r}
mcmc_areas(as.matrix(stan_glmp), prob_outer = .999)
```
All the marginals are significantly away from zero.
## Cross-validation checking
We can use Pareto-smoothed importance sampling leave-one-out cross-validation as model checking tool [@Vehtari+etal:PSIS-LOO:2017].
```{r}
(loop <- loo(stan_glmp))
```
We got serious warnings, so let's plot the Pareto $k$ values.
```{r}
plot(loop)
```
There are several highly influential observations, which indicates
potential model misspecification [@Vehtari+etal:PSIS-LOO:2017].
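Beyond the plot, the loo package has helpers for inspecting the diagnostics numerically; a minimal sketch (assuming a recent loo version, which exports `pareto_k_table()` and `pareto_k_ids()`):
```{r}
# Tabulate the Pareto k diagnostics and list the observations whose k
# exceeds 0.7, i.e. those that importance sampling cannot handle reliably
pareto_k_table(loop)
pareto_k_ids(loop, threshold = 0.7)
```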
Before looking in more detail at where the problem is, or fixing it, let's check what cross-validation says about the relevance of the covariates.
We form three models by dropping each of the covariates in turn.
```{r, results='hide'}
stan_glmm1p <- update(stan_glmp, formula = y ~ treatment + senior)
stan_glmm2p <- update(stan_glmp, formula = y ~ sqrt_roach1 + senior)
stan_glmm3p <- update(stan_glmp, formula = y ~ sqrt_roach1 + treatment)
```
Although the Pareto $k$ values were very large, we can make a quick test with PSIS-LOO (if the comparison were to show a clear difference, PSIS-LOO could not be trusted, and PSIS-LOO+ or $K$-fold-CV would be needed; see more in @Vehtari+etal:PSIS-LOO:2017).
```{r}
loo_compare(loo(stan_glmm1p), loop)
loo_compare(loo(stan_glmm2p), loop)
loo_compare(loo(stan_glmm3p), loop)
```
Based on this, the pre-treatment roach count covariate would be relevant. Although dropping the treatment or senior covariate makes a large change to elpd, the uncertainty is also large, and cross-validation states that these covariates are not necessarily relevant! The posterior marginals are conditional on the model, while cross-validation is more cautious by not assuming any model for the future data distribution.
## Posterior predictive checking
It's also good to remember that, in addition to cross-validation, posterior predictive checks can often detect problems and provide more information about their cause. Here we compute the proportion of zeros predicted by the model and compare it to the observed proportion of zeros.
```{r}
prop_zero <- function(y) mean(y == 0)
(prop_zero_test1 <- pp_check(stan_glmp, plotfun = "stat", stat = "prop_zero"))
```
# Negative binomial model
Next we change the Poisson model to a more robust negative-binomial model.
```{r, results='hide'}
stan_glmnb <- update(stan_glmp, family = neg_binomial_2)
```
## Analyse posterior
Plot posterior
```{r}
mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
           pars = c("(Intercept)","sqrt_roach1","treatment","senior"))
```
The treatment effect is much closer to zero, and the senior effect has a lot of probability mass on both sides of 0. So it matters which model we use.
We discuss posterior dependencies in more detail in the `collinear` notebook, but for reference we also plot the paired marginals here.
```{r}
mcmc_pairs(as.matrix(stan_glmnb),
           pars = c("(Intercept)","sqrt_roach1","treatment","senior"))
```
There are some posterior correlations, but nothing that would change our conclusions.
## Cross-validation checking
Let's check PSIS-LOO Pareto $k$ diagnostics
```{r}
(loonb <- loo(stan_glmnb))
```
All $\hat{k}$ values are ok, which indicates that the negative-binomial
model would be better (for final results it would be good to run PSIS-LOO+). We can also compare the Poisson and negative-binomial models.
```{r}
loo_compare(loop, loonb)
```
The negative-binomial model is clearly better than the Poisson.
As Poisson is a special case of the negative-binomial, we could also have seen that the Poisson model is unlikely by looking at the posterior of the over-dispersion parameter, which concentrates on very small values (Poisson corresponds to the limit of infinite `reciprocal_dispersion`).
```{r}
mcmc_areas(as.matrix(stan_glmnb), prob_outer = .999,
           pars = c("reciprocal_dispersion"))
```
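As a numeric check (our addition), we can summarize the draws of $\phi$ (`reciprocal_dispersion`) directly; under the `neg_binomial_2` parameterization the variance is $\mu + \mu^2/\phi$:
```{r}
# Posterior quantiles of phi; values near zero imply strong
# over-dispersion, i.e. far from the Poisson special case
quantile(as.matrix(stan_glmnb)[, "reciprocal_dispersion"],
         probs = c(0.05, 0.5, 0.95))
```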
## Posterior predictive checking
We next use posterior predictive checking to check that the improved model can also predict the proportion of zeros well.
```{r}
(prop_zero_test2 <- pp_check(stan_glmnb, plotfun = "stat", stat = "prop_zero"))
```
The result looks much better than for the Poisson model.
## Predictive relevance of covariates
Finally, let's check that cross-validation model comparison agrees about the relevance of the covariates.
```{r, results='hide'}
stan_glmm1nb <- update(stan_glmm1p, family = neg_binomial_2)
stan_glmm2nb <- update(stan_glmm2p, family = neg_binomial_2)
stan_glmm3nb <- update(stan_glmm3p, family = neg_binomial_2)
```
```{r}
loo_compare(loo(stan_glmm1nb), loonb)
loo_compare(loo(stan_glmm2nb), loonb)
loo_compare(loo(stan_glmm3nb), loonb)
```
The pre-treatment roach count has a clear effect. The treatment effect is visible in the marginal posterior, but as discussed in the `betablockers` demo, cross-validation is not good for detecting weak effects. Based on cross-validation, the senior effect is also not relevant.
The conclusion from the analysis would then be that the treatment is likely to help, but it is difficult to predict the number of roaches in an individual apartment whether it is treated or not.
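To illustrate this, here is a hypothetical sketch (the new-apartment values in `nd` are our choice, not part of the original demo) of posterior predictive intervals for a median apartment with and without treatment:
```{r}
# Hypothetical apartment at the median pre-treatment roach level and
# median exposure, non-senior, untreated (row 1) and treated (row 2)
nd <- data.frame(sqrt_roach1 = sqrt(median(roaches$roach1)),
                 treatment = c(0, 1), senior = 0,
                 exposure2 = median(roaches$exposure2))
yrep_nd <- posterior_predict(stan_glmnb, newdata = nd,
                             offset = log(nd$exposure2))
# Wide 90% predictive intervals for both rows illustrate why individual
# counts are hard to predict even if the treatment effect is negative
t(apply(yrep_nd, 2, quantile, probs = c(0.05, 0.5, 0.95)))
```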
# Poisson model with "random effects"
Sometimes over-dispersion is modelled by adding a "random effect" for each individual observation. The following example illustrates the computational problems of this approach.
```{r}
# add an observation-specific id for the observation-level "random effect"
roaches$id <- seq_len(nrow(roaches))
```
Fit with `stan_glmer`
```{r, results='hide'}
stan_glmpr <- stan_glmer(y ~ sqrt_roach1 + treatment + senior + (1 | id),
                         offset = log(exposure2),
                         data = roaches, family = poisson,
                         prior = normal(0,2.5), prior_intercept = normal(0,5),
                         chains = 4, cores = 4, iter = 4000,
                         seed = 170400963, refresh=0)
```
## Analyse posterior
Plot posterior
```{r}
mcmc_areas(as.matrix(stan_glmpr), prob_outer = .999,
           pars = c("(Intercept)","sqrt_roach1","treatment","senior"))
```
The marginals are similar to those from the negative-binomial model,
except that the marginal for the senior effect is clearly away from zero.
## Cross-validation checking
Let's check PSIS-LOO.
```{r}
(loopr <- loo(stan_glmpr))
```
We got serious warnings, which in this case are due to having a "random effect" parameter for each observation. Removing one observation changes the posterior for that random effect so much that importance sampling fails (even with Pareto smoothing). Note that WAIC would fail for the same reason.
We can also plot Pareto $k$ values.
```{r}
plot(loopr)
```
We have a very large number of high $k$ values, which indicates a very flexible model.
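We can quantify how widespread the problem is with the same loo diagnostic helpers as before (a small added check):
```{r}
# Count the observations whose Pareto k exceeds 0.7 for this model
pareto_k_table(loopr)
length(pareto_k_ids(loopr, threshold = 0.7))
```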
While importance sampling in PSIS-LOO can fail for a "random effect"
model, we can use $K$-fold-CV instead, re-fitting the model 10 times,
each time leaving out 10% of the observations. This shows that
cross-validation itself is not infeasible for "random effect" models.
```{r}
(kcvpr <- kfold(stan_glmpr, K=10))
```
The loo package allows comparing PSIS-LOO and $K$-fold-CV results.
```{r}
loo_compare(loonb, kcvpr)
```
There is not much difference, and this difference could also be
explained by $K$-fold-CV using only 90% of the observations for the
posteriors, while PSIS-LOO uses 99.6% of the observations for the
posteriors. We can check this by running $K$-fold-CV also for the negative-binomial model.
```{r}
(kcvnb <- kfold(stan_glmnb, K=10))
loo_compare(kcvnb, kcvpr)
```
When both models are assessed using $K$-fold-CV, the difference in predictive performance is very small. The models can still have different predictive distributions.
Now that we've seen that based on robust $K$-fold-CV there is not much difference between negative-binomial and random-effect-Poisson models, we can also check how bad the comparison would have been with PSIS-LOO.
```{r}
loo_compare(loonb, loopr)
```
If we had ignored the Pareto $k$ warnings, we would have mistakenly concluded that the random effect model is much better. Note that WAIC is (as usual) even worse
```{r}
loo_compare(waic(stan_glmnb), waic(stan_glmpr))
```
## Posterior predictive checking
We also do posterior predictive checking for the Poisson model with "random effects".
```{r}
(prop_zero_test3 <- pp_check(stan_glmpr, plotfun = "stat", stat = "prop_zero"))
```
The proportion of zeros in the posterior predictive simulations from
the Poisson model with "random effects" shows less variation than for
the negative-binomial model. The models are similar in that both model
over-dispersion, but the form of the over-dispersion model is
different.
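As a rough numeric sketch of this difference (our addition; it re-simulates full posterior predictive draws for both models), compare how much the proportion of zeros varies across the predictive draws:
```{r}
# Spread of the prop_zero statistic across posterior predictive draws;
# a smaller sd for the "random effects" Poisson matches the plots above
yrepnb <- posterior_predict(stan_glmnb)
yreppr <- posterior_predict(stan_glmpr)
c(negbin = sd(apply(yrepnb, 1, prop_zero)),
  poisson_re = sd(apply(yreppr, 1, prop_zero)))
```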
# Zero-inflated negative-binomial model
As the proportion of zeros is quite high in the data, it is worthwhile to also test a zero-inflated negative-binomial model, which is a mixture of two models (the mixture distribution is written out after the list):
- logistic regression to model the proportion of extra zero counts
- negative-binomial model
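Writing $\pi_i$ for the zero-inflation probability from the logistic part (the `zi` formula in the code below) and $\mathrm{NB}(y \mid \mu_i, \phi)$ for the negative-binomial part, the mixture distribution is
$$
p(y_i) =
\begin{cases}
\pi_i + (1 - \pi_i)\,\mathrm{NB}(0 \mid \mu_i, \phi), & y_i = 0,\\
(1 - \pi_i)\,\mathrm{NB}(y_i \mid \mu_i, \phi), & y_i > 0.
\end{cases}
$$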
We switch to brms, as rstanarm doesn't support the zero-inflated
negative-binomial model.
```{r results='hide'}
brm_glmznb <- brm(bf(y ~ sqrt_roach1 + treatment + senior + offset(log(exposure2)),
                     zi ~ sqrt_roach1 + treatment + senior + offset(log(exposure2))),
                  family=zero_inflated_negbinomial(), data=roaches,
                  prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
```
```{r}
looznb <- loo(brm_glmznb)
loo_compare(loonb, looznb)
```
Based on loo, the zero-inflated negative-binomial model is clearly better.
Posterior predictive checking provides further evidence that the
zero-inflated negative-binomial model is better.
```{r }
yrepnzb <- posterior_predict(brm_glmznb)
(prop_zero_test4 <- ppc_stat(y=roaches$y, yrepnzb, stat=function(y) mean(y==0)))
```
The proportion of zeros is similar, but shows more variation than for the negative-binomial model. However, there is a much bigger difference in the max count test statistic. We first check the max count PPC for the negative-binomial model.
```{r }
(max_test_nb <- pp_check(stan_glmnb, plotfun = "stat", stat = "max"))
```
which shows that the negative-binomial model often predicts roach counts 10-100 times larger than the maximum count in the data (40,000 roaches in one trap is a lot).
The max count PPC for the zero-inflated negative-binomial model
```{r }
(max_test_znb <- ppc_stat(y=roaches$y, yrepnzb, stat="max"))
```
is much better, although the max counts can still be 10 times bigger
than the max count in the data.
## Analyse posterior
Plot posterior
```{r}
mcmc_areas(as.matrix(brm_glmznb)[,1:8], prob_outer = .999)
```
The marginals for the negative-binomial part are similar to the marginals in the plain negative-binomial model. The marginal effects for the logistic part have the opposite sign, as the logistic part models the extra zeros.
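A quick way to see both parts side by side is brms's `fixef()` (a minimal sketch):
```{r}
# Population-level estimates; rows prefixed with "zi_" belong to the
# logistic zero-inflation part, the rest to the negative-binomial part
fixef(brm_glmznb)
```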
## Predictive relevance of covariates
Finally, let's check the cross-validation model comparison to see whether the improved model changes the conclusions about the predictive relevance of the covariates.
```{r, results='hide'}
brm_glmm1znb <- brm(bf(y ~ treatment + senior + offset(log(exposure2)),
                       zi ~ treatment + senior + offset(log(exposure2))),
                    family=zero_inflated_negbinomial(), data=roaches,
                    prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
brm_glmm2znb <- brm(bf(y ~ sqrt_roach1 + senior + offset(log(exposure2)),
                       zi ~ sqrt_roach1 + senior + offset(log(exposure2))),
                    family=zero_inflated_negbinomial(), data=roaches,
                    prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
brm_glmm3znb <- brm(bf(y ~ sqrt_roach1 + treatment + offset(log(exposure2)),
                       zi ~ sqrt_roach1 + treatment + offset(log(exposure2))),
                    family=zero_inflated_negbinomial(), data=roaches,
                    prior=set_prior("normal(0,1)"), seed=170400963, refresh=500)
```
```{r}
loo_compare(loo(brm_glmm1znb), looznb)
loo_compare(loo(brm_glmm2znb), looznb)
loo_compare(loo(brm_glmm3znb), looznb)
```
The pre-treatment roach count has a clear effect. The treatment effect is visible in the marginal posterior, but as discussed in the `betablockers` demo, cross-validation is not that good for detecting weak effects.
<br />
# References {.unnumbered}
<div id="refs"></div>
# Licenses {.unnumbered}
* Code © 2017-2019, Aki Vehtari, licensed under BSD-3.
* Text © 2017-2019, Aki Vehtari, licensed under CC-BY-NC 4.0.
* Parts of text and code © 2017, Jonah Gabry and Ben Goodrich from [rstanarm vignette for count data](https://cran.r-project.org/web/packages/rstanarm/vignettes/count.html), licensed under GPL 3.
# Original Computing Environment {.unnumbered}
```{r}
sessionInfo()
```
<br />