Andrew Gelman, Jennifer Hill, Aki Vehtari 2021-04-20
- 18 Causal inference and randomized experiments
- 21 Additional topics in causal inference
Tidyverse version by Bill Behrman.
Causal analysis of Sesame Street experiment. See Chapters 18 and 21 in Regression and Other Stories.
# Packages
library(tidyverse)
library(brms)
library(rstanarm)
# Parameters
# Seed
SEED <- 1234
# Results of educational experiment
file_sesame <- here::here("Sesame/data/sesame.csv")
# Common code
file_common <- here::here("_common.R")
#===============================================================================
# Run common code
source(file_common)
Data
sesame <-
file_sesame %>%
read_csv() %>%
mutate(
site =
case_when(
`_Isite_2` == 1 ~ 2,
`_Isite_3` == 1 ~ 3,
`_Isite_4` == 1 ~ 4,
`_Isite_5` == 1 ~ 5,
TRUE ~ 1
) %>%
factor()
) %>%
relocate(site, .before = `_Isite_2`)
glimpse(sesame)
#> Rows: 240
#> Columns: 32
#> $ rownames <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
#> $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
#> $ sex <dbl> 1, 2, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1,…
#> $ age <dbl> 66, 67, 56, 49, 69, 54, 47, 51, 69, 53, 58, 58, 49, 64, 58,…
#> $ viewcat <dbl> 1, 3, 3, 1, 4, 3, 3, 2, 4, 3, 2, 4, 1, 2, 2, 3, 2, 4, 3, 3,…
#> $ setting <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
#> $ viewenc <dbl> 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2,…
#> $ prebody <dbl> 16, 30, 22, 23, 32, 29, 23, 32, 27, 30, 25, 21, 28, 26, 23,…
#> $ prelet <dbl> 23, 26, 14, 11, 47, 26, 12, 48, 44, 38, 48, 25, 8, 11, 15, …
#> $ preform <dbl> 12, 9, 9, 10, 15, 10, 11, 19, 18, 17, 14, 13, 9, 15, 9, 17,…
#> $ prenumb <dbl> 40, 39, 9, 14, 51, 33, 13, 52, 42, 31, 38, 29, 13, 21, 16, …
#> $ prerelat <dbl> 14, 16, 9, 9, 17, 14, 11, 15, 15, 10, 16, 16, 8, 10, 9, 12,…
#> $ preclasf <dbl> 20, 22, 8, 13, 22, 14, 12, 23, 20, 17, 18, 21, 12, 15, 11, …
#> $ postbody <dbl> 18, 30, 21, 21, 32, 27, 22, 31, 32, 32, 26, 17, 20, 26, 28,…
#> $ postlet <dbl> 30, 37, 46, 14, 63, 36, 45, 47, 50, 52, 52, 29, 16, 28, 21,…
#> $ postform <dbl> 14, 17, 15, 13, 18, 14, 12, 18, 17, 19, 15, 15, 9, 15, 10, …
#> $ postnumb <dbl> 44, 39, 40, 19, 54, 39, 44, 51, 48, 52, 42, 40, 18, 35, 22,…
#> $ postrelat <dbl> 14, 14, 9, 8, 14, 16, 12, 17, 14, 17, 10, 10, 10, 16, 10, 1…
#> $ postclasf <dbl> 23, 22, 19, 15, 21, 24, 15, 23, 24, 24, 17, 19, 13, 14, 17,…
#> $ peabody <dbl> 62, 8, 32, 27, 71, 32, 28, 38, 49, 32, 43, 58, 39, 43, 56, …
#> $ agecat <dbl> 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1,…
#> $ encour <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,…
#> $ site <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ `_Isite_2` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ `_Isite_3` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ `_Isite_4` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ `_Isite_5` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ regular <dbl> 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,…
#> $ watched <dbl> 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,…
#> $ encouraged <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,…
#> $ y <dbl> 30, 37, 46, 14, 63, 36, 45, 47, 50, 52, 52, 29, 16, 28, 21,…
#> $ pretest <dbl> 23, 26, 14, 11, 47, 26, 12, 48, 44, 38, 48, 25, 8, 11, 15, …
Let’s check the derived variable site
.
sesame %>%
count(site, across(starts_with("_Isite")))
#> # A tibble: 5 x 6
#> site `_Isite_2` `_Isite_3` `_Isite_4` `_Isite_5` n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 0 0 0 0 60
#> 2 2 1 0 0 0 55
#> 3 3 0 1 0 0 64
#> 4 4 0 0 1 0 43
#> 5 5 0 0 0 1 18
Let’s first look at the effect of encouraging children to watch Sesame Street.
v <-
sesame %>%
count(encouraged, watched) %>%
group_by(encouraged) %>%
mutate(watched_prop = n / sum(n)) %>%
ungroup()
v
#> # A tibble: 4 x 4
#> encouraged watched n watched_prop
#> <dbl> <dbl> <int> <dbl>
#> 1 0 0 40 0.455
#> 2 0 1 48 0.545
#> 3 1 0 14 0.0921
#> 4 1 1 138 0.908
Without any encouragement, 55% of children still watched Sesame Street. With encouragement, that percentage increased to 91%
We now calculate an estimate of the effect of watching Sesame Street for the children who received encouragement and who in turn actually watched the show, the compliers.
We first estimate the percentage of children actually induced to watch
Sesame Street by the intervention, which is the coefficient on the
instrument (encouraged
), in the following regression:
set.seed(264)
fit_1a <- stan_glm(watched ~ encouraged, data = sesame, refresh = 0)
print(fit_1a, digits = 2)
#> stan_glm
#> family: gaussian [identity]
#> formula: watched ~ encouraged
#> observations: 240
#> predictors: 2
#> ------
#> Median MAD_SD
#> (Intercept) 0.55 0.04
#> encouraged 0.36 0.05
#>
#> Auxiliary parameter(s):
#> Median MAD_SD
#> sigma 0.38 0.02
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
The estimated coefficient of encouraged
here is 0.36. This is simply
the difference in the proportion of children who watched Sesame Street
when encouraged vs. when not encouraged:
v %>%
filter(watched == 1) %>%
summarize(
watched_diff = watched_prop[encouraged == 1] - watched_prop[encouraged == 0]
)
#> # A tibble: 1 x 1
#> watched_diff
#> <dbl>
#> 1 0.362
We then compute the intent-to-treat estimate, obtained in this case
using the regression of the outcome (the post-treatment measurement of
the letter recognition task, postlet
) on the instrument:
set.seed(264)
fit_1b <- stan_glm(postlet ~ encouraged, data = sesame, refresh = 0)
print(fit_1b, digits = 2)
#> stan_glm
#> family: gaussian [identity]
#> formula: postlet ~ encouraged
#> observations: 240
#> predictors: 2
#> ------
#> Median MAD_SD
#> (Intercept) 24.92 1.42
#> encouraged 2.90 1.82
#>
#> Auxiliary parameter(s):
#> Median MAD_SD
#> sigma 13.36 0.60
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
The estimated coefficient of encouraged
in this regression is 2.9,
which we then “inflate” by dividing by the percentage of children
affected by the intervention:
wald_est <- coef(fit_1b)[["encouraged"]] / coef(fit_1a)[["encouraged"]]
wald_est
#> [1] 7.95
This ratio is sometimes called the Wald estimate.
The first step is to regress the “treatment” variable – an indicator for
regular watching (watched
) – on the randomized instrument,
encouragement to watch (encouraged
). This is the same as fit_1a
above, but we run again here and relabel to indicate that in this
framework it is our first-stage model.
set.seed(264)
fit_2a <- stan_glm(watched ~ encouraged, data = sesame, refresh = 0)
print(fit_2a, digits = 2)
#> stan_glm
#> family: gaussian [identity]
#> formula: watched ~ encouraged
#> observations: 240
#> predictors: 2
#> ------
#> Median MAD_SD
#> (Intercept) 0.55 0.04
#> encouraged 0.36 0.05
#>
#> Auxiliary parameter(s):
#> Median MAD_SD
#> sigma 0.38 0.02
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
Add the predicted values for watched
.
sesame <-
sesame %>%
mutate(watched_pred_2a = predict(fit_2a))
summary(sesame$watched_pred_2a)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.546 0.546 0.910 0.777 0.910 0.910
The second step is to regress the letter recognition outcome on the
predicted values for watched
.
set.seed(264)
fit_2b <- stan_glm(postlet ~ watched_pred_2a, data = sesame, refresh = 0)
print(fit_2b, digits = 2)
#> stan_glm
#> family: gaussian [identity]
#> formula: postlet ~ watched_pred_2a
#> observations: 240
#> predictors: 2
#> ------
#> Median MAD_SD
#> (Intercept) 20.58 3.96
#> watched_pred_2a 7.95 5.01
#>
#> Auxiliary parameter(s):
#> Median MAD_SD
#> sigma 13.36 0.60
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
Here the coefficient of watched_pred_2a
is the effect of watching
Sesame Street on letter recognition for those who would watch if
encouraged but not otherwise (compliers). This second-stage regression
does not give the correct standard error, however, as we discuss below.
The randomization for this particular experiment took place within sites and settings; it is therefore appropriate to adjust for these covariates in estimating the treatment effect. Additionally, pre-treatment scores are available that are highly predictive of post-test scores.
The first step of two-stage approach adjusting for these covariates:
set.seed(264)
fit_3a <-
stan_glm(
watched ~ encouraged + prelet + site + setting,
data = sesame,
refresh = 0
)
print(fit_3a, digits = 2)
#> stan_glm
#> family: gaussian [identity]
#> formula: watched ~ encouraged + prelet + site + setting
#> observations: 240
#> predictors: 8
#> ------
#> Median MAD_SD
#> (Intercept) 0.66 0.11
#> encouraged 0.34 0.05
#> prelet 0.01 0.00
#> site2 0.03 0.07
#> site3 -0.12 0.06
#> site4 -0.34 0.07
#> site5 -0.30 0.10
#> setting -0.05 0.05
#>
#> Auxiliary parameter(s):
#> Median MAD_SD
#> sigma 0.35 0.02
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
Add the predicted values for watched
.
sesame <-
sesame %>%
mutate(watched_pred_3a = predict(fit_3a))
summary(sesame$watched_pred_3a)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.216 0.638 0.806 0.775 0.973 1.147
The second step is to regress the letter recognition outcome on the
predicted values for watched
.
set.seed(264)
fit_3b <-
stan_glm(
postlet ~ watched_pred_3a + prelet + site + setting,
data = sesame,
refresh = 0
)
print(fit_3b, digits = 2)
#> stan_glm
#> family: gaussian [identity]
#> formula: postlet ~ watched_pred_3a + prelet + site + setting
#> observations: 240
#> predictors: 8
#> ------
#> Median MAD_SD
#> (Intercept) 1.21 4.90
#> watched_pred_3a 14.08 4.12
#> prelet 0.70 0.08
#> site2 8.41 1.91
#> site3 -3.86 1.82
#> site4 0.98 2.55
#> site5 2.88 3.01
#> setting 1.65 1.50
#>
#> Auxiliary parameter(s):
#> Median MAD_SD
#> sigma 9.70 0.46
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
The estimated effect of watching Sesame Street on the compliers is about 14.1 points on the letter recognition test. Again, we do not trust this standard error and will discuss next how to appropriately adjust it for the two stages of estimation.
We show here how to adjust the standard error to account for the uncertainty in both stages of the model.
resid <- residuals(fit_3b)
x_adj <- model.matrix(fit_3b)
x_adj[, "watched_pred_3a"] <- sesame$watched
resid_adj <- sesame$postlet - x_adj %*% coef(fit_3b)
se_adj <-
se(fit_3b)[["watched_pred_3a"]] * sqrt(sum(resid^2) / sum(resid_adj^2))
se_adj
#> [1] 4.29
In this example, the resulting standard error of 4.29 is slightly larger than the unadjusted standard error of 4.12.
It is also possible to perform a Bayesian version of the two-stage least squares model directly using brms. Here is how to estimate the effect of regularly watching Sesame Street on post-treatment letter recognition using encouragement as an instrument:
set.seed(264)
fit_brm_1 <-
brm(
formula = bf(watched ~ encouraged) + bf(postlet ~ watched),
data = sesame,
refresh = 0,
seed = SEED
)
fit_brm_1
#> Family: MV(gaussian, gaussian)
#> Links: mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> Formula: watched ~ encouraged
#> postlet ~ watched
#> Data: sesame (Number of observations: 240)
#> Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup samples = 4000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> watched_Intercept 0.55 0.04 0.47 0.63 1.00 4710 3005
#> postlet_Intercept 20.54 3.88 13.27 28.58 1.00 2100 2038
#> watched_encouraged 0.36 0.05 0.25 0.46 1.00 4476 2649
#> postlet_watched 7.97 4.88 -2.01 17.08 1.00 2074 1850
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_watched 0.38 0.02 0.35 0.42 1.00 4569 2616
#> sigma_postlet 12.67 0.72 11.45 14.22 1.00 2713 2204
#>
#> Residual Correlations:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> rescor(watched,postlet) 0.17 0.15 -0.13 0.45 1.00 1960
#> Tail_ESS
#> rescor(watched,postlet) 1782
#>
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
The resulting estimate is the coefficient postlet_watched
, which is
the instrumental variables estimate of watched
on postlet
; the
estimate is 8.0 with a standard error of 4.9.
To incorporate other pre-treatment variables as controls, we must include them in both stages of the regression model; for example,
set.seed(264)
fit_brm_2 <-
brm(
formula =
bf(watched ~ encouraged + prelet + setting + site) +
bf(postlet ~ watched + prelet + setting + site),
data = sesame,
refresh = 0,
seed = SEED
)
fit_brm_2
#> Family: MV(gaussian, gaussian)
#> Links: mu = identity; sigma = identity
#> mu = identity; sigma = identity
#> Formula: watched ~ encouraged + prelet + setting + site
#> postlet ~ watched + prelet + setting + site
#> Data: sesame (Number of observations: 240)
#> Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup samples = 4000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> watched_Intercept 0.67 0.11 0.46 0.88 1.00 3413 2919
#> postlet_Intercept 1.27 4.61 -8.34 9.83 1.00 1862 2427
#> watched_encouraged 0.34 0.05 0.24 0.43 1.00 4354 2875
#> watched_prelet 0.01 0.00 -0.00 0.01 1.00 6814 3278
#> watched_setting -0.06 0.05 -0.16 0.04 1.00 4252 2898
#> watched_site2 0.03 0.07 -0.10 0.16 1.00 3267 2949
#> watched_site3 -0.11 0.07 -0.24 0.02 1.00 3157 2642
#> watched_site4 -0.34 0.07 -0.49 -0.20 1.00 3658 2951
#> watched_site5 -0.29 0.10 -0.49 -0.10 1.00 3346 2950
#> postlet_watched 13.95 3.87 6.38 21.99 1.00 1854 2275
#> postlet_prelet 0.70 0.08 0.55 0.85 1.00 5049 3039
#> postlet_setting 1.59 1.41 -1.11 4.44 1.00 3220 2947
#> postlet_site2 8.43 1.81 4.85 11.95 1.00 3998 3027
#> postlet_site3 -3.93 1.79 -7.44 -0.38 1.00 3208 2610
#> postlet_site4 0.90 2.40 -3.75 5.60 1.00 2395 2710
#> postlet_site5 2.75 2.83 -2.77 8.23 1.00 2591 2890
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_watched 0.36 0.02 0.33 0.39 1.00 6369 3046
#> sigma_postlet 9.44 0.53 8.51 10.58 1.00 3043 2262
#>
#> Residual Correlations:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
#> rescor(watched,postlet) -0.18 0.15 -0.47 0.12 1.00 1864
#> Tail_ESS
#> rescor(watched,postlet) 2174
#>
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
The resulting estimate is the coefficient postlet_watched
, which is
the instrumental variables estimate of watched
on postlet
; the
estimate is 14.0 with a standard error of 3.9.