-
Notifications
You must be signed in to change notification settings - Fork 23
/
README.Rmd
executable file
·255 lines (184 loc) · 7.88 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
---
output: github_document
bibliography: inst/references.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
dpi = 300
)
```
# did2s
<!-- badges: start -->
<!-- badges: end -->
The goal of did2s is to estimate TWFE models without running into the problem of staggered treatment adoption.
For common issues, see this issue: [https://github.com/kylebutts/did2s/issues/12](https://github.com/kylebutts/did2s/issues/12)
## Installation
You can install did2s from CRAN with:
``` r
install.packages("did2s")
```
To install the development version, run the following:
```{r, eval = FALSE}
devtools::install_github("kylebutts/did2s")
```
## Two-stage Difference-in-differences [@Gardner_2021]
For details on the methodology, view this [vignette](https://kylebutts.github.io/did2s/articles/Two-Stage-Difference-in-Differences.html)
To view the documentation, type `?did2s` into the console.
The main function is `did2s` which estimates the two-stage did procedure. This function requires the following options:
- `yname`: the outcome variable
- `first_stage`: formula for first stage, can include fixed effects and covariates, but do not include treatment variable(s)!
- `second_stage`: This should be the treatment variable or in the case of event studies, treatment variables.
- `treatment`: This has to be the 0/1 treatment variable that marks when treatment turns on for a unit. If you suspect anticipation, see note above for accounting for this.
- `cluster_var`: Which variables to cluster on
Optional options:
- `weights`: Optional variable to run a weighted first- and second-stage regressions
- `bootstrap`: Should standard errors be bootstrapped instead? Default is False.
- `n_bootstraps`: How many clustered bootstraps to perform for standard errors. Default is 250.
did2s returns a list with two objects:
1. fixest estimate for the second stage with corrected standard errors.
### TWFE vs. Two-Stage DID Example
I will load example data from the package and plot the average outcome among the groups.
```{r load-data}
# Automatically loads fixest
library(did2s)
# Load Data from R package
data("df_het", package = "did2s")
df_het = as.data.frame(df_het)
```
Here is a plot of the average outcome variable for each of the groups:
```{r plot-df-het, fig.width=8, fig.height=4, dpi=300, fig.cap="Example data with heterogeneous treatment effects"}
# Mean for treatment group-year
agg <- aggregate(df_het$dep_var, by = list(g = df_het$g, year = df_het$year), FUN = mean)
agg$g <- as.character(agg$g)
agg$g <- ifelse(agg$g == "0", "Never Treated", agg$g)
never <- agg[agg$g == "Never Treated", ]
g1 <- agg[agg$g == "2000", ]
g2 <- agg[agg$g == "2010", ]
plot(0, 0,
xlim = c(1990, 2020), ylim = c(3.5, 7.2), type = "n",
main = "Data-generating Process", ylab = "Outcome", xlab = "Year"
)
abline(v = c(1999.5, 2009.5), lty = 2)
lines(never$year, never$x, col = "#8e549f", type = "b", pch = 15)
lines(g1$year, g1$x, col = "#497eb3", type = "b", pch = 17)
lines(g2$year, g2$x, col = "#d2382c", type = "b", pch = 16)
legend(
x = 1990, y = 7.1, col = c("#8e549f", "#497eb3", "#d2382c"),
pch = c(15, 17, 16),
legend = c("Never Treated", "2000", "2010")
)
```
### Estimate Two-stage Difference-in-Differences
First, lets estimate a static did. There are two things to note here. First, note that I can use `fixest::feols` formula including the `|` for specifying fixed effects and `fixest::i` for improved factor variable support. Second, note that `did2s` returns a `fixest` estimate object, so `fixest::etable`, `fixest::coefplot`, and `fixest::iplot` all work as expected.
```{r static}
# Static
static <- did2s(
df_het,
yname = "dep_var", first_stage = ~ 0 | state + year,
second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
cluster_var = "state"
)
fixest::etable(static)
```
This is very close to the true treatment effect of ~2.23.
Then, let's estimate an event study did. Note that relative year has a value of `Inf` for never treated, so I put this as a reference in the second stage formula.
```{r event-study}
# Event Study
es <- did2s(df_het,
yname = "dep_var", first_stage = ~ 0 | state + year,
second_stage = ~ i(rel_year, ref = Inf), treatment = "treat",
cluster_var = "state"
)
```
And plot the results:
```{r plot-es, fig.cap="Event-study plot with example data", fig.width=8, fig.height=5, dpi=300}
fixest::iplot(es, main = "Event study: Staggered treatment", xlab = "Relative time to treatment", col = "steelblue", ref.line = -0.5, drop = "Inf")
# Add the (mean) true effects
true_effects <- head(tapply((df_het$te + df_het$te_dynamic), df_het$rel_year, mean), -1)
points(-20:20, true_effects, pch = 20, col = "black")
# Legend
legend(
x = -20, y = 3, col = c("steelblue", "black"), pch = c(20, 20),
legend = c("Two-stage estimate", "True effect")
)
```
### Comparison to TWFE
```{r plot-compare, fig.cap="TWFE and Two-Stage estimates of Event-Study", fig.width=8, fig.height=5, dpi=300}
twfe <- feols(dep_var ~ i(rel_year, ref = c(Inf, -1)) | unit + year, data = df_het)
fixest::iplot(
list(es, twfe),
sep = 0.2, ref.line = -0.5,
col = c("steelblue", "#82b446"), pt.pch = c(20, 18),
xlab = "Relative time to treatment",
main = "Event study: Staggered treatment (comparison)",
drop = "Inf"
)
# Legend
legend(
x = -20, y = 3, col = c("steelblue", "#82b446"), pch = c(20, 18),
legend = c("Two-stage estimate", "TWFE")
)
```
### Honest DID
In version 1.1.0, we added support for computing a sensitivity analysis using the approach of Rambachan and Roth (2021).
Here's an example using data from [here](https://github.com/Mixtape-Sessions/Advanced-DID/tree/main/Exercises/Exercise-1). The provided dataset `ehec_data.dta` contains a state-level panel dataset on health insurance coverage and Medicaid expansion. The variable `dins` shows the share of low-income childless adults with health insurance in the state. The variable `yexp2` gives the year that a state expanded Medicaid coverage under the Affordable Care Act, and is missing if the state never expanded.
```{r ehec-data-est, fig.cap="Estimates of the effect of Medicaid expansion on health insurance coverage", fig.width=8, fig.height=5, dpi=300}
library(HonestDiD)
library(ggplot2)
df <- haven::read_dta("https://raw.githubusercontent.com/Mixtape-Sessions/Advanced-DID/main/Exercises/Data/ehec_data.dta")
df$treated <- ifelse(is.na(df$yexp2), 0, 1 * (df$year >= df$yexp2))
df$rel_year <- ifelse(is.na(df$yexp2), -100, df$year - df$yexp2)
# Estimate did2s
es_did2s <- did2s(
df,
yname = "dins",
first_stage = ~ 0 | stfips + year,
second_stage = ~ 0 + i(rel_year, ref = -100),
treatment = "treated",
cluster_var = "stfips"
)
iplot(es_did2s, drop = "-100")
```
```{r sensitivity, fig.cap="Sensitivity analysis for the example data", fig.width=8, fig.height=5, dpi=300}
# Relative Magnitude sensitivity analysis
sensitivity_results <- es_did2s |>
# Take fixest obj and convert for `honest_did_did2s`
get_honestdid_obj_did2s(coef_name = "rel_year") |>
# Run sensitivity analysis
honest_did_did2s(
e = 0,
type = "relative_magnitude",
Mbarvec = seq(from = 0.5, to = 2, by = 0.5)
)
# Create plot
HonestDiD::createSensitivityPlot_relativeMagnitudes(
sensitivity_results$robust_ci,
sensitivity_results$orig_ci
)
```
## Event-study plot
```{r, fig.cap="Multiple event-study estimators", fig.width=12, fig.height=8.5, dpi=300}
library(tidyverse)
data(df_het)
df = df_het
multiple_ests = did2s::event_study(
data = df |> mutate(g = ifelse(g == Inf, NA, g)) |> as.data.frame(),
gname = "g",
idname = "unit",
tname = "year",
yname = "dep_var",
estimator = "all"
)
did2s::plot_event_study(multiple_ests)
```
# Citation
If you use this package to produce scientific or commercial publications, please cite according to:
```{r}
citation(package = "did2s")
```
# References