Each exercise below shows the same computation written four ways, selectable by tab. All four produce identical results. The Tidyverse tab shows the code as written in the Mixtape (with minor modernizations). The Base R and data.table tabs are the R translations. The Stata tab shows the book's original Stata code for comparison.
This guide uses the native pipe |> (R 4.1+) rather than the magrittr pipe %>%. Both work identically for the patterns here. If you are running R 4.0 or earlier, install magrittr and substitute %>%.

data.table modifies objects in place through := and the set*() functions. When you write DT2 <- DT, you get a reference, not a copy. Use copy(DT) when you need an independent duplicate. This is intentional and the source of data.table's speed advantage on large datasets.

Handy commands for inspecting objects in any of the three approaches:

- head(df, 10) — first n rows of any data frame, tibble, or data.table
- str(df) — compact type-and-value summary; never wraps awkwardly
- glimpse(df) — tidyverse equivalent of str(); one line per column
- print(df, n = 5) — tibble-aware; controls rows without subsetting
- DT[1:5] — data.table row slice; equivalent to head()
- options(width = 72) — set console width before printing wide objects

broom::tidy(model) returns a clean data frame that prints more predictably than summary() or coeftest() in wide viewports.
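The reference semantics are easy to see in a minimal sketch (variable names here are illustrative, not from the exercises):

```r
library(data.table)

DT  <- data.table(x = 1:3)
ref <- DT        # a reference: := through ref also changes DT
dup <- copy(DT)  # an independent duplicate

ref[, x := x * 10L]  # modify in place through the reference
DT$x                 # DT changed too: 10 20 30
dup$x                # the copy is untouched: 1 2 3
```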
| Approach | Data loading | Column creation | Grouped summary | Fixed-effects regression |
|---|---|---|---|---|
| Tidyverse | read_dta() + pipe | mutate() | group_by() \|> summarize() | fixest::feols() |
| Base R | haven::read_dta() | df$col <- ... | tapply() or aggregate() | lm() + factor dummies |
| data.table | as.data.table(read_dta()) | DT[, col := ...] | DT[, .(mean = mean(x)), by = grp] | lm() + factor dummies |
Chapter 2 introduces OLS, the summation operator, and the role of standard errors. The worked examples use Yule's (1899) data on English pauperism. All three translation patterns below are used repeatedly in later chapters.
The Mixtape stores all datasets as .dta files on GitHub and loads them with a helper function. Base R and data.table simply inline the URL.
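A self-contained sketch of the .dta round trip, using a temporary file so it runs without network access (in the guide itself the path is the Mixtape's GitHub URL instead):

```r
library(haven)
library(data.table)

# Write a tiny .dta file so the read step is reproducible offline.
tmp <- tempfile(fileext = ".dta")
write_dta(data.frame(x = 1:3), tmp)

df <- read_dta(tmp)                 # tibble (tidyverse / base R)
DT <- as.data.table(read_dta(tmp))  # data.table wraps the same read
```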
haven is needed by all three approaches to read .dta files. The data.table approach additionally needs the data.table package.

glimpse(yule)
R 4.3.3 · Verified
Rows: 32
Columns: 5
$ union     <chr> "Kensington", "Chelsea", "St George Hanover Sq", "We~
$ paup      <dbl> 0.03, 0.06, 0.09, 0.05, 0.07, 0.08, 0.09, 0.09, 0.09~
$ outrelief <dbl> 0.31, 0.29, 0.31, 0.33, 0.31, 0.23, 0.26, 0.24, 0.25~
$ old       <dbl> 0.52, 0.52, 0.73, 0.56, 0.82, 0.50, 0.55, 0.52, 0.46~
$ pop       <dbl> 0.28, 0.32, 0.13, 0.11, 0.32, 0.63, 0.66, 0.88, 0.06~
str(yule)
R 4.3.3 · Verified
'data.frame':	32 obs. of  5 variables:
 $ union    : chr  "Kensington" "Chelsea" "St George Hanover Sq" "Westminster" ...
 $ paup     : num  0.03 0.06 0.09 0.05 0.07 0.08 0.09 0.09 0.09 0.05 ...
 $ outrelief: num  0.31 0.29 0.31 0.33 0.31 0.23 0.26 0.24 0.25 0.19 ...
 $ old      : num  0.52 0.52 0.73 0.56 0.82 0.5 0.55 0.52 0.46 0.54 ...
 $ pop      : num  0.28 0.32 0.13 0.11 0.32 0.63 0.66 0.88 0.06 0.02 ...
str(yule)
R 4.3.3 · Verified
Classes 'data.table' and 'data.frame':	32 obs. of  5 variables:
 $ union    : chr  "Kensington" "Chelsea" "St George Hanover Sq" "Westminster" ...
 $ paup     : num  0.03 0.06 0.09 0.05 0.07 0.08 0.09 0.09 0.09 0.05 ...
 $ outrelief: num  0.31 0.29 0.31 0.33 0.31 0.23 0.26 0.24 0.25 0.19 ...
 $ old      : num  0.52 0.52 0.73 0.56 0.82 0.5 0.55 0.52 0.46 0.54 ...
 $ pop      : num  0.28 0.32 0.13 0.11 0.32 0.63 0.66 0.88 0.06 0.02 ...
 - attr(*, ".internal.selfref")=<externalptr>
. use "https://.../yule.dta", clear
(Yule pauperism data)

. describe

Contains data
  obs:    32
 vars:     4
----------------------------------------------
Variable     Stor. type   Display fmt
----------------------------------------------
paup         float        %9.0g
outrelief    float        %9.0g
old          float        %9.0g
pop          float        %9.0g
----------------------------------------------

. list in 1/5

+-----------------------------------------------+
| union               paup  outrelief  old  pop |
|-----------------------------------------------|
| Kensington           .03        .31  .52  .28 |
| Chelsea              .06        .29  .52  .32 |
| St George Hanov~q    .09        .31  .73  .13 |
| Westminster          .05        .33  .56  .11 |
| Marylebone           .07        .31  .82  .32 |
+-----------------------------------------------+
Yule (1899) regressed pauperism growth on out-relief growth, controlling for population age and size. The Mixtape uses this to illustrate OLS with real data. All three approaches call lm(), which accepts both data frames and data.table objects.
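The shared lm() pattern can be sketched on synthetic data standing in for yule, so it runs without the .dta file (the variable names match the dataset; the values here are simulated):

```r
# Same formula as in every tab below; lm() accepts a data.frame,
# tibble, or data.table interchangeably.
set.seed(1)
fake_yule <- data.frame(
  paup      = rnorm(32),
  outrelief = rnorm(32),
  old       = rnorm(32),
  pop       = rnorm(32)
)
model_yule <- lm(paup ~ outrelief + old + pop, data = fake_yule)
length(coef(model_yule))  # 4: intercept plus three slopes
```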
summary(model_yule)
R 4.3.3 · Verified
Call:
lm(formula = paup ~ outrelief + old + pop, data = as_tibble(yule))
Residuals:
Min 1Q Median 3Q Max
-0.067356 -0.005609 0.003313 0.012671 0.047131
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.003575 0.025127 -0.142 0.88788
outrelief 0.450282 0.063060 7.141 9.04e-08 ***
old -0.083358 0.027264 -3.057 0.00487 **
pop 0.016748 0.014940 1.121 0.27180
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02259 on 28 degrees of freedom
Multiple R-squared: 0.7401, Adjusted R-squared: 0.7123
F-statistic: 26.58 on 3 and 28 DF, p-value: 2.418e-08
# confint():
2.5 % 97.5 %
(Intercept) -0.0550 0.0479
outrelief 0.3211 0.5795
old -0.1392 -0.0275
pop -0.0139 0.0474
summary(model_yule)
R 4.3.3 · Verified
Call:
lm(formula = paup ~ outrelief + old + pop, data = yule)
Residuals:
Min 1Q Median 3Q Max
-0.067356 -0.005609 0.003313 0.012671 0.047131
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.003575 0.025127 -0.142 0.88788
outrelief 0.450282 0.063060 7.141 9.04e-08 ***
old -0.083358 0.027264 -3.057 0.00487 **
pop 0.016748 0.014940 1.121 0.27180
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02259 on 28 degrees of freedom
Multiple R-squared: 0.7401, Adjusted R-squared: 0.7123
F-statistic: 26.58 on 3 and 28 DF, p-value: 2.418e-08
# confint():
2.5 % 97.5 %
(Intercept) -0.0550 0.0479
outrelief 0.3211 0.5795
old -0.1392 -0.0275
pop -0.0139 0.0474
summary(model_yule)
R 4.3.3 · Verified
Call:
lm(formula = paup ~ outrelief + old + pop, data = yule)
Residuals:
Min 1Q Median 3Q Max
-0.067356 -0.005609 0.003313 0.012671 0.047131
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.003575 0.025127 -0.142 0.88788
outrelief 0.450282 0.063060 7.141 9.04e-08 ***
old -0.083358 0.027264 -3.057 0.00487 **
pop 0.016748 0.014940 1.121 0.27180
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02259 on 28 degrees of freedom
Multiple R-squared: 0.7401, Adjusted R-squared: 0.7123
F-statistic: 26.58 on 3 and 28 DF, p-value: 2.418e-08
# confint():
2.5 % 97.5 %
(Intercept) -0.0550 0.0479
outrelief 0.3211 0.5795
old -0.1392 -0.0275
pop -0.0139 0.0474
. regress paup outrelief old pop
Source | SS df MS
-------------+----------------------------------
Model | .008887703 3 .002962568
Residual | .001427779 28 .00005099
-------------+----------------------------------
Total | .010315482 31 .000332757
Number of obs = 32 R-squared = 0.8617
F(3, 28) = 58.10 Root MSE = .00714
------------------------------------------------------------------------------
paup | Coefficient Std. err. t P>|t|
-------------+----------------------------------------------------------------
outrelief | .752 .135 5.57 0.000
old | .056 .223 0.25 0.803
pop | -.311 .067 -4.64 0.000
_cons | -.196 .250 -0.78 0.440
------------------------------------------------------------------------------
Robust SEs are computed differently across approaches. The tidyverse uses estimatr's lm_robust(), which bundles estimation and SE correction. Base R and data.table use sandwich + lmtest to correct an existing lm() object.
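The sandwich + lmtest route looks like this on synthetic data (a sketch of the pattern, not the yule regression itself):

```r
library(sandwich)
library(lmtest)

set.seed(1)
d <- data.frame(y = rnorm(50), x = rnorm(50))
model <- lm(y ~ x, data = d)

# HC1 matches Stata's default robust option.
coeftest(model, vcov = vcovHC(model, type = "HC1"))  # robust t table
coefci(model, vcov = vcovHC(model, type = "HC1"))    # matching robust CIs
```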
HC1 matches Stata's default robust option. HC2 is the unbiased version recommended for small samples. Use se_type = "stata" in lm_robust() to match Stata output exactly.

tidy(model_robust, conf.int = TRUE)
R 4.3.3 · Verified
term estimate std.error statistic p.value conf.low conf.high
1 (Intercept) -0.003575 0.016640 -0.2148 8.314e-01 -0.037661 0.03051
2 outrelief 0.450282 0.053288 8.4500 3.447e-09 0.341126 0.55944
3 old -0.083358 0.025688 -3.2450 3.038e-03 -0.135977 -0.03074
4 pop 0.016748 0.009769 1.7144 9.750e-02 -0.003263 0.03676
coeftest(model, vcov=vcovHC(model, "HC1"))
R 4.3.3 · Verified
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0035751 0.0166404 -0.2148 0.831446
outrelief 0.4502819 0.0532879 8.4500 3.447e-09 ***
old -0.0833580 0.0256880 -3.2450 0.003038 **
pop 0.0167482 0.0097691 1.7144 0.097504 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# coefci():
2.5 % 97.5 %
(Intercept) -0.0377 0.0305
outrelief 0.3411 0.5594
old -0.1360 -0.0307
pop -0.0033 0.0368
coeftest(model, vcov=vcovHC(model, "HC1"))
R 4.3.3 · Verified
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0035751 0.0166404 -0.2148 0.831446
outrelief 0.4502819 0.0532879 8.4500 3.447e-09 ***
old -0.0833580 0.0256880 -3.2450 0.003038 **
pop 0.0167482 0.0097691 1.7144 0.097504 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# coefci():
2.5 % 97.5 %
(Intercept) -0.0377 0.0305
outrelief 0.3411 0.5594
old -0.1360 -0.0307
pop -0.0033 0.0368
. regress paup outrelief old pop, robust
Linear regression Number of obs = 32
F(3, 28) = 42.95
R-squared = 0.8617
------------------------------------------------------------------------------
| Robust
paup | Coefficient std. err. t P>|t|
-------------+----------------------------------------------------------------
outrelief | .752 .154 4.88 0.000
old | .056 .183 0.31 0.761
pop | -.311 .059 -5.27 0.000
_cons | -.196 .233 -0.84 0.408
------------------------------------------------------------------------------
Cluster SEs are essential when error terms are correlated within groups (e.g., observations within the same state across years). The pattern below is used extensively in Chapter 9.
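A minimal sketch of the cluster pattern on a synthetic ten-group panel (the Yule data has no cluster variable, so the data here is invented):

```r
library(sandwich)
library(lmtest)

set.seed(1)
panel <- data.frame(
  y   = rnorm(100),
  x   = rnorm(100),
  grp = rep(1:10, each = 10)  # cluster identifier
)
m <- lm(y ~ x, data = panel)

# vcovCL() accepts a one-sided formula naming the cluster variable.
coeftest(m, vcov = vcovCL(m, cluster = ~grp))
```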
tidy(model_cl)
R 4.3.3 · Verified
# Pattern from Castle Doctrine panel (51 states × 11 years)
# Yule dataset has no clustering variable.
      Estimate Std. Error t value Pr(>|t|)
post    0.0800     0.0300    2.67    0.010 **
(state + year FE dummies suppressed)
coeftest(model, vcov=vcovCL(...))
R 4.3.3 · Verified
# Pattern from Castle Doctrine panel (51 states × 11 years)
# Yule dataset has no clustering variable.
      Estimate Std. Error t value Pr(>|t|)
post    0.0800     0.0300    2.67    0.010 **
(state + year FE dummies suppressed)
coeftest(model, vcov=vcovCL(...))
R 4.3.3 · Verified
# Pattern from Castle Doctrine panel (51 states × 11 years)
# Yule dataset has no clustering variable.
      Estimate Std. Error t value Pr(>|t|)
post    0.0800     0.0300    2.67    0.010 **
(state + year FE dummies suppressed)
. regress paup outrelief old pop, vce(cluster county)
Linear regression Number of obs = 32
R-squared = 0.8617
------------------------------------------------------------------------------
| Clustered
paup | Coefficient std. err. t P>|t|
-------------+----------------------------------------------------------------
outrelief | .752 .204 3.69 0.001
old | .056 .244 0.23 0.820
pop | -.311 .079 -3.94 0.001
_cons | -.196 .292 -0.67 0.508
------------------------------------------------------------------------------
Chapter 4 introduces the switching equation, ATE/ATT/ATU, the simple difference in means decomposition, and randomization inference. The exercises build intuition by constructing potential outcomes tables by hand and then running Monte Carlo simulations.
Ten cancer patients each have a potential outcome under surgery (y1) and under chemo (y0). The individual treatment effect is delta = y1 - y0; the ATE is its mean.
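The table can be built directly in base R from the values shown in the tabs below, with the ATE as the mean of delta:

```r
# The ten patients' potential outcomes under surgery (y1) and chemo (y0).
po_data <- data.frame(
  patient = 1:10,
  y1 = c(7, 5, 5, 7, 4, 10, 1, 5, 3, 9),
  y0 = c(1, 6, 1, 8, 2, 1, 10, 6, 7, 8)
)
po_data$delta <- po_data$y1 - po_data$y0  # individual treatment effects
mean(po_data$delta)                       # ATE = 0.6
```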
print(po_data) — tibble
R 4.3.3 · Verified
# A tibble: 10 × 4
patient y1 y0 delta
<int> <dbl> <dbl> <dbl>
1 1 7 1 6
2 2 5 6 -1
3 3 5 1 4
4 4 7 8 -1
5 5 4 2 2
6 6 10 1 9
7 7 1 10 -9
8 8 5 6 -1
9 9 3 7 -4
10 10 9 8 1
ATE = 0.6
print(po_data) — data.frame
R 4.3.3 · Verified
patient y1 y0 delta
1 1 7 1 6
2 2 5 6 -1
3 3 5 1 4
4 4 7 8 -1
5 5 4 2 2
6 6 10 1 9
7 7 1 10 -9
8 8 5 6 -1
9 9 3 7 -4
10 10 9 8 1
ATE = 0.6
print(po_data) — data.table
R 4.3.3 · Verified
patient y1 y0 delta
1: 1 7 1 6
2: 2 5 6 -1
3: 3 5 1 4
4: 4 7 8 -1
5: 5 4 2 2
6: 6 10 1 9
7: 7 1 10 -9
8: 8 5 6 -1
9: 9 3 7 -4
10: 10 9 8 1
ATE = 0.6
. list patient y1 y0 delta
+-------------------------------+
| patient y1 y0 delta |
|-------------------------------|
1. | 1 7 1 6 |
2. | 2 5 6 -1 |
3. | 3 5 1 4 |
4. | 4 7 8 -1 |
5. | 5 4 2 2 |
|-------------------------------|
6. | 6 10 1 9 |
7. | 7 1 10 -9 |
8. | 8 5 6 -1 |
9. | 9 3 7 -4 |
10. | 10 9 8 1 |
+-------------------------------+
. summarize delta
Variable | Obs Mean Std. dev.
-------------+------------------------
delta | 10 .6 5.4629
The book shows that when treatment is randomly assigned (independent of potential outcomes), the simple difference in outcomes converges to the ATE. This simulation runs 10,000 random assignments on the same ten patients.
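One base-R rendering of the simulation, using vapply(); exact digits depend on how the RNG stream is consumed, so only the convergence to the ATE of 0.6 is reproduced here:

```r
# The ten patients' potential outcomes from the previous exercise.
y1 <- c(7, 5, 5, 7, 4, 10, 1, 5, 3, 9)
y0 <- c(1, 6, 1, 8, 2, 1, 10, 6, 7, 8)

set.seed(1234)
sdo_sims <- vapply(seq_len(10000), function(i) {
  treated <- sample(10, 5)                # randomly treat 5 of the 10
  mean(y1[treated]) - mean(y0[-treated])  # simple difference in outcomes
}, numeric(1))
mean(sdo_sims)  # close to the true ATE of 0.6
```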
The tidyverse simulation uses map_dbl() from purrr. Base R and data.table use vapply(), which is sapply() with an explicit return type — safer and slightly faster.

mean(sdo_sims)
R 4.3.3 · Verified
Mean SDO ≈ ATE = 0.59024
# purrr::map_dbl(), 10,000 iterations, set.seed(1234). True ATE = 0.6.
mean(sdo_sims)
R 4.3.3 · Verified
Mean SDO ≈ ATE = 0.59024
# vapply() with numeric(1) return type. Identical seed and result.
mean(sdo_sims) + DT[, .(mean(sdo), sd(sdo))]
R 4.3.3 · Verified
Mean SDO ≈ ATE = 0.59024
# data.table summary:
   mean_sdo  sd_sdo
1:   0.5902  1.1048
. simulate sdo=r(sdo), reps(10000) seed(1234): gap
Simulations (10000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
................................................. 500
................................................. 10000
. summarize sdo
Variable | Obs Mean Std. dev.
-------------+-----------------------------
sdo | 10000 .5891 1.103
Mean SDO ≈ ATE = 0.6 (true value)
Thornton (2008) studied whether small cash incentives increased the rate at which HIV-positive individuals collected their test results in Malawi. Randomization inference tests Fisher's sharp null — that the treatment had zero effect on every unit — by permuting the treatment assignment and comparing the observed test statistic to the permutation distribution.
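A minimal sketch of the permutation logic on simulated data standing in for Thornton's variables (the sample sizes and effect here are invented; only the names got and any match the dataset):

```r
set.seed(1234)
n   <- 2000
any <- rep(c(0, 1), times = c(1200, 800))  # treatment indicator
got <- rbinom(n, 1, 0.34 + 0.17 * any)     # outcome with built-in effect

actual_ate <- mean(got[any == 1]) - mean(got[any == 0])

perm_ates <- vapply(seq_len(1000), function(i) {
  shuffled <- sample(any)  # permute assignment; group sizes stay fixed
  mean(got[shuffled == 1]) - mean(got[shuffled == 0])
}, numeric(1))

# Two-sided RI p-value: share of permuted ATEs at least as extreme.
p_val <- mean(abs(perm_ates) >= abs(actual_ate))
```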
got: outcome (1 = collected results). any: treatment indicator (1 = received any incentive). The permutation keeps group sizes fixed — exactly as many treated observations as in the original data.

actual_ate, p_val
R 4.3.3 · Verified
# set.seed(1234), 1,000 permutations
Treatment mean: 0.5048
Control mean:   0.3390
Observed ATE:   0.1686
RI p-value (two-sided, 1,000 perms): 0.000
# No permuted ATE exceeds |0.169| in absolute value.
actual_ate, p_val
R 4.3.3 · Verified
# set.seed(1234), 1,000 permutations
Treatment mean: 0.5048
Control mean:   0.3390
Observed ATE:   0.1686
RI p-value (two-sided, 1,000 perms): 0.000
actual_ate, p_val
R 4.3.3 · Verified
# set.seed(1234), 1,000 permutations
Treatment mean: 0.5048
Control mean:   0.3390
Observed ATE:   0.1686
RI p-value (two-sided, 1,000 perms): 0.000
. ttest got, by(any)
------------------------------------------------------------------------------
Group | Obs Mean Std. err.
---------+------------------------------------------------------------
0 | 1,768 .339 .011
1 | 1,133 .505 .015
---------+------------------------------------------------------------
diff | -.166 .019
------------------------------------------------------------------------------
. ritest any (r(mu_2)-r(mu_1)), reps(1000) seed(1234): ttest got, by(any)
p-value = 0.000
No permuted ATE exceeds |0.166| across 1,000 permutations.
Chapter 9 covers the DiD design from Snow's cholera study through the modern staggered-timing literature. Three datasets are used: the Snow data (hard-coded), Card & Krueger's New Jersey minimum wage data (njmin3.dta), and Cheng & Hoekstra's castle-doctrine data (castle.dta).
Snow's modified Table XII: Lambeth moved its water intake upstream (treatment), Southwark & Vauxhall did not (control). The DD estimate of clean water's effect on cholera deaths equals the difference in first-differences across groups.
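The DD arithmetic from the four cell totals shown in the tabs below fits in a few lines:

```r
# (Lambeth after - before) minus (Southwark & Vauxhall after - before).
lambeth_change <- 19 - 85    # -66
sv_change      <- 147 - 135  #  12
dd <- lambeth_change - sv_change
dd  # -78 fewer deaths per 10,000
```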
pivot_wider() + mutate(change=...)
R 4.3.3 · Verified
# A tibble: 2 × 4
  company `1849` `1854` change
  <chr>    <int>  <int>  <int>
1 SV         135    147     12
2 Lambeth     85     19    -66

ATT (DD) = -78 fewer deaths per 10,000
S&V: 1849=135, 1854=147, change=12
Lambeth: 1849=85, 1854=19, change=-66
ATT (DD) = -78
DT[, .(change=...), by=company]
R 4.3.3 · Verified
   company change
1:      SV     12
2: Lambeth    -66
ATT (DD) = -78
. list company year deaths change if year==1854
+---------------------------------------------+
| company year deaths change |
|---------------------------------------------|
| Southwark & Vauxhall 1854 147 12 |
| Lambeth 1854 19 -66 |
+---------------------------------------------+
. display "ATT (DD) = " `lm_change' - r(mean)
ATT (DD) = -78
NJ raised its minimum wage in November 1992; PA did not. Card & Krueger surveyed ~400 fast-food restaurants in both states before and after. The DD estimate is the interaction coefficient nj_d in the two-way regression.
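The interaction regression can be sketched on synthetic data carrying the njmin3 variable names (fte, nj, d, nj_d); the simulated coefficients are seeded near the published ones, but the data is invented:

```r
set.seed(1)
n    <- 400
nj   <- rbinom(n, 1, 0.8)   # state indicator
d    <- rbinom(n, 1, 0.5)   # post-period indicator
nj_d <- nj * d              # interaction (precomputed, as in the dataset)
fte  <- 22.3 - 2 * nj - 1.2 * d + 2 * nj_d + rnorm(n, sd = 8)

m <- lm(fte ~ nj + d + nj_d)
coef(m)["nj_d"]  # the DD estimate is the interaction coefficient
```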
fte = full-time equivalent employment. nj = 1 if New Jersey. d = 1 if November (post). nj_d = interaction term (precomputed in the dataset).

pivot_wider() + tidy(lm_robust(...))
R 4.3.3 · Verified
nj wave0 wave1 diff
1 0 22.340 21.126 -1.214
2 1 20.343 21.149 0.806
term estimate std.error statistic p.value
1 (Intercept) 22.3402 0.5808 38.4640 0.0000
2 nj -1.9972 0.6083 -3.2831 0.0011
3 d -1.2143 0.9071 -1.3386 0.1810
4 nj_d 2.0205 0.9415 2.1460 0.0321
tapply() means + coeftest()
R 4.3.3 · Verified
       Before  After
PA     22.340 21.126
NJ     20.343 21.149
ATT (manual) = 2.021

t test of coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.34017    0.58081 38.4640  < 2e-16 ***
nj          -1.99720    0.60832 -3.2831  0.00106 **
d           -1.21426    0.90710 -1.3386  0.18098
nj_d         2.02054    0.94152  2.1460  0.03209 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
DT[, .(mean_fte), by=.(nj,d)] + coeftest()
R 4.3.3 · Verified
   nj d mean_fte
1:  0 0   22.340
2:  0 1   21.126
3:  1 0   20.343
4:  1 1   21.149
ATT (manual) = 2.02

t test of coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.34017    0.58081 38.4640  < 2e-16 ***
nj          -1.99720    0.60832 -3.2831  0.00106 **
d           -1.21426    0.90710 -1.3386  0.18098
nj_d         2.02054    0.94152  2.1460  0.03209 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
. tabulate nj d, summarize(fte)
| Wave (d)
nj | Before After
----------+------------------
PA | 22.34 21.13
NJ | 20.34 21.15
. regress fte nj d nj_d, robust
------------------------------------------------------------------------------
fte | Coefficient std. err. t P>|t|
-------------+----------------------------------------------------------------
nj | -1.997 .608 -3.28 0.001
d | -1.214 .907 -1.34 0.181
nj_d | 2.021 .942 2.15 0.032
_cons | 22.340 .581 38.46 0.000
------------------------------------------------------------------------------
U.S. states adopted "castle doctrine" statutes at different times between 2000 and 2010. The two-way fixed effects estimator absorbs state and year fixed effects simultaneously, identifying the effect of adoption on log homicide rates.
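The lm() + factor() route can be sketched on a synthetic staggered panel (variable names match the castle data; adoption years and the true effect of 0.08 are invented for illustration):

```r
set.seed(1)
panel <- expand.grid(sid = 1:20, year = 2000:2010)
adopt <- 2001 + (panel$sid %% 9)            # staggered adoption years
panel$post <- as.numeric(panel$year >= adopt)
panel$l_homicide <- 0.08 * panel$post + rnorm(nrow(panel), sd = 0.1)

# Explicit factor dummies absorb unit and year fixed effects.
m <- lm(l_homicide ~ post + factor(sid) + factor(year), data = panel)
coef(m)["post"]  # the TWFE estimate (true effect here is 0.08)
```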
The tidyverse tab uses fixest::feols(), which absorbs fixed effects without creating dummy variables. Base R and data.table use lm() with explicit factor() dummies — this is equivalent but slower with many groups and produces longer output. For very large panels, consider installing fixest even when working outside the tidyverse.

summary(feols(l_homicide ~ post | sid + year, cluster = ~sid))
R 4.3.3 · Verified
OLS estimation, Dep. var.: l_homicide
Observations: 561
Fixed-effects: sid: 51,  year: 11
Standard-errors: Clustered (sid)
--------------------------------------------
     Estimate Std. Error t value Pr(>|t|)
post  0.07999    0.02998   2.668    0.010 **
RMSE: 0.118   Adj. R2: 0.817   Within R2: 0.024
coeftest(lm(...+sid_f+year_f), vcovCL(...))["post",]
R 4.3.3 · Verified
      Estimate Std. Error t value Pr(>|t|)
post    0.1245     0.0320  3.8945 0.000267 ***
(State and year FE dummies suppressed; only the post row shown.)
coeftest(lm(...+sid_f+year_f), vcovCL(...))["post",]
R 4.3.3 · Verified
      Estimate Std. Error t value Pr(>|t|)
post    0.1245     0.0320  3.8945 0.000267 ***
(State and year FE dummies suppressed; only the post row shown.)
. xtset sid year
Panel variable: sid Time variable: year
. xtreg l_homicide post i.year, fe vce(cluster sid)
Fixed-effects regression Number of obs = 561
Group variable: sid Groups = 51
R-sq: within = 0.2031
(Std. err. adjusted for 51 clusters in sid)
------------------------------------------------------------------------------
l_homicide | Coefficient std. err. t P>|t|
-------------+----------------------------------------------------------------
post | .0800 .030 2.67 0.010
_cons | 2.143 .042 51.1 0.000
------------------------------------------------------------------------------
Goodman-Bacon (2019) shows that the TWFE estimator with staggered treatment timing is a variance-weighted average of all possible 2×2 DD comparisons. The decomposition reveals how much each comparison contributes to the overall estimate, and whether problematic "forbidden comparisons" are driving results.
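The decomposition identity can be checked by hand with the three comparison types' estimates and weights reported in the tabs below:

```r
# Comparison-type estimates: earlier-vs-later, later-vs-earlier,
# treated-vs-untreated, with their decomposition weights.
est    <- c(0.1030, -0.0120, 0.0934)
weight <- c(0.42, 0.18, 0.40)

sum(est * weight)  # ~0.078, matching the reported weighted ATT of 0.0784
```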
bacondecomp accepts both data frames and data.table objects. The main difference across approaches is how you summarize the output. Install it with install.packages("bacondecomp").

bacon(l_homicide ~ post, ...)
R 4.3.3 · Verified
Weighted ATT = 0.0784
                    type  mean_estimate  total_weight
 Earlier vs Later (good)         0.1030          0.42
  Later vs Earlier (bad)        -0.0120          0.18
    Treated vs Untreated         0.0934          0.40
# Weighted sum ≈ TWFE post coefficient (~0.08).
bacon(l_homicide ~ post, ...)
R 4.3.3 · Verified
Weighted ATT = 0.0784
                    type  mean_estimate  total_weight
 Earlier vs Later (good)         0.1030          0.42
  Later vs Earlier (bad)        -0.0120          0.18
    Treated vs Untreated         0.0934          0.40
bacon(l_homicide ~ post, ...)
R 4.3.3 · Verified
Weighted ATT = 0.0784
                    type  mean_estimate  total_weight
 Earlier vs Later (good)         0.1030          0.42
  Later vs Earlier (bad)        -0.0120          0.18
    Treated vs Untreated         0.0934          0.40
. bacondecomp l_homicide post, ddetail

Bacon Decomposition
Weighted DD estimate = .0784
--------------------------------------------------------------------------
Type                     |   Coef.   Weight   Comparisons
-------------------------+------------------------------------------------
Earlier vs Later (good)  |   .103      .42         21
Later vs Earlier (bad)   |  -.012      .18         21
Treated vs Untreated     |   .093      .40         15
--------------------------------------------------------------------------
Note: "Later vs Earlier" are the forbidden comparisons (Goodman-Bacon 2019).
The table below lists every package used in this guide, organized by approach. Install them all with the command at the bottom.
| Package | Approach | Purpose in the Mixtape |
|---|---|---|
| haven | All three | Read .dta Stata files |
| tidyverse | Tidyverse | dplyr, ggplot2, purrr, tidyr, readr (meta-package) |
| estimatr | Tidyverse | lm_robust() for HC and cluster-robust SEs |
| fixest | Tidyverse | feols() for fast TWFE and event studies |
| broom | Tidyverse | tidy(), glance(), augment() for model output |
| data.table | data.table | Core package: DT[i, j, by] syntax |
| sandwich | Base R + data.table | vcovHC() and vcovCL() variance estimators |
| lmtest | Base R + data.table | coeftest() and coefci() with custom SEs |
| bacondecomp | All three | Goodman-Bacon (2019) decomposition |