Comments on better coding or error corrections are always welcome. Contact: [email protected].
Difference in differences (abbreviated as DID, DiD, or DD; I prefer DID) is nowadays one of the most popular statistical techniques in quantitative social science research. The main reason for its popularity is that it is "easy" to understand and apply to empirical research. However, after reading a number of high-quality econometrics papers published from 2017 to the present, I realized that DID is not as easy as I once thought. The main goal of this repository is to share my improved understanding of DID and my Stata coding for running DID. Note that here I only go over a few details of each DID estimator; please read the related papers for greater detail.
The table below summarizes various DID estimators and their prerequisites for validity; researchers may use it to search for a proper estimator in a certain setting.
Inventor(s) | Estimator | Parallel Trends | No Anticipation | Effect Homogeneity | Existence of Never-Treated | Balanced Panel |
---|---|---|---|---|---|---|
Unknown | Classical TWFE | ✔️ | ✔️ | ✔️ | ✔️ | |
Arkhangelsky et al. (2022) | SDID | ✔️ | ✔️ | ✔️ | | ✔️ |
Sun & Abraham (2021) | Interaction Weighted | ✔️ | ✔️ | | | |
Callaway & Sant'Anna (2021) | Doubly Robust | ✔️ | ✔️ | | | |
de Chaisemartin & D'Haultfœuille (2020, 2022) | $DID_M$ & $DID_{g,l}$ | ✔️ | ✔️ | | | |
Borusyak, Jaravel & Spiess (2023) | Imputation | ✔️ | ✔️ | | | |
Wooldridge (2021) | Extended TWFE | ✔️ | ✔️ | | | |
Dube et al. (2023) | LP-DID | ✔️ | ✔️ | | | |
Before formally starting, I want to sincerely thank Professor Corina Mommaerts (UW-Madison), Professor Christopher Taber (UW-Madison), Professor Bruce Hansen (UW-Madison), Professor Le Wang (OU), and Professor Myongjin Kim (OU) for their lectures and advice during my exploration of DID. I also thank my former PhD colleagues Mieon Seong, Ningjing Gao, and JaeSeok Oh for their one-year support.
Difference in differences, as the name implies, involves comparisons between two groups across two periods. The first difference is between groups, and the second is across time. The units that become treated after the treatment time constitute the treated group; the other units constitute the control group. DID typically focuses on identifying and estimating the average treatment effect on the treated (ATT); that is, it measures the average effect of the treatment on those who switch from being untreated to being treated. The dominant approach to implementing DID in empirical research is to run a two-way fixed effects (TWFE) regression:
$$Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + \varepsilon_{it}$$

where
- $Y_{it}$ is the outcome of interest;
- $\alpha_i$ is a unit (or individual) fixed effect;
- $\gamma_t$ is a time fixed effect;
- $D_{it}$ is a 0/1 indicator for whether unit $i$ participates in the treatment in time period $t$;
- $\varepsilon_{it}$ are idiosyncratic and time-varying unobservables.
Under the parallel trends[^1], no anticipation, and homogeneous treatment effect assumptions, the OLS estimate of $\beta$ identifies the ATT.
Note that in this handbook I focus on absorbing treatment: Once a unit receives a treatment, it cannot get out of the treatment in any future period. Some researchers are trying to extend DID to non-absorbing treatment (also called "switching treatment") design; it is feasible, but under additional assumptions.
The TWFE regression can be easily run in Stata using the `xtreg`, `areg`, `reghdfe`, or `xtdidregress` command. Note that `xtdidregress` is only available in Stata 17 or higher, and `reghdfe` can only be used after we install the `reghdfe` and `ftools` packages. I like using `reghdfe` because it has flexible options and computes faster than the others. The basic syntax of `reghdfe` is:
```stata
reghdfe Y D, absorb(id t) cluster(id)
```
The `absorb` option specifies the fixed effects, and the `cluster` option specifies the level at which the standard errors are clustered.
This format of DID involves only two time periods and two groups; that is why I call it the canonical, classical, and textbook format. Canonical means "standard": this is the original and simplest format, showing the key rationale behind DID. Classical means "traditional": it has been established for a long time and, sadly, is gradually becoming outdated in modern applications. Textbook means... it has been introduced in many textbooks, such as Jeffrey Wooldridge's Econometric Analysis of Cross Section and Panel Data (2010) and Bruce Hansen's Econometrics (2022). The TWFE DID can be generalized to multi-period, multi-unit cases, as long as all treated units get treated in the same period. This kind of treatment is sometimes called block treatment.
Before 2017, many researchers naively believed that TWFE DID could also be easily generalized to cases where units get treated at different periods (i.e., staggered treatment). Unfortunately, it is not easy; it is definitely not easy, as described in the following.
First, what is static DID? A static DID specification estimates a single, time-invariant treatment effect. That is, we get only one $\beta$ from running a static DID specification, and we use this one coefficient to summarize the treatment effect after the policy is implemented. The classical DID is exactly a static DID.
Second, what is the Bacon decomposition? This concept comes from Goodman-Bacon (2021). The author proposes and proves the DID decomposition theorem, stating that the TWFE DID estimator (applied to a case with staggered treatment) equals a weighted average of all possible two-group/two-period DID estimators. To emphasize this finding, the author writes the DID estimator as

$$\hat{\beta}^{DD} = \sum_k s_k \hat{\beta}_k^{2 \times 2},$$

where each $\hat{\beta}_k^{2 \times 2}$ is a two-group/two-period DID estimator and the weights $s_k$ sum to one.
The DID decomposition theorem is important because it tells us that "bad" comparisons exist in the classical DID if we include units treated at multiple time periods --- that is, comparisons of the late treatment group to the early treatment group before and after the late treatment. It is suggested that we do the Bacon decomposition whenever running a static DID specification, so that we can see where our estimate comes from. For example, a negative DID estimate may show up just because of a negative result from a heavily weighted bad comparison.
To do the Bacon decomposition in Stata, Andrew Goodman-Bacon (Federal Reserve Bank of Minneapolis), Thomas Goldring (Georgia State University), and Austin Nichols (Amazon) wrote the `bacondecomp` package. The basic syntax is:

```stata
bacondecomp Y D, ddetail
```
`Y` is the outcome variable, `D` is the treatment dummy, and the `ddetail` option requests a more detailed decomposition.
Stata 18 (released on Apr 25, 2023) introduces a new post-estimation command, `estat bdecomp`, for performing the Bacon decomposition. It can be used after the `didregress` or `xtdidregress` command, and a plot can be easily created by adding the `graph` option.
Sadly, the Bacon decomposition in Stata works well only with strongly balanced panel data.
Actually, the Bacon decomposition is not the only method of decomposing the DID estimator; another method was proposed by de Chaisemartin & D'Haultfœuille (2020). The `twowayfeweights` package in Stata, written by Clément de Chaisemartin, Xavier D'Haultfœuille, and Antoine Deeb (World Bank), allows us to apply this second decomposition method, and fortunately it works with unbalanced panel data. However, it is noteworthy that the two decompositions differ in how they explain the problem with TWFE DID:

- Goodman-Bacon (2021) states that the conventional DID estimator can be expressed as a weighted average of all possible $2 \times 2$ DID estimators, some of which rely on bad comparisons as described above.
- de Chaisemartin & D'Haultfœuille (2020) state that the conventional DID estimator can be expressed as a weighted average of causal effects (ATEs) across group-time cells, some of which are assigned negative weights (i.e., the negative weighting problem).
Arkhangelsky et al. (2022) propose a method, synthetic difference in differences (SDID), which combines attractive features of both DID and synthetic control (SC) methods.
- Like DID, SDID allows for a constant difference between treatment and controls over all pretreatment periods.
- Like SC, SDID reweights and matches pre-exposure trends to relax the conventional "parallel trend" assumption.
The key step in SDID is to find unit weights that align the pre-treatment outcome trends of control units with those of treated units, and time weights that balance pre-treatment periods against post-treatment periods.
SDID can be implemented in Stata with the `sdid` package (written by Damian Clarke at University of Exeter and Daniel Pailañir at Universidad de Chile). The basic syntax is

```stata
sdid depvar groupvar timevar treatment, vce(vcetype)
```

where
- `depvar` is the dependent variable;
- `groupvar` is the variable indicating units, and `timevar` is the variable indicating time periods;
- `treatment` indicates a unit that is treated at/after a specific time period;
- `vce(vcetype)` is a required option specifying the method used for estimating the variance. The available inference methods include `bootstrap`, `jackknife`, `placebo`, and `noinference`. If only one unit is treated, the bootstrap and jackknife methods cannot be used. To use the placebo method, we need at least one more control unit than treated units.
Following the regression, we can use the `graph` and `g1on` options to create figures (as Figure 1 in Arkhangelsky et al., 2022) displaying unit weights (in a scatter plot), time weights (in an area plot), and outcome trends (in a line plot).
In certain settings, we may want to include some covariates on which treated and control units will be adjusted. The adjustment procedure is:

- Regress the outcome variable ($Y$) on the covariates ($X$) and get the coefficient estimates $\hat{\beta}$.
- Calculate the residuals for each observation: $Y_{it}^{res} = Y_{it} - X_{it}\hat{\beta}$.
- Apply the SDID process to the residuals.
This procedure removes the impact of changes in covariates from the outcome variable prior to searching for the synthetic control; this is why it is called a "pre-processing task". The `sdid` command provides the `covariates( )` option for doing the covariate adjustment conveniently. Note that, as of now, there are two ways of finding the coefficient estimates:

- `optimized` (the default), implementing the efficient optimization procedure of Frank & Wolfe (1956). This method may lead to bias in estimating ATTs in some settings.
- `projected`, proposed by Kranz (2022), running a TWFE regression of the outcome variable on the included covariates (using only observations that are never treated). Compared with the former, this method saves a lot of computation time because it uses only one simple regression.
Often, researchers are not satisfied when only a static effect is estimated; they may also want to see the long-term effects of a policy. For example, once Sibal Yang posts a new problem set on Canvas, what is the effect of the problem set on students' happiness on each day before the due date? The classical dynamic DID specification allows the treatment effects to change over time. An example of the dynamic DID specification is shown below:

$$Y_{it} = \alpha_i + \gamma_t + \sum_{k \neq -1} \beta_k D_{it}^k + \varepsilon_{it}$$

where $D_{it}^k$ indicates that unit $i$ in period $t$ is $k$ periods away from its initial treatment, and the period just before treatment ($k = -1$) is omitted as the reference period.
In this format of DID, obtaining unbiased estimates becomes much more complicated. Many econometricians have tried to solve this problem, and they are still trying today. In the following, I will cover most (but not all) of the recent heterogeneity-robust DID estimators with their corresponding commands in Stata. Some of the following Stata packages are still under development, so it is best to read their latest documentation before applying them to your research.
Sun & Abraham (2021) formalize the assumptions needed for identifying the coefficients in a static/dynamic TWFE model. Specifically, if treatment effects vary across cohorts (i.e., treatment effects are heterogeneous), then the coefficient on a given relative-time indicator can be contaminated by treatment effects from other relative periods.
In response to this contamination in TWFE models, Sun & Abraham (2021) propose an interaction-weighted (IW) estimator, which improves on TWFE by estimating an interpretable weighted average of cohort-specific average treatment effects on the treated (CATT). Here, a cohort is defined as the group of all units that were first treated at the same time.
One of the authors, Liyang Sun[^2] (Center for Monetary and Financial Studies, CEMFI), wrote the Stata package `eventstudyinteract` for implementing their IW estimator and constructing confidence intervals for the estimates. To use the `eventstudyinteract` command, we have to install one more package (in addition to `reghdfe` and `ftools`): `avar`. The basic syntax is below:
```stata
eventstudyinteract y rel_time_list, ///
    absorb(id t) cohort(variable) control_cohort(variable) vce(vcetype)
```
Note that we must include a list of relative time indicators as we would have included in the classical dynamic DID regression.
Sadly, this command is not well compatible with the `estout` package; therefore, to report the results in a figure/table, we may have to first store the results in a matrix and then work with the matrix. See my coding example here.
Callaway & Sant'Anna (2021) pay particular attention to the disaggregated causal parameter --- the average treatment effect for group $g$ in period $t$, denoted $ATT(g,t)$, where a group consists of the units first treated at the same time. They consider three approaches to estimating it:
- outcome regression (OR);
- inverse probability weighting (IPW);
- doubly robust (DR).
For estimation, Callaway and Sant'Anna suggest we use the DR approach, because this approach only requires us to correctly specify either (but not necessarily both) the outcome evolution for the comparison group or the propensity score model.
The two authors, with Fernando Rios-Avila (Levy Economics Institute of Bard College), wrote the Stata package `csdid` to implement the DID estimators proposed in Callaway & Sant'Anna (2021). Internally, all estimates are computed by the `drdid` command; therefore, to run `csdid`, we have to install two packages: `csdid` and `drdid`.
The basic syntax is below:

```stata
csdid Y covar, ivar(id) time(t) gvar(group)
```
For running the specification, we need the `gvar` variable, which equals the first treatment time for treated units and 0 for never-treated units. Note that this command allows us to include covariates in the regression; in some cases, the parallel trends assumption holds only after conditioning on observed covariates.
The command has several built-in estimation methods; the default is `dripw`, the doubly robust DID estimator based on stabilized inverse probability weighting and ordinary least squares, from Sant'Anna & Zhao (2020). One can use the `method( )` option to switch to another available method. In addition, by default, robust and asymptotic standard errors are estimated; other options are available, such as a multiplicative wild bootstrap procedure via the `wboot` option. Enter `help csdid` in your Stata command window to learn more.
Callaway & Sant'Anna (2021) also provide some aggregation schemes to form more aggregated causal parameters. In Stata, we can produce the aggregated estimates using the post-estimation `estat` or `csdid_estat` command; the latter is recommended if a bootstrap procedure is used to estimate the standard errors.
Since 2020, Clément de Chaisemartin (Sciences Po) and Xavier D'Haultfœuille (CREST) have written a series of papers proposing different DID estimation techniques. Their major contributions include:

- Their estimators are valid when the treatment effect is heterogeneous over time or across groups.
- Their estimators allow for treatment switching (i.e., treatment is allowed to be revoked). Of course, additional assumptions are required; de Chaisemartin & D'Haultfœuille (2020) explicitly write out three assumptions: strong exogeneity for $Y(1)$, common trends for $Y(1)$, and the existence of stable groups that always get treated in a specific time window.
- Their estimators allow for discounting treatments that occur later in the panel data. (This could be evidence that Economics and Statistics can be combined well.)
- They propose several placebo estimators (constructed by mimicking the actual estimators) to test the "no anticipation" and "parallel trends" assumptions.
Note that when the not-yet-treated groups are used as controls and there are no control variables in the regression, their estimators are numerically equivalent to the estimators in Callaway & Sant'Anna (2021), introduced in the previous section. Additionally, note that de Chaisemartin and D'Haultfœuille use "staggered" to describe a treatment design in which treatment cannot be revoked. This is inconsistent with the literature: we usually call this kind of design "absorbing treatment" and use "staggered treatment" for designs where the timing of adoption differs across groups.
de Chaisemartin and D'Haultfœuille wrote the Stata package `did_multiplegt` for applying their estimators. The basic syntax is:

```stata
did_multiplegt Y G T D
```

where `Y` is the outcome variable, `G` is the group variable, `T` is the time variable, and `D` is the treatment dummy. This command has a lot of options; here I only introduce several important ones (a combined sketch follows the list):
- If the `robust_dynamic` option is specified, the estimator proposed in de Chaisemartin & D'Haultfœuille (2022), $DID_{g,l}$, will be used; otherwise, the estimator in de Chaisemartin & D'Haultfœuille (2020), $DID_M$, will be used. `robust_dynamic` must be specified if we want to estimate dynamic treatment effects. We can use the `dynamic(#)` option to specify the number of dynamic effects to be estimated.
- When `robust_dynamic` is specified, Stata uses the long-difference placebos proposed in de Chaisemartin & D'Haultfœuille (2022). We can add the `firstdiff_placebo` option to make Stata use the first-difference placebos proposed in de Chaisemartin & D'Haultfœuille (2020). Through the `placebo(#)` option, we can specify the number of placebo estimators to be estimated; the number can be at most the number of time periods in our data.
- The command estimates standard errors by bootstrap. We can use the `cluster( )` option to make Stata use a block bootstrap at a specific level. An interaction cannot be used directly in `cluster( )`, but we can generate an interaction term with `egen`'s `group( )` function before running the regression.
- The `discount(#)` option allows us to discount treatment effects estimated later in the panel.
- By default, the command produces a graph after each regression. Through the `graphoptions( )` option, we can modify the appearance of the graph.
The two authors wrote information-rich documentation (with a long FAQ section) for their package; you can see it by entering `help did_multiplegt` in the command window.
Borusyak, Jaravel & Spiess (2023) propose a finite-sample efficient, robust DID estimator using a three-step imputation procedure:

- Run a TWFE model using the non-treated (i.e., never-treated and not-yet-treated) observations only. For example, the simplest model is $Y_{it} = \alpha_i + \gamma_t + e_{it}$.
- Extrapolate the model from step 1 to treated observations, obtaining the non-treated potential outcome $\hat{Y}_{it}(0)$ for each treated observation.
- Obtain an estimate of the treatment effect, $\hat\beta_{it} = Y_{it} - \hat{Y}_{it}(0)$, for each treated observation, and take averages of the estimated treatment effects to get the aggregate ATTs.
This imputation-based method is welcome because

- it is computationally efficient (it only requires estimating a simple TWFE model);
- the imputation explicitly links the parallel trends and no anticipation assumptions to the estimator.
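For intuition, here is a manual sketch of the three steps with hypothetical names (`D` is a 0/1 treatment dummy, `id` and `t` identify the panel). It ignores inference and requires every unit and period to appear among the untreated observations, so `did_imputation` (below) should be used for real work:

```stata
regress Y i.id i.t if D == 0     // step 1: TWFE model on non-treated observations
predict Y0_hat, xb               // step 2: extrapolated Y(0) for every observation
gen tau = Y - Y0_hat if D == 1   // step 3: observation-level treatment effects
summarize tau                    // their mean estimates the aggregate ATT
```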
One of the authors, Kirill Borusyak (University College London), wrote the Stata package `did_imputation` for implementing their imputation approach to estimate dynamic treatment effects and do pre-trend testing in event studies. The basic syntax is:

```stata
did_imputation Y id t Ei, fe(id t) horizons(#) pretrends(#)
```
The `horizons` option tells Stata how many post-treatment horizons of treatment effects to estimate, and the `pretrends` option tells Stata to perform pre-trend testing for the specified number of periods. The post-treatment coefficients are reported as `tau0`, `tau1`, ...; the pre-trend coefficients are reported as `pre1`, `pre2`, .... In contrast with the aforementioned approaches, here the number of pre-trend coefficients does not affect the post-treatment effect estimates, which are always computed under the assumptions of parallel trends and no anticipation.
Furthermore, Borusyak, Jaravel & Spiess (2023) is one of several papers that point out the infamous "negative weighting" problem in the classical DID. This problem arises because OLS estimation imposes a very strong restriction of treatment effect homogeneity; this is why the classical dynamic DID is called a contaminated estimator by some econometricians.
Wooldridge (2021) claims that "there is nothing inherently wrong with using TWFE in situations such as staggered interventions". Professor Wooldridge proposes an extended TWFE estimator for DID research designs (including block and staggered treatments), based on his finding that the traditional TWFE estimator and a two-way Mundlak (TWM) estimator are equivalent.
What is the two-way Mundlak regression? Wooldridge (2021) defines it as a pooled OLS regression of the outcome on the time-varying covariates together with their unit-specific and time-specific averages; the coefficients on the time-varying covariates are identical to the TWFE estimates.
Based on the findings above, Wooldridge (2021) shows that an unbiased, consistent, and asymptotically efficient estimator for heterogeneous ATTs in DID can be obtained by

- running a TWFE regression that includes interactions between treatment-time cohorts and time; or, equivalently,
- running a pooled OLS regression that includes panel-level averages of the covariates.

Amazingly, this estimator allows for heterogeneous effects over time, over covariates, or over both.
For example, the traditional TWFE DID regression is

$$Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + \varepsilon_{it},$$

and the extended TWFE regression replaces the single coefficient $\beta$ with a full set of interactions between treatment-cohort dummies and time dummies (and, if covariates are included, their panel-level averages), so that each cohort-time cell has its own ATT.
To use this estimation strategy in Stata, Fernando Rios-Avila (Levy Economics Institute of Bard College) first wrote a package named `jwdid`. However, the `jwdid` command, unlike `reghdfe`, does not drop singleton observations automatically, so statistical inference may be biased. Its basic syntax is

```stata
jwdid Y, ivar(id) tvar(year) gvar(group)
```
After the regression, we can use some post-estimation commands to conveniently estimate the ATTs for each group or each period and produce plots. For example, `estat event, plot` does the dynamic aggregation and returns a plot of the estimation results.
Fortunately, Stata 18 introduced a built-in command, `xthdidregress`; one of its functions is to implement Wooldridge (2021)'s extended TWFE estimator. The basic syntax is

```stata
xthdidregress twfe (Y) (tvar), group(gvar)
```
In the post-estimation results, only the ATT estimates (for each cohort) at the treatment time and in the periods thereafter are shown; this is because Wooldridge (2021) proves that including time dummies and their related interactions for periods prior to the earliest treatment period does not affect the coefficient estimates of interest. Thus, after we use `xthdidregress twfe`, it is impossible to draw a figure presenting dynamic effects over the whole time window of our data.
The extended TWFE estimator has a big advantage: it can be obtained from a very basic regression (pooled OLS), so most researchers can understand it easily. However, a disadvantage is also obvious: its computation can be very intense because it usually includes many interactions and computes a great number of coefficient estimates. In the `xthdidregress` command, we can use the `hettype( )` option (specifying the type of treatment effect heterogeneity) to change the default `timecohort` to `time` or `cohort`, which reduces the complexity of the computation.
As Dube et al. (2023) wrote, their local projection approach was motivated by the essential congruity between the concerns of applied microeconomists facing the challenge of estimating dynamic, heterogeneous, staggered treatment effects, and the concerns of applied macroeconomists who have long faced the task of estimating dynamic impulse-responses in time-series or panel data.
The local projection (LP) specification is

$$y_{i,t+h} - y_{i,t-1} = \delta_t^h + \beta_h^{LP} \Delta D_{it} + e_{it}^h, \qquad h = 0, 1, \dots, H$$
Note that

- unit fixed effects are not included, for comparability with the conventional static and dynamic TWFE specifications;
- a different regression is needed for each time horizon $h$;
- in a setting with an absorbing treatment at time $s$, $\Delta D_{it} = 1$ at $t = s$, but $\Delta D_{it} = 0$ for $t \neq s$.
Dube et al. (2023) prove the following equivalences: under the no-anticipation and parallel trends assumptions,

- in a simple $2\times2$ (i.e., 2-group and 2-period) setting, a static TWFE regression is equivalent to an LP specification at horizon $h = 0$;
- assuming that treatment effects are homogeneous and that treatment is staggered, a dynamic TWFE regression is equivalent to the following LP regression:
$$y_{i,t+h}-y_{i,t-1} = \delta_t^h + \beta_h^{LP} \Delta D_{it} + \sum_{j=-h, j \neq 0} \theta_j^h \Delta D_{i,t-j} + e_{it}^h$$
- assuming that treatment effects are heterogeneous and that treatment is staggered, the coefficient of interest from an LP regression corresponds to a weighted average of dynamic ATTs plus two bias terms.
Given the findings above, Dube et al. (2023) provide an LP-DID regression that

- removes previously treated observations and observations treated between $t+1$ and $t+h$ from the control group;
- provides a convex combination of all group-specific effects;
- is robust to possible failure of the homogeneous effects assumption.
To use their approach in Stata, we have to install the `lpdid` package written by Alexander Busch (IZA) and Daniele Girardi (University of Massachusetts Amherst), along with the required package `egenmore`. Then we can use the `lpdid` command, whose basic syntax is

```stata
lpdid Y, time(time) unit(unit) treat(treat) pre(#) post(#) cluster(clst)
```
The variable `time` indexes time, the variable `unit` indexes units, and the dummy `treat` indicates the treatment. We can select the lengths of the pre- and post-treatment windows of the estimates with the `pre(#)` and `post(#)` options. Other noteworthy options include (a combined sketch follows the list):

- Use the `covariates( )` option to include covariates.
- Use `ylags(#)` and `dylags(#)` to include lags of the dependent variable and of the first-differenced dependent variable as covariates.
- By default, the `lpdid` command uses all clean controls (never-treated and not-yet-treated units). Adding the `nevertreated` option forces it to use only never-treated units.
- `rw` and `weights( )` allow us to reweight observations and estimate an equally weighted ATT; the default estimate is a variance-weighted ATT. See Sections 3.2 and 3.3 of Dube et al. (2023) for details.
- `bootstrap(#)` and `seed(#)` allow us to do wild bootstrap estimation of the standard errors.
Potential candidates: Gardner (2021) and Hatamyar et al. (2023).
In this section, I will show how to use the estimators above in empirical analyses, with their specific commands/packages in Stata.
Here I will use three real-world datasets to show how to run TWFE DID regressions. If available, I will also show the coding for running SDID and then make a comparison.
The data I will use are from three papers:
- "OW05_prop99.dta" is from Orzechowski & Walker (2005), and you can get the recent version from here. Abadie et al. (2010) and Arkhangelsky et al. (2021) use the data to estimate the impact of Proposition 99 (increasing taxes on cigarettes) on sales of cigarettes in packs per capita in California. Note that this is a block treatment case.
- "BCGV22_gender_quota.dta" is from Bhalotra et al. (2022). They use the data to estimate the impact of parliamentary gender quotas (reserving seats for women in parliament) on rates of women in parliament. Note that this is a staggered treatment case.
- "SZ18_state_taxes.dta" is from Serrato & Zidar (2018). They use the data to estimate the impact of state corporate tax cut/hike on tax revenue and economic activities. Note that this is a staggered treatment case; however, Serrato & Zidar (2018) use a dynamic standard DID specification (i.e., without solving the problem of negative weighting) so their results may be biased. Also note that unfortunately the
sdid
command cannot run a dynamic specification so we cannot use SDID to update Serrato & Zidar (2018)'s results.
The regression commands I will use include

- `xtdidregress`, a built-in command (introduced in Stata 17) for running DID regressions on panel data. After using it, we can use `estat` to create a trends plot and do some basic tests (if the treatment variables are not continuous).
- `xtreg`, `areg`, and `reghdfe`, which are originally written for running fixed effects models but can also easily be applied to DID regressions.
- `sdid`, an external command for running SDID. Through the `method( )` option, we can also use it to run standard DID and synthetic control specifications.
- `xthdidregress`, a command introduced in Stata 18 for estimating heterogeneous ATTs. Note that the `xthdidregress` command allows several kinds of weighting: the `ra`, `ipw`, and `aipw` estimators are from Callaway & Sant'Anna (2021), and the `twfe` estimator is from Wooldridge (2021). I choose to use `aipw` (augmented inverse-probability weighting, also called "doubly robust"). After the regression, the `estat aggregation` command allows us to aggregate the ATTs within cohort or time, analyze the dynamic effect within a specified time window, and create plots.
The complete coding for running regressions on the three datasets can be found here.
The dataset to be used can be loaded into Stata by running the following code:

```stata
use "http://pped.org/bacon_example.dta", clear
```
The panel data contain state-level information (in particular, no-fault divorce onset year and suicide mortality rate) on 49 states (including Washington, D.C. but excluding Alaska and Hawaii) in the US from 1964 to 1996. They are originally used by Stevenson & Wolfers (2006) to estimate the effect of no-fault (or unilateral) divorce on female suicide rate.
Here, I first run a static TWFE DID specification of female suicide on no-fault divorce reforms (a staggered treatment):

$$Y_{st} = \alpha_s + \gamma_t + \beta D_{st} + X_{st}\Gamma + \varepsilon_{st}$$

where
- $Y_{st}$ is the female suicide rate in state $s$ and year $t$;
- $\alpha_s$ is a state fixed effect;
- $\gamma_t$ is a year fixed effect;
- $D_{st}$ is a treatment dummy, equal to 1 if $t$ is greater than or equal to the no-fault divorce onset year and 0 otherwise;
- $X_{st}$ are state-level control variables.
- The treatment group consists of the states adopting unilateral divorce laws, while the control group consists of the remaining states.
The estimated coefficients from all the following commands should be identical (but standard errors and R-squared are different due to different algorithms).
```stata
xtdidregress (asmrs pcinc asmrh cases) (post), group(stfips) time(year) vce(cluster stfips)
xtreg asmrs post pcinc asmrh cases i.year, fe vce(cluster stfips)
areg asmrs post pcinc asmrh cases i.year, absorb(stfips) vce(cluster stfips)
reghdfe asmrs post pcinc asmrh cases, absorb(stfips year) cluster(stfips)
```
`asmrs` is the suicide mortality rate, and `post` is the treatment dummy.
Then we can apply the Bacon decomposition theorem to the TWFE DID model.

```stata
bacondecomp asmrs post pcinc asmrh cases, ddetail
```
It reports that there are 14 timing groups in the dataset, including a never-treated group and an always-treated group. The largest weight is assigned to the comparison between the always-treated group and the timing groups. A scatter plot is here.
We can also use the following code (after `xtdidregress`) to do the decomposition. The scatter plot can be found here.

```stata
estat bdecomp, graph
```
Keep in mind that Bacon decomposition works as a diagnostic tool, instead of a remedy. The decomposition tells us the seriousness of the "bad comparison" in our DID specification, but it cannot cure it.
The next regression I run is the corresponding dynamic DID. I use the `eventdd` command because it can run the model and generate a plot at the same time. This command allows for some basic regressions (e.g., via `xtreg` and `reghdfe`); for plotting results from the more advanced DID estimators, I recommend the `event_plot` package (detailed in the next example). To use `eventdd`, two packages, `eventdd` and `matsort`, have to be installed.
Finally, I do a timing placebo test by randomly and repeatedly selecting a placebo treatment time and running the TWFE regression 1,000 times. This is done with the Stata built-in command `permute`. The test result suggests that my estimate above does not come from an unobservable time trend. A plot (created with the add-in command `dpplot`) from the placebo test can be seen here.
Complete coding for this example can be found here.
If a firm exports a product at a price lower than the price at which it normally sells in its domestic market, then we say this firm is dumping the product. This unfair foreign pricing behavior can adversely distort business and economic activity in import markets. Considering this adverse effect, the WTO Anti-Dumping Agreement allows governments to react to foreign dumping by taking antidumping (AD) actions. The most typical AD action is imposing a higher import duty on a specific product from a specific exporting country, with the hope of raising the unfairly low price to its normal level and thereby mitigating the injury to the importing country.
Here, I will employ a dynamic DID specification to estimate the dynamic effects of USA AD duty impositions on China's exporters over the decade from 2000 to 2009. Bown & Crowley (2007) call this effect "trade destruction" and estimate the static effect by running IV, FE, and GMM models with USA and Japanese data. Slides reviewing this paper can be found here.
The dataset I will use is a product-year-level dataset merged from the Global Antidumping Database and China Customs data (thanks to the China Data Center at Tsinghua University). The dynamic DID specification I will run is as follows:

$$Y_{h,t} = \alpha_h + \alpha_t + \sum_{i=-3,\, i \neq -1}^{4} \beta_i AD_{h,t-i}^{USA} + \varepsilon_{h,t}$$

where the endpoint indicators ($i = -3$ and $i = 4$) are binned, as the dummy construction below makes clear.
- $Y_{h,t}$ is the outcome variable (export value, export quantity, number of exporters, or average export quantity) for product $h$ in year $t$.
- $AD_{h,t-i}^{USA}$ is a treatment dummy, equal to 1 if product $h$ received an AD duty from the USA in year $t-i$.
- $\alpha_h$ is a product fixed effect and $\alpha_t$ is a year fixed effect.
- Standard errors are clustered at the product-year level, unless specified otherwise.
- The treatment group is the set of products from China that received USA AD duties, while the control group is the set of products from China that underwent AD investigations but did not ultimately receive AD duties. Note that I do not include never-investigated products in the control group. The reason is that AD investigations are non-random; products under investigation always have lower export prices and higher export volumes than those never investigated. If I compared the products receiving AD duties against those never investigated, my estimator would very likely be biased.
Information on the year of an AD duty imposition against a specific product (coded at the 6-digit Harmonized System level) is stored in the variable `year_des_duty`. I use this variable and `year` to construct a series of relative time dummies:
```stata
gen period_duty = year - year_des_duty   // years relative to the AD duty imposition

gen Dn3 = (period_duty < -2)             // 3+ years before the duty (binned endpoint)
forvalues i = 2(-1)1 {
    gen Dn`i' = (period_duty == -`i')    // 1 and 2 years before the duty
}
forvalues i = 0(1)3 {
    gen D`i' = (period_duty == `i')      // imposition year and 1-3 years after
}
gen D4 = (period_duty >= 4) & (period_duty != .)   // 4+ years after (binned endpoint)
```
Then, it's time to run regressions!
Coding for the classical dynamic DID is:

```stata
local dep = "value quantity company_num m_quantity"
foreach y in `dep' {
    reghdfe ln_`y' Dn3 Dn2 D0-D4, absorb(product year) vce(cluster product#year)
}
```
Traditionally, researchers use the pre-treatment coefficients to test for pre-trends: if the pre-treatment coefficients are not significantly different from 0, they conclude that the parallel trends assumption holds. However, Sun & Abraham (2021) prove that this practice has a serious shortcoming and needs correction.
Coding for the interaction-weighted estimation in DID is:

```stata
gen first_union = year_des_duty
gen never_union = (first_union == .)

local dep = "value quantity company_num m_quantity"
foreach y in `dep' {
    eventstudyinteract ln_`y' Dn3 Dn2 D0-D4, ///
        cohort(first_union) control_cohort(never_union) ///
        absorb(product year) vce(cluster product#year)
}
```
We need to tell Stata which variable corresponds to the initial treatment timing of each unit; I name it `first_union`. This variable should be missing for never-treated units. In addition, we need to give Stata a binary variable corresponding to the control cohort, which can be either never-treated units or last-treated units. Here I use never-treated units as the control cohort, and I construct the variable `never_union` to indicate them.
Something noteworthy is that the `event_plot` package, written by Kirill Borusyak, makes it easy to plot staggered DID estimates, including post-treatment coefficients and, if available, pre-trend coefficients, along with confidence intervals. I use this command to create a four-panel figure (see here) showing the dynamic effects on the four outcome variables. For plotting, I usually customize the plot type, but you can save a lot of time by using the default type (via the `default_look` option) if your requirements for visualization are not as high as mine.
Coding for the doubly robust estimation in DID is:

```stata
gen gvar = year_des_duty
recode gvar (. = 0)

local dep = "value quantity company_num m_quantity"
foreach y in `dep' {
    quietly csdid ln_`y', ivar(product) time(year) gvar(gvar) ///
        method(dripw) wboot(reps(10000)) rseed(1)
    csdid_estat event, window(-3 4) estore(cs_`y') wboot(reps(10000)) rseed(1)
}
```
The `csdid` command may show a very long output table in the Stata results window (due to the combinatorial explosion), so I put `quietly` before `csdid`. Besides, as in many applications, I care more about the heterogeneous effects at different points in time than across different groups (rather than the group-time average treatment effects defined by Callaway & Sant'Anna, 2021); therefore, I use `csdid_estat` to produce the aggregated estimates only for periods -3 to 4. Now the output table in the results window is shorter. Also note that I use the `wboot` option to estimate wild bootstrap standard errors, with 10,000 repetitions.
As before, I use the `event_plot` command to create a four-panel figure (see here) showing the dynamic effects. This time, I use the `default_look` option to save time; in addition, I use the `together` option to show the leads and lags as one continuous curve.
Coding for de Chaisemartin & D'Haultfœuille's estimation in DID is:

```stata
egen clst = group(product year)

local ylist = "value quantity m_quantity company_num"
foreach y in `ylist' {
    did_multiplegt ln_`y' year_des_duty year treated, ///
        robust_dynamic dynamic(4) placebo(2) jointtestplacebo ///
        seed(1) breps(100) cluster(clst)
}
```
Here I use the estimator in de Chaisemartin & D'Haultfœuille (2022) because I want to estimate dynamic effects. Something noteworthy is that I use the long-difference placebos for the placebo test; the dynamic effects are estimated by the long-difference DID estimator, so using long-difference placebos gives the correct comparison. By contrast, the first-difference placebo estimators are DIDs across consecutive time periods; if `firstdiff_placebo` were added here, the graph produced to illustrate the dynamic treatment effects would be meaningless (i.e., not comparable). A related discussion can be found here.
I personally dislike the graphs produced automatically by `did_multiplegt`, and sadly, the `graphoptions( )` option is not as flexible as I expected. Therefore, I extract and store the estimates in matrices; then the `event_plot` command can be used to create the graphs. The process is very similar to what I did when applying Sun & Abraham (2021)'s estimator. My four-panel figure can be seen here.
Coding for the imputation estimation in DID is:

```stata
gen Ei = year_des_duty

local ylist = "value quantity m_quantity company_num"
foreach y in `ylist' {
    did_imputation ln_`y' product year Ei, fe(product year) cluster(clst) ///
        horizons(0/4) pretrends(2) minn(0) autosample
}
```
We need to give Stata a variable recording each unit's treatment date, with missing values for never-treated units. I name it `Ei`, following the package documentation.
A four-panel figure presenting the dynamic effects estimated by the imputation approach can be found here. This time, I use the `default_look` option but not the `together` option --- this is why the leads and lags are shown as two separate curves in different colors.
Coding for the extended TWFE estimation in DID is:

```stata
xtset product year
gen treated = (period_duty < . & period_duty >= 0)

local ylist = "value quantity m_quantity company_num"
foreach y in `ylist' {
    xthdidregress twfe (ln_`y') (treated), group(product) vce(cluster clst)
}
```
As I said before, here we cannot use `estat aggregation, dynamic` to plot a figure presenting dynamic effects over all horizons. Therefore, I use `estat aggregation, time` to create four panels (see here) showing the treatment effect at each point in time.
Coding for the local projection estimation in DID is:

```stata
local ylist = "value quantity m_quantity company_num"
foreach y in `ylist' {
    lpdid ln_`y', time(year) unit(product) treat(treated) pre(3) post(4) ///
        cluster(clst) only_event nograph
}
```
The `only_event` option tells the `lpdid` command to compute only the event-study estimates and skip the pooled estimates, which increases computational speed. The `nograph` option suppresses the graph; I add it because I personally dislike the default graph. We can use Stata's graph commands to plot flexibly by ourselves, given that all event-study results are stored in the matrix `e(results)` by `lpdid`. My customized plots of the estimates are here.
To summarize, regardless of estimation approaches, the results show persistent and negative effects of USA antidumping duty impositions on the four outcome variables. Complete coding for this example can be found here.
[^1]: See another repository of mine (Parallel_Trends) for a guideline on testing and relaxing the Parallel Trends assumption.

[^2]: Ms. Sun is my second favorite female econometrician, and her papers on estimating treatment effects and using instrumental variables are worth reading. My top favorite female econometrician is Xiaoxia Shi (UW-Madison), although it is too hard for me to understand her advanced research.