This project implements a method for inference about general treatment effects in instrumental variables settings, first proposed by Mogstad, Santos, and Torgovitsky (2018, Econometrica; henceforth MST). The main novel results reported here are Monte Carlo simulations based on the data generating process of the paper. In particular, I report extensive simulations corresponding to the identification result in Figure 5 of the paper.
The goal of MST is to make inference on parameters that are generally not point-identified in instrumental variable settings. For example, a well-known result is that we can only point-identify the local average treatment effect (LATE) for a given complier subpopulation in a binary-instrument binary-treatment setting. However, the researcher might be interested in the LATE for a different subpopulation or the average treatment effect (ATE).
The general idea of the paper is to set-identify a target parameter, where the identified set is constrained by the parameters we can point-identify. Intuitively, we need to make some assumptions about unobservables to provide a set for the target parameter. However, all parameters that we can point-identify put restrictions on these unobservables. The main contribution of MST is to show that all identified and target parameters can be written as linear maps of so-called marginal treatment response (MTR) functions in a binary choice model. Hence, a combination of data moments and assumptions on MTR functions implies an identified set for the target parameter. For a more detailed introduction to the method, see the report in this project.
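The idea can be written compactly. The notation below is an assumption made for this README and not a quotation from the paper: m = (m_0, m_1) is a pair of MTR functions, \mathcal{M} is the set of MTR pairs allowed by the researcher's assumptions, \Gamma^\star is the linear map that gives the target parameter, and \Gamma_s are the linear maps that give the point-identified parameters \beta_s. Under these assumptions, the endpoints of the identified set would solve:

```latex
\underline{\beta}^\star
  = \min_{m \in \mathcal{M}} \Gamma^\star(m)
  \quad \text{s.t.} \quad \Gamma_s(m) = \beta_s \ \text{ for all } s \in \mathcal{S},
\qquad
\overline{\beta}^\star
  = \max_{m \in \mathcal{M}} \Gamma^\star(m)
  \quad \text{s.t.} \quad \Gamma_s(m) = \beta_s \ \text{ for all } s \in \mathcal{S}.
```

Because the objective and the constraints are linear in m, restricting \mathcal{M} to a finite-dimensional linear basis turns both problems into linear programs.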
I originally started working on this project in an econometrics topics course in the 2023 summer term, but couldn't get the code to run properly back then. In particular, similar Monte Carlo studies produced severely biased estimates due to a faulty implementation. The results reported here are now more plausible given the results in MST. While technically their estimator is only shown to be consistent (and is probably not unbiased) and they don't report any simulations, the paper would probably not have been published if their method were severely biased for any realistic sample size.
All sets in MST (identified or estimated) are implicitly defined by linear programs
(LPs). Thus, the key programming task is to compute the inputs to the linear program.
I then pass these to scipy.optimize.linprog, the scipy wrapper for several LP solvers,
including highs, which I use as the default. I also implement the copt solver, which
generally performs best for a range of problems (e.g. see these benchmarks). However,
for the small problems in my simulations I did not see any performance differences (if
anything, scipy has the faster API).
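To illustrate how such an LP can be passed to scipy.optimize.linprog, here is a minimal sketch on a toy problem. It is not the code of this project: the three-bin constant MTR functions, the propensity scores 0.3 and 0.7, the "true" coefficients, and the helper name solve_bounds are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy setup: the MTR functions m_0(u) and m_1(u) are constant on
# three bins of u, [0, 0.3), [0.3, 0.7) and [0.7, 1], and take values in [0, 1].
widths = np.array([0.3, 0.4, 0.3])
# Parameter vector: theta = (m_0 on bins 1..3, m_1 on bins 1..3).

# Target parameter: ATE = sum_k widths[k] * (m_1[k] - m_0[k]).
c_target = np.concatenate([-widths, widths])

# Point-identified moments for assumed propensity scores p(0) = 0.3, p(1) = 0.7.
# Rows: E[Y D | Z=0], E[Y D | Z=1], E[Y (1-D) | Z=0], E[Y (1-D) | Z=1].
A_eq = np.array(
    [
        [0.0, 0.0, 0.0, 0.3, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.3, 0.4, 0.0],
        [0.0, 0.4, 0.3, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.3, 0.0, 0.0, 0.0],
    ]
)
theta_true = np.array([0.5, 0.4, 0.3, 0.8, 0.6, 0.4])  # made-up DGP
b_eq = A_eq @ theta_true


def solve_bounds(c, A_eq, b_eq):
    """Return (lower, upper) bound of c @ theta subject to A_eq @ theta = b_eq."""
    kwargs = {"A_eq": A_eq, "b_eq": b_eq, "bounds": (0, 1), "method": "highs"}
    lower = linprog(c, **kwargs)
    upper = linprog(-c, **kwargs)  # maximise by minimising the negative
    return lower.fun, -upper.fun


lower, upper = solve_bounds(c_target, A_eq, b_eq)
print(f"Identified set for the ATE: [{lower:.2f}, {upper:.2f}]")
```

In this toy problem the moments pin down four of the six coefficients, so the bounds are the identified part of the ATE (0.23) minus and plus 0.3, that is, about -0.07 and 0.53. The made-up true ATE of 0.20 lies inside this interval.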
Following MST, I split the code into two parts, identification and estimation. The
former implements pure identification results for a known DGP, while the latter
implements estimation of the identified set based on data. Both are based on LPs similar
in spirit but with slightly different constraints. In particular, estimation has to
deal with sampling uncertainty since in any finite sample the constraints will only be
satisfied approximately. For details see the report.
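One way to deal with constraints that hold only approximately is a two-step LP: first find the smallest attainable sum of absolute deviations from the estimated moments, then optimise the target over all parameters whose deviations stay within a tolerance of that minimum. The sketch below assumes this formulation. The function name estimate_bounds, the tolerance value, and the toy inputs are hypothetical and not taken from this project's code.

```python
import numpy as np
from scipy.optimize import linprog


def estimate_bounds(c, A_hat, b_hat, tol):
    """Two-step LP sketch: bounds on c @ theta with approximately met moments."""
    n_mom, n_par = A_hat.shape
    # Decision vector x = (theta, e) with e[s] >= |A_hat[s] @ theta - b_hat[s]|.
    eye = np.eye(n_mom)
    A_abs = np.block([[A_hat, -eye], [-A_hat, -eye]])
    b_abs = np.concatenate([b_hat, -b_hat])
    var_bounds = [(0, 1)] * n_par + [(0, None)] * n_mom

    # Step 1: minimise the sum of absolute deviations from the moments.
    cost_dev = np.concatenate([np.zeros(n_par), np.ones(n_mom)])
    first = linprog(
        cost_dev, A_ub=A_abs, b_ub=b_abs, bounds=var_bounds, method="highs"
    )
    min_dev = first.fun

    # Step 2: optimise the target, allowing deviations up to min_dev + tol.
    A_ub = np.vstack([A_abs, cost_dev])
    b_ub = np.append(b_abs, min_dev + tol)
    cost_target = np.concatenate([c, np.zeros(n_mom)])
    kwargs = {"A_ub": A_ub, "b_ub": b_ub, "bounds": var_bounds, "method": "highs"}
    lower = linprog(cost_target, **kwargs)
    upper = linprog(-cost_target, **kwargs)
    return min_dev, lower.fun, -upper.fun


# Hypothetical toy inputs: three-bin constant MTR functions, ATE as the target.
widths = np.array([0.3, 0.4, 0.3])
c_target = np.concatenate([-widths, widths])
A_hat = np.array(
    [
        [0.0, 0.0, 0.0, 0.3, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.3, 0.4, 0.0],
        [0.0, 0.4, 0.3, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.3, 0.0, 0.0, 0.0],
    ]
)
# Made-up "estimated" moments that no theta in [0, 1]^6 matches exactly:
# the second moment is smaller than the first, which would need m_1 < 0.
b_hat = np.array([0.24, 0.22, 0.25, 0.09])

min_dev, lower_hat, upper_hat = estimate_bounds(c_target, A_hat, b_hat, tol=0.01)
print(f"Minimal deviation: {min_dev:.3f}")
print(f"Estimated set for the ATE: [{lower_hat:.2f}, {upper_hat:.2f}]")
```

With exact equality constraints this toy problem would be infeasible. The first step absorbs the inconsistency, so the second step always has a feasible point.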
To get started, create and activate the environment with
$ conda/mamba env create
$ conda activate pyvmte
To build the project, type
$ pytask
To reduce runtime it is recommended to use the pytask-parallel plug-in:
$ pytask -n <workers>
where workers is the number of workers.
With parallelization, the project builds in 5-10 minutes on my machine using 11 workers.
To reduce runtime further, it is also possible to adjust the simulation settings in
config_mc_by_size.py and config_mc_by_target.py:
MC_SAMPLE_SIZES = [500, 2500, 10000]
MONTE_CARLO_BY_SIZE = MonteCarloSetup(
    sample_size=10_000,
    repetitions=10_000,
)
MONTE_CARLO_BY_TARGET = MonteCarloSetup(
    sample_size=10_000,
    repetitions=1_000,
    u_hi_range=np.arange(0.35, 1, 0.05),
)
Reducing the repetitions always works. Only be careful with really small sample sizes
which can result in errors because estimators might be undefined (the linear programs do
not have a solution).
The original repo I mainly used for development can be found at pyvmte.
This project was created with cookiecutter and the econ-project-templates.