pyvmte

Project

This project implements a method for inference about general treatment effects in instrumental variables settings, first proposed in Mogstad, Santos, and Torgovitsky (2018, Econometrica; henceforth MST). The main novel results reported here are Monte Carlo simulations based on the data generating process used in the paper. In particular, I report extensive simulations corresponding to the identification result shown in Figure 5 of the paper.

The goal of MST is to conduct inference on parameters that are generally not point-identified in instrumental variable settings. For example, a well-known result is that in a binary-instrument, binary-treatment setting we can only point-identify the local average treatment effect (LATE) for the complier subpopulation induced by the instrument. However, the researcher might be interested in the LATE for a different subpopulation or in the average treatment effect (ATE).

The general idea of the paper is to set-identify a target parameter, where the identified set is constrained by the estimands we can point-identify. Intuitively, we need to make some assumptions about unobservables to provide a set for the target parameter. However, every parameter that we can point-identify puts restrictions on these unobservables. The main contribution of MST is to show that all identified and target parameters can be written as linear maps of so-called marginal treatment response (MTR) functions in a binary choice model. Hence, a combination of data moments and assumptions on the MTR functions implies an identified set for the target parameter. For a more detailed introduction to the method, see the report in this project.
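To make the "linear maps" statement concrete, the following block sketches the structure in MST's notation as I understand it. The symbols ($m_d$ for the MTR functions, $p(z)$ for the propensity score, $s$ for an IV-like specification, $\mathcal{M}$ for the set of admissible MTR pairs, $\omega^\star_d$ for the target weights) are not defined in this README and should be checked against the paper and the report.

```latex
% Identified (IV-like) estimands are linear in the MTR pair m = (m_0, m_1):
\beta_s \equiv E[s(D, Z)\, Y] = \Gamma_s(m)
  \equiv E\!\left[\int_0^1 m_0(u, X)\,\omega_{0s}(u, Z)\,du\right]
       + E\!\left[\int_0^1 m_1(u, X)\,\omega_{1s}(u, Z)\,du\right],
\qquad
\omega_{0s}(u, z) = s(0, z)\,\mathbf{1}[u > p(z)], \quad
\omega_{1s}(u, z) = s(1, z)\,\mathbf{1}[u \le p(z)].

% The target parameter has the same linear form with known weights \omega^\star_d:
\beta^\star = \Gamma^\star(m).

% Identified set for the target, given a set S of IV-like specifications:
\mathcal{B}^\star = \left\{ \Gamma^\star(m) : m \in \mathcal{M},\;
  \Gamma_s(m) = \beta_s \text{ for all } s \in \mathcal{S} \right\}.
```

When $\mathcal{M}$ is convex, this set is an interval whose endpoints are the minimum and maximum of $\Gamma^\star(m)$ over the constrained set, which is what makes a linear-programming formulation possible once $m$ is given a finite-dimensional linear parametrization.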

I originally started working on this project in an econometrics topics course in the 2023 summer term, but back then I couldn't get the code to run properly. In particular, similar Monte Carlo studies produced severely biased estimates due to a faulty implementation. The results reported here are more plausible given the results in MST. While technically their estimator is only consistent (and probably not unbiased) and they don't report any simulations, the paper would probably not have been published if the method were severely biased at realistic sample sizes.

Implementation

All sets in MST (identified or estimated) are implicitly defined by linear programs (LPs). Thus, the key programming task is to compute the inputs to the linear program. I then pass these to scipy.optimize.linprog, the scipy wrapper for several LP solvers, including highs, which I use as the default. I also implement the copt solver, which generally performs best for a range of problems (e.g. see these benchmarks). However, given the small size of the problems in my simulations, I did not see any performance differences (if anything, scipy has the faster API).
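As a minimal sketch of how an LP-defined set translates into two calls to scipy.optimize.linprog, the snippet below computes the lower and upper bound of a linear target over a constrained set. The function name lp_bounds and the toy numbers are hypothetical and are not taken from this package; the actual LP inputs in pyvmte are constructed differently.

```python
import numpy as np
from scipy.optimize import linprog


def lp_bounds(c, A_eq, b_eq, bounds=(0.0, 1.0)):
    """Return the (lower, upper) bounds of c @ theta over the feasible set.

    Hypothetical helper for illustration; not part of pyvmte.
    """
    c = np.asarray(c, dtype=float)
    # linprog minimizes, so the upper bound comes from minimizing -c @ theta.
    lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    upper = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not (lower.success and upper.success):
        raise ValueError("LP has no solution; the constraints may be infeasible.")
    return lower.fun, -upper.fun


# Toy problem with made-up numbers: two coefficients in [0, 1], one
# "identified" restriction 0.5 * theta_0 + 0.5 * theta_1 = 0.4, target theta_0.
A_eq = np.array([[0.5, 0.5]])
b_eq = np.array([0.4])
c = np.array([1.0, 0.0])

lo, hi = lp_bounds(c, A_eq, b_eq)
print(lo, hi)  # bounds are approximately 0.0 and 0.8
```

The single equality restriction does not pin down theta_0, but together with the box constraints it narrows the range from [0, 1] to [0, 0.8], which mirrors how identified moments shrink the set for the target parameter.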

Following MST, I split the code into two parts: identification and estimation. The former implements pure identification results for a known DGP, while the latter implements estimation of the identified set based on data. Both are based on LPs that are similar in spirit but have slightly different constraints. In particular, estimation has to deal with sampling uncertainty, since in any finite sample the constraints will only be satisfied approximately. For details see the report.
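One way to handle constraints that hold only approximately is a two-step LP: first find the smallest total absolute deviation from the estimated moments, then bound the target over all coefficients whose deviation is within a tolerance of that minimum. The sketch below follows this idea, which is in the spirit of the MST estimator as I understand it. The function name estimated_bounds, the tolerance argument tol, and the toy numbers are assumptions for illustration and do not reflect the actual pyvmte interface.

```python
import numpy as np
from scipy.optimize import linprog


def estimated_bounds(c, A, b, tol, theta_bounds=(0.0, 1.0)):
    """Two-step LP bounds when A @ theta = b may hold only approximately.

    Hypothetical helper for illustration; not part of pyvmte.
    """
    A, b, c = (np.asarray(x, dtype=float) for x in (A, b, c))
    n_s, n_k = A.shape
    # Decision variables: [theta (n_k), e (n_s)] with |A @ theta - b| <= e.
    A_ub = np.block([[A, -np.eye(n_s)], [-A, -np.eye(n_s)]])
    b_ub = np.concatenate([b, -b])
    var_bounds = [theta_bounds] * n_k + [(0.0, None)] * n_s

    # Step 1: minimal total absolute deviation from the moments.
    cost_dev = np.concatenate([np.zeros(n_k), np.ones(n_s)])
    first = linprog(cost_dev, A_ub=A_ub, b_ub=b_ub, bounds=var_bounds, method="highs")
    if not first.success:
        raise ValueError("First-step LP failed: " + first.message)

    # Step 2: bound the target subject to deviation <= minimum + tol.
    A_ub2 = np.vstack([A_ub, cost_dev])
    b_ub2 = np.append(b_ub, first.fun + tol)
    cost_target = np.concatenate([c, np.zeros(n_s)])
    lower = linprog(cost_target, A_ub=A_ub2, b_ub=b_ub2, bounds=var_bounds, method="highs")
    upper = linprog(-cost_target, A_ub=A_ub2, b_ub=b_ub2, bounds=var_bounds, method="highs")
    if not (lower.success and upper.success):
        raise ValueError("Second-step LP failed.")
    return lower.fun, -upper.fun


# Toy example with made-up numbers: two noisy "moments" that contradict each
# other (theta_0 = 0.4 and theta_0 = 0.5), so the exact constraints are infeasible.
A = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([0.4, 0.5])
c = np.array([1.0, 0.0])

est_lo, est_hi = estimated_bounds(c, A, b, tol=0.1)
print(est_lo, est_hi)  # approximately 0.35 and 0.55
```

In this toy case the minimal deviation is 0.1, attained by any theta_0 between 0.4 and 0.5; allowing an extra 0.1 of slack widens the estimated set to roughly [0.35, 0.55]. The absolute values are handled with auxiliary slack variables so that both steps remain linear programs.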

Usage

To get started, create and activate the environment with

$ conda env create  # or: mamba env create
$ conda activate pyvmte

To build the project, type

$ pytask

To reduce runtime it is recommended to use the pytask-parallel plug-in:

$ pytask -n <workers>

where <workers> is the number of parallel workers.

With parallelization the project builds in 5-10 minutes on my machine using 11 workers.

To reduce runtime further, you can also adjust the simulation settings in config_mc_by_size.py and config_mc_by_target.py:

MC_SAMPLE_SIZES = [500, 2500, 10000]

MONTE_CARLO_BY_SIZE = MonteCarloSetup(
    sample_size=10_000,
    repetitions=10_000,
)

MONTE_CARLO_BY_TARGET = MonteCarloSetup(
    sample_size=10_000,
    repetitions=1_000,
    u_hi_range=np.arange(0.35, 1, 0.05),
)

Reducing the number of repetitions is always safe. Be careful with very small sample sizes, however: they can cause errors because the estimators may be undefined (the linear programs do not have a solution).
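To illustrate what an undefined estimator looks like at the solver level, here is a small sketch that records NaN whenever scipy.optimize.linprog reports no solution, so a simulation loop can count such draws instead of crashing. The helper lp_value_or_nan and the toy constraints are hypothetical; the project's own simulation code may handle this case differently.

```python
import math

import numpy as np
from scipy.optimize import linprog


def lp_value_or_nan(c, A_eq, b_eq, bounds=(0.0, 1.0)):
    """Return the optimal LP value, or NaN if the solver finds no solution.

    Hypothetical helper for illustration; not part of pyvmte.
    """
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    # For an infeasible problem, linprog sets res.success to False
    # (status code 2 in scipy's documentation).
    return res.fun if res.success else math.nan


# Two toy "draws" with made-up numbers: in the first the two equality
# constraints agree (theta_0 = 0.4 twice), in the second they contradict
# each other (theta_0 = 0.4 and theta_0 = 0.5), so the LP is infeasible.
draws = [lp_value_or_nan([1.0], [[1.0], [1.0]], [0.4, b2]) for b2 in (0.4, 0.5)]
share_undefined = np.mean(np.isnan(draws))
print(draws, share_undefined)
```

Tracking the share of undefined draws alongside the estimates gives a quick check on whether a chosen sample size is too small for the simulation settings.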

Credits

The original repo I mainly used for development can be found at pyvmte.

This project was created with cookiecutter and the econ-project-templates.
