-
Notifications
You must be signed in to change notification settings - Fork 45
/
README.Rmd
executable file
·184 lines (134 loc) · 8.87 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
---
output:
github_document
editor_options:
chunk_output_type: console
---
```{r, echo=FALSE, results='hide', message=FALSE, warning=FALSE, error=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# <img src="man/figures/psa.png" align="right" width="120" align="right" /> Propensity Score Analysis with R
<!-- badges: start -->
[![R-CMD-check](https://github.com/jbryer/psa/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jbryer/psa/actions/workflows/R-CMD-check.yaml)
[![Bookdown Status](https://github.com/jbryer/psa/actions/workflows/bookdown.yaml/badge.svg)](https://github.com/jbryer/psa/actions/workflows/bookdown.yaml)
`r badger::badge_devel("jbryer/psa", "blue")`
`r badger::badge_repostatus("WIP")`
<!-- badges: end -->
Contact: [Jason Bryer](https://www.bryer.org/) ([[email protected]](mailto:[email protected]))
Bookdown Site: https://psa.bryer.org
## Overview
The use of propensity score methods (Rosenbaum & Rubin, 1983) for estimating causal effects in observational studies or certain kinds of quasi-experiments has been increasing over the last two decades. Propensity score analysis (PSA) attempts to adjust selection bias that occurs due to the lack of randomization. Analysis is typically conducted in three phases. In phase I, the probability of placement in the treatment is estimated to identify matched pairs, clusters, or probability weights. In phase II, comparisons on the dependent variable can be made between matched pairs, within clusters, or using inverse probability weights in regression models. In phase III, sensitivity analysis is conducted to estimate how robust the effect sizes estimated in phase II are to unobserved confounders. R (R Core Team, 2012) is ideal for conducting PSA given its wide availability of the most current statistical methods vis-à-vis add-on packages as well as its superior graphics capabilities. This talk will provide participants with a theoretical overview of propensity score methods with an emphasis on graphics. A survey of R packages for conducting PSA with multilevel data, non-binary treatments, and bootstrapping will also be provided. Lastly, a Shiny application to assist with all three phases of PSA will be demonstrated.
```{r psa_citations_by_year, echo=FALSE, fig.width=10, fig.height=4}
data('psa_citations', package = 'psa')
library(ggplot2)
ggplot(psa_citations, aes(x = Year, y = Citations, color = Search_Term)) +
geom_path() +
scale_color_brewer('Search Teram', type = 'qual', palette = 2) +
ggtitle('Number of Citations for Propensity Score Analysis',
subtitle = 'Source: Web of Science and Google Scholar') +
theme_minimal()
```
## Slides
The latest version slides introducing propensity score analysis: [PDF](Slides/Intro_PSA.pdf) or [HTML](http://htmlpreview.github.io/?https://github.com/jbryer/psa/blob/master/Slides/Intro_PSA.html).
<!--
## CUNY MSDS Talk
This is a recording of a talk I gave at CUNY School of Professional Studies on April 24, 2023.
[![Recording of talk given for CUNY MSDS on April 24, 2023.](https://img.youtube.com/vi/Rq_od5KwqEA/maxresdefault.jpg)](https://www.youtube.com/watch?v=Rq_od5KwqEA)
-->
## Getting Started
You can install the `psa` package using the `remotes` package. I recommend setting the `dependencies = 'Enhances'` as many this will install all the packages that are used in the examples.
```{r, eval = FALSE}
remotes::install_github('jbryer/psa', build_vignettes = TRUE, dependencies = 'Enhances')
```
Run the PSA Shiny App:
```{r, eval = FALSE}
psa::psa_shiny()
```
```{r, echo=FALSE}
knitr::include_graphics('man/figures/psa_shiny_screenshots.gif')
```
To explore the PSA visualizations in this package through a simulation, run this Shiny application:
```{r, eval = FALSE}
psa::psa_simulation_shiny()
```
```{r, echo=FALSE}
knitr::include_graphics('man/figures/psa_simulation_screenshot.png')
```
## The `MatchBalance` Function
```{r MatchBalance, warning=FALSE}
data(lalonde, package='Matching')
formu.lalonde <- treat ~ age + I(age^2) + educ + I(educ^2) + hisp + married + nodegr +
re74 + I(re74^2) + re75 + I(re75^2) + u74 + u75
mb0.lalonde <- psa::MatchBalance(df = lalonde, formu=formu.lalonde)
# summary(mb0.lalonde) # Excluded to save space
plot(mb0.lalonde)
```
## The `loess_plot` Function
```{r loess_plot, warning = FALSE, fig.width=10, fig.height=4}
data(lalonde, package = 'Matching')
lr_out <- glm(treat ~ age + I(age^2) + educ + I(educ^2) + black +
hisp + married + nodegr + re74 + I(re74^2) + re75 + I(re75^2) +
u74 + u75,
data = lalonde,
family = binomial(link = 'logit'))
lalonde$ps <- fitted(lr_out)
psa::loess_plot(ps = lalonde$ps,
outcome = log(lalonde$re78 + 1),
treatment = as.logical(lalonde$treat))
```
## The `weighting_plot` Function
```{r weighting_plot, warning = FALSE}
psa::weighting_plot(ps = lalonde$ps,
treatment = lalonde$treat,
outcome = (lalonde$re78))
```
## The `stratification_plot` Function
```{r}
psa::stratification_plot(ps = lalonde$ps,
treatment = lalonde$treat,
outcome = lalonde$re78)
```
## The `matching_plot` Function
```{r matching_plot}
match_out <- Matching::Match(Y = lalonde$re78,
Tr = lalonde$treat,
X = lalonde$ps,
caliper = 0.1,
replace = FALSE,
estimand = 'ATE')
psa::matching_plot(ps = lalonde$ps,
treatment = lalonde$treat,
outcome = log(lalonde$re78 + 1),
index_treated = match_out$index.treated,
index_control = match_out$index.control)
```
## The `merge.mids` Function
The `merge.mids` function is a convenience for merging the multiple imputation results from the `mice::mice()` function with the full data frame used for imputation. In the context of PSA imputation is conducted without the including the outcome variable. This function will merge in the outcome, along with any other variables not used in the imputation procedure, with one of the imputed datasets. Additionally, by setting the `shadow.matrix` parameter to `TRUE` the resulting data frame will contain additional logical columns with the suffix `_missing` with a value of `TRUE` if the variable was originally missing and therefore was imputed.
## R Scripts
The following R scripts will outline how to conduct propensity score analysis.
* [Setup.R](R-Scripts/Setup.R) - Install R packages. This script generally needs to be run once per R installation.
* [IntroPSA.R](R-Scripts/IntroPSA.R) - Conducts propensity score analysis and matching, summarizes results, and evaluates balance using the National Supported Work Demonstration and Current Population Survey (aka lalonde data).
* [IntroPSA-Tutoring.R](R-Scripts/IntroPSA.R) - Conducts propensity score analysis and matching, summarizes results, and evaluates balance using data from a study examining student use of tutoring services in an online introductory writing class (from the `TriMatch` package).
* [Sensitivity.R](R-Scripts/Sensitivity.R) - Conduct a sensitivity analysis.
* [Missingness.R](R-Scripts/Missingness.R) - How to evaluate whether data is missing at random.
* [BootstrappingPSA.R](R-Scripts/BootstrappingPSA.R) - Boostrapping PSA.
* [NonBinaryPSA.R](R-Scripts/NonBinaryPSA.R) - Analysis of three groups (two treatments and one control)
* [MultilevelPSA.R](R-Scripts/MultilevelPSA.R) - Multilevel propensity score analysis.
## R Packages
There are a number of R packages available for conducting propensity score analysis. These are the packages this workshop will make use of:
* [`MatchIt`](http://gking.harvard.edu/gking/matchit) (Ho, Imai, King, & Stuart, 2011) Nonparametric Preprocessing for Parametric Causal Inference
* [`Matching`](http://sekhon.berkeley.edu/matching/) (Sekhon, 2011) Multivariate and Propensity Score Matching Software for Causal Inference
* [`multilevelPSA`](http://jason.bryer.org/multilevelPSA) (Bryer & Pruzek, 2011) Multilevel Propensity Score Analysis
* [`party`](http://cran.r-project.org/web/packages/party/index.html) (Hothorn, Hornik, & Zeileis, 2006) A Laboratory for Recursive Partytioning
* [`PSAboot`](http://jason.bryer.org/PSAboot) (Bryer, 2013) Bootstrapping for Propensity Score Analysis
* [`PSAgraphics`](http://www.jstatsoft.org/v29/i06/paper) (Helmreich & Pruzek, 2009) An R Package to Support Propensity Score Analysis
* [`rbounds`](http://www.personal.psu.edu/ljk20/rbounds%20vignette.pdf) (Keele, 2010) An Overview of rebounds: An R Package for Rosenbaum bounds sensitivity analysis with matched data.
* [`rpart`](http://cran.r-project.org/web/packages/rpart/index.html) (Therneau, Atkinson, & Ripley, 2012) Recursive Partitioning
* [`TriMatch`](http://jason.bryer.org/TriMatch) (Bryer, 2013) Propensity Score Matching for Non-Binary Treatments
## Code of Conduct
Please note that the psa project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.