Hello GRF Lab team,
Thank you for developing such a fantastic package. I want to ask about an issue we encountered when using DR scores to calculate welfare with a binary outcome.
Setting:
I'm using a regression discontinuity design (RDD) to study the effect of a health signal, triggered when a diabetes biomarker exceeds a certain threshold c, on next year's biomarker levels and on mortality. (The average mortality rate in the data is approximately 1%.)
I estimate the CATE using `lm_forest` and compute individual DR scores. I then monetize the mortality DR scores by multiplying them by the Value of Statistical Life (VSL) to obtain welfare.
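Concretely, the monetization step looks roughly like this (a minimal sketch; `dr.scores.mortality`, `vsl`, and the sign convention are placeholders/assumptions of mine, not grf API):

```r
# Sketch of the monetization step.
vsl <- 1e7  # hypothetical Value of Statistical Life, in dollars
# Assumed sign convention: an increase in mortality is a welfare loss.
welfare.scores <- -vsl * dr.scores.mortality
mean(welfare.scores)  # sample-average welfare estimate
```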
Problem:
While the ATE and GATE estimates are reasonable, we ran into an issue with the welfare estimates: the welfare under the status-quo policy (e.g., assigning treatment to all individuals with biomarker > c, regardless of CATE) is extremely large, much higher than when treatment is assigned only to individuals with both CATE > 0 and biomarker > c. This result seems strange and misleading.
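For reference, here is roughly how we compare the two policies (a sketch; `biomarker`, `c.threshold`, `tau.hat`, and `welfare.scores` are placeholder names for our data, and welfare is measured relative to treating no one):

```r
# Status-quo policy: treat everyone above the biomarker cutoff.
pi.status.quo <- biomarker > c.threshold
# CATE-gated policy: additionally require CATE > 0, mirroring the rule above.
pi.gated <- pi.status.quo & (tau.hat > 0)

# Estimated welfare of each assignment rule via the monetized DR scores.
mean(pi.status.quo * welfare.scores)
mean(pi.gated * welfare.scores)
```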
Upon examining the AIPW scores, we found that they were mostly positive in the treatment group (W = 1) and mostly negative in the control group (W = 0). Because the status-quo rule coincides with the realized assignment, leaving the policy unchanged averages only the (mostly positive) treated scores, so it yields much higher welfare, which is misleading. The images below show the score distributions by treatment group.
This phenomenon occurs only with the binary mortality outcome; it does not occur with a continuous outcome such as next-year biomarker levels. The results also look more reasonable if we use IPW instead of AIPW scores to compute welfare.
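By IPW I mean dropping the regression adjustment entirely. For a single binary treatment arm the plain IPW score would be the contrast below (a sketch of my understanding; `W`, `W.hat`, and `Y` denote the treatment, its propensity estimate, and the outcome):

```r
# Plain IPW scores for a binary treatment, with no outcome model:
#   Gamma_i = W_i * Y_i / e(X_i) - (1 - W_i) * Y_i / (1 - e(X_i))
ipw.scores <- W * Y / W.hat - (1 - W) * Y / (1 - W.hat)
```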
Question: Is it appropriate to use DR scores to compute welfare when the outcome is binary? If not, are there alternative approaches or modifications that avoid the issue described above?
As I investigated the cause, I discovered the following:
- The propensity scores are not extreme.
- The adjustment term in the AIPW scores dominates the CATE term.
- This is because the nuisance prediction `Y.hat` is relatively large compared to the binary outcome (where 99% of the values are 0), resulting in a large adjustment term.
- Furthermore, the adjustment term changes sign between the treatment (W = 1) and control (W = 0) groups, causing a large difference in AIPW scores between these groups. (See the numeric illustration after this list.)
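To make the last two points concrete, here is a toy calculation with the usual binary-treatment AIPW score, Gamma = tau.hat + (W - e.hat) / (e.hat * (1 - e.hat)) * (Y - Y.hat - (W - e.hat) * tau.hat). The numbers are purely illustrative, so the particular signs need not match our data; the point is the magnitude of the adjustment term and its flip between W = 1 and W = 0:

```r
# Toy illustration of how the adjustment term dominates and flips sign
# (illustrative numbers, not our actual estimates).
tau.hat <- -0.001  # small CATE on mortality
Y <- 0             # a survivor, as for ~99% of observations
Y.hat <- 0.01      # nuisance prediction near the 1% mortality rate
e.hat <- 0.5       # non-extreme propensity

aipw <- function(W) {
  tau.hat + (W - e.hat) / (e.hat * (1 - e.hat)) *
    (Y - Y.hat - (W - e.hat) * tau.hat)
}
aipw(1)  # -0.020: the adjustment term (-0.019) dwarfs |tau.hat|
aipw(0)  # +0.020: same magnitude, opposite sign
```

With 99% zeros in Y, nearly every observation contributes an adjustment of roughly -Y.hat / e.hat (treated survivors) or +Y.hat / (1 - e.hat) (control survivors), so the per-group score distributions separate even though the pooled average remains well behaved.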
By the way, I compute the DR scores for the LM forest as follows.
```r
lm_get_scores <- function(forest,
                          subset = NULL,
                          debiasing.weights = NULL,
                          num.trees.for.weights = 500,  # unused; kept to mirror grf::get_scores()
                          ...) {
  subset <- grf:::validate_subset(forest, subset)
  # The weights are not estimated here, so they must be passed in;
  # a NULL value would otherwise silently yield a zero-length result.
  if (is.null(debiasing.weights)) {
    stop("debiasing.weights must be supplied.")
  }

  W.orig.1 <- forest$W.orig[subset, 1]
  W.hat.1 <- forest$W.hat[subset, 1]
  W.orig.2 <- forest$W.orig[subset, 2]
  W.hat.2 <- forest$W.hat[subset, 2]
  Y.orig <- forest$Y.orig[subset]
  Y.hat <- forest$Y.hat[subset]

  # Pointwise CATE predictions for the two treatment regressors.
  tau.hat <- predict(forest)$predictions
  tau.hat.pointwise.1 <- tau.hat[subset, 1, ]
  tau.hat.pointwise.2 <- tau.hat[subset, 2, ]

  # Outcome residual after removing Y.hat and both centered-treatment terms.
  Y.residual <- Y.orig - (Y.hat +
    tau.hat.pointwise.1 * (W.orig.1 - W.hat.1) +
    tau.hat.pointwise.2 * (W.orig.2 - W.hat.2))

  # DR score for the first treatment: CATE plus debiased residual.
  tau.hat.pointwise.1 + debiasing.weights * Y.residual
}
```
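For completeness, I call it like this (a sketch; using the standard binary-treatment weights (W - e.hat) / (e.hat * (1 - e.hat)) for the first arm is my own choice, not something grf prescribes for `lm_forest`):

```r
# Assumed usage: debiasing weights for a binary first treatment arm.
W1 <- forest$W.orig[, 1]
e1 <- forest$W.hat[, 1]
dw <- (W1 - e1) / (e1 * (1 - e1))
dr.scores <- lm_get_scores(forest, debiasing.weights = dw)
```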