
Calculating Welfare Using DR Scores with a Binary Outcome #1492

@j-kawamu

Description


Hello GRF Lab team,

Thank you for developing such a fantastic package. I want to ask about an issue we encountered when using DR scores to calculate welfare with a binary outcome.

Setting:
I'm using a regression discontinuity design (RDD) to study the effect of a health signal, triggered when a diabetes biomarker exceeds a threshold c, on next year's biomarker levels and on mortality. (The average mortality rate in the data is approximately 1%.)

I estimate CATE using lm_forest and compute individual DR scores. I then monetize the DR scores for mortality by multiplying them by the Value of Statistical Life (VSL) to calculate welfare.

Problem:
While the ATE and GATE estimates are reasonable, we encountered an issue with the welfare estimates: the welfare under the status quo policy (assigning treatment to all individuals with biomarker > c, regardless of CATE) is extremely large. It is much higher than the welfare when treatment is assigned only to individuals with both CATE > 0 and biomarker > c, which seems strange and misleading.
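For concreteness, here is a minimal sketch of the welfare comparison described above (the names welfare, Gamma, vsl, biomarker, cate, and the toy numbers are all placeholders, not the actual data; Gamma stands in for the DR scores for mortality, monetized by multiplying by the VSL):

```r
# Hypothetical sketch of the welfare comparison (placeholder names, toy data).
# Gamma: DR scores for mortality; vsl: value of a statistical life.
welfare <- function(assign, Gamma, vsl) {
  mean(assign * vsl * Gamma)  # average monetized DR score under a policy
}

set.seed(1)
n <- 1000
biomarker <- rnorm(n)
cate <- rnorm(n, sd = 0.01)              # stand-in for estimated CATEs
Gamma <- rnorm(n, mean = cate, sd = 0.1) # stand-in for DR scores

c0 <- 0  # stand-in for the biomarker threshold c
status.quo <- as.numeric(biomarker > c0)            # treat everyone above c
targeted   <- as.numeric(biomarker > c0 & cate > 0) # also require CATE > 0

welfare(status.quo, Gamma, vsl = 1e7)
welfare(targeted, Gamma, vsl = 1e7)
```

The puzzle is that the first quantity comes out implausibly large relative to the second.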

Upon examining the AIPW scores, we found that the DR scores are mostly positive in the treatment group (W = 1) and mostly negative in the control group (W = 0). As a result, leaving the policy assignment unchanged yields a much higher welfare estimate, which is misleading. Plots of the scores are shown below.

[Images: distributions of the DR scores in the W = 1 and W = 0 groups]

This phenomenon occurs only with the binary mortality outcome; it does not occur with a continuous outcome such as next-year biomarker levels. The results also look more reasonable if we use IPW instead of AIPW to compute welfare.

Question: Is it appropriate to use DR scores to compute welfare when the outcome is binary? If not, are there alternative approaches or modifications that avoid the issue described above?

While investigating the cause, I found the following:

  • The propensity scores are not extreme.
  • The adjustment term in the AIPW scores dominates the CATE term.
  • This happens because the nuisance prediction Y.hat is large relative to the binary outcome (99% of which is 0), which makes the residual, and hence the adjustment term, large.
  • Furthermore, the adjustment term changes sign between the treatment (W = 1) and control (W = 0) groups, creating a large gap in AIPW scores between the two groups.
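The last two points can be illustrated with toy numbers (not the actual data, and assuming standard binary-treatment AIPW debiasing weights (W - e) / (e(1 - e))): for the roughly 99% of observations with Y = 0, the residual is approximately -Y.hat, so the adjustment term flips sign between the two arms and can dwarf a small CATE term. Whether it comes out positive in the treated arm depends on the fitted residuals; the point is the sign flip and the relative magnitude.

```r
# Toy numbers: adjustment term of an AIPW score with a rare binary outcome.
e       <- 0.5    # propensity score, not extreme
Y.hat   <- 0.01   # outcome-model prediction near the 1% event rate
tau.hat <- 0.002  # a small CATE estimate
residual <- 0 - Y.hat  # a typical observation: Y = 0

w1 <- (1 - e) / (e * (1 - e))  # debiasing weight when W = 1:  1/e
w0 <- (0 - e) / (e * (1 - e))  # debiasing weight when W = 0: -1/(1 - e)

adj1 <- w1 * residual  # adjustment term in the W = 1 arm: -0.02
adj0 <- w0 * residual  # adjustment term in the W = 0 arm:  0.02

abs(adj1) / abs(tau.hat)  # 10: the adjustment dominates the CATE term
```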

For reference, I compute the DR scores for the lm_forest as follows.

lm_get_scores <- function(forest,
                          subset = NULL,
                          debiasing.weights = NULL,
                          num.trees.for.weights = 500,  # unused in this version
                          ...) {
  subset <- grf:::validate_subset(forest, subset)
  W.orig.1 <- forest$W.orig[subset, 1]
  W.hat.1 <- forest$W.hat[subset, 1]
  W.orig.2 <- forest$W.orig[subset, 2]
  W.hat.2 <- forest$W.hat[subset, 2]
  Y.orig <- forest$Y.orig[subset]
  Y.hat <- forest$Y.hat[subset]
  tau.hat.pointwise.1 <- predict(forest)$predictions[subset, 1, ]
  tau.hat.pointwise.2 <- predict(forest)$predictions[subset, 2, ]

  if (is.null(debiasing.weights)) {
    # Default for a binary first arm: standard AIPW weights (W - e) / (e (1 - e)).
    debiasing.weights <- (W.orig.1 - W.hat.1) / (W.hat.1 * (1 - W.hat.1))
  }

  # Residual after removing the estimated conditional mean of Y given X and W.
  Y.residual <- Y.orig - (Y.hat +
                            tau.hat.pointwise.1 * (W.orig.1 - W.hat.1) +
                            tau.hat.pointwise.2 * (W.orig.2 - W.hat.2))

  # DR score for arm 1: CATE estimate plus debiasing correction.
  tau.hat.pointwise.1 + debiasing.weights * Y.residual
}
