Skip to content

Seeking Advice on Causal Inference for Treatment Effect Prediction (Small Sample, Genomic Covariates) #1490

@oghzzang

Description

@oghzzang

Hello, I’ve been studying causal inference recently, but I’m still unsure how to properly approach my analysis — so I would really appreciate your guidance. I’m working with the following dataset and aim to answer this question:

Goal: For each individual, can we predict whether Treatment A or Treatment B would be more effective?

Dataset Summary: N = 88 patients

Treatment assignment: A or B (binary)

Outcome: binary response (1 = favorable response, 0 = unfavorable)

Covariates:

A binary variable for the presence of a specific gene mutation

A continuous variable for the expression level of a specific gene

Questions Since this is a small dataset (n=88), would it still make sense to split the data into training and test sets, as in conventional supervised learning workflows?

I am considering using causal_forest() from the grf package to estimate individual treatment effects (ITEs).

After estimating the ITEs, is it reasonable to decide:

ITE > 0 => Prefer Treatment A

ITE < 0 => Prefer Treatment B

Is this interpretation valid and commonly used in practice?

I’m aware that with such a small sample size, variance and overfitting could be major issues. If there are any recommendations regarding cross-validation strategies, feature regularization, or alternative models (e.g., T-Learner, S-Learner), I’d love to hear them.

Thank you very much in advance for your help!

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions