Seeking Advice on Causal Inference for Treatment Effect Prediction (Small Sample, Genomic Covariates) #1490

Hello, I’ve been studying causal inference recently, but I’m still unsure how to approach my analysis properly, so I would really appreciate your guidance. I’m working with the following dataset and aim to answer this question:

Goal: For each individual, can we predict whether Treatment A or Treatment B would be more effective?

Dataset Summary:

N = 88 patients

Treatment assignment: A or B (binary)

Outcome: binary response (1 = favorable response, 0 = unfavorable)

Covariates:

A binary variable for the presence of a specific gene mutation

A continuous variable for the expression level of a specific gene

Questions:

Since this is a small dataset (N = 88), would it still make sense to split the data into training and test sets, as in conventional supervised learning workflows?
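
To make the question concrete, here is a minimal sketch of the alternative to a single train/test split that I have in mind: K-fold assignment stratified by treatment arm and outcome, so that every patient is held out exactly once. The function name `stratified_folds` and the toy data are my own illustrative assumptions, not anything from grf or from my real dataset.

```python
import random
from collections import defaultdict


def stratified_folds(treatment, outcome, k=5, seed=0):
    """Assign each patient to one of k folds, balancing arm and outcome."""
    rng = random.Random(seed)
    # Group patient indices by (treatment arm, outcome) stratum.
    strata = defaultdict(list)
    for i, (w, y) in enumerate(zip(treatment, outcome)):
        strata[(w, y)].append(i)

    folds = [None] * len(treatment)
    offset = 0
    for key in sorted(strata):
        idx = strata[key]
        rng.shuffle(idx)
        # Continue the round-robin across strata so fold sizes stay even.
        for j, i in enumerate(idx):
            folds[i] = (offset + j) % k
        offset += len(idx)
    return folds


# Toy stand-in for the 88 patients (made-up values, not real data).
treatment = ["A", "B"] * 44
outcome = [(i // 2) % 2 for i in range(88)]
folds = stratified_folds(treatment, outcome, k=5)

for held_out in range(5):
    train_idx = [i for i, f in enumerate(folds) if f != held_out]
    test_idx = [i for i, f in enumerate(folds) if f == held_out]
    # A model would be fit on train_idx and evaluated on test_idx only.
    print(held_out, len(train_idx), len(test_idx))
```

With N = 88 and five folds, each held-out set has only 17 or 18 patients, which is part of why I am unsure whether any form of splitting is informative here.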

I am considering using causal_forest() from the grf package to estimate individual treatment effects (ITEs).

After estimating the ITEs (defined here as the effect of Treatment A relative to Treatment B), is it reasonable to decide:

ITE > 0 => Prefer Treatment A

ITE < 0 => Prefer Treatment B

Is this interpretation valid and commonly used in practice?
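
As a possible refinement of the sign rule, here is a minimal sketch that only recommends a treatment when an approximate 95% interval around the estimate excludes zero. It assumes the ITE is the response probability under A minus that under B, and that a standard error is available for each estimate (for example, derived from a model's variance estimates). The function `recommend` and the numbers are hypothetical, for illustration only.

```python
def recommend(ite, se, z=1.96):
    """Return 'A', 'B', or 'uncertain' from an ITE estimate and its standard error.

    Assumes ITE = P(response | A) - P(response | B), so positive values favour A.
    """
    lower, upper = ite - z * se, ite + z * se
    if lower > 0:
        return "A"
    if upper < 0:
        return "B"
    # The interval covers zero: the data do not clearly favour either arm.
    return "uncertain"


# Made-up (estimate, standard error) pairs, not results from my data.
estimates = [(0.30, 0.10), (-0.25, 0.08), (0.05, 0.12)]
decisions = [recommend(ite, se) for ite, se in estimates]
print(decisions)  # ['A', 'B', 'uncertain']
```

With 88 patients I would expect most individual intervals to cover zero, so a rule like this may recommend nothing for many patients.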

I’m aware that with such a small sample size, variance and overfitting could be major issues. If there are any recommendations regarding cross-validation strategies, feature regularization, or alternative models (e.g., T-Learner, S-Learner), I’d love to hear them.
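
For reference, this is roughly what I understand a T-Learner to be, reduced to the binary mutation covariate alone: one outcome model per treatment arm, with the estimated effect taken as the difference of their predictions. Here each "model" is just an add-one-smoothed response rate per mutation status, which is my own simplification (a crude form of regularization), and the data are made up.

```python
def fit_arm(rows):
    """rows: list of (mutation, response). Smoothed response rate per mutation status."""
    rates = {}
    for m in (0, 1):
        ys = [y for mut, y in rows if mut == m]
        # Add-one (Laplace) smoothing shrinks small-cell rates toward 0.5.
        rates[m] = (sum(ys) + 1) / (len(ys) + 2)
    return rates


def t_learner(data):
    """data: list of (treatment, mutation, response). Returns effect of A vs B per stratum."""
    model_a = fit_arm([(m, y) for w, m, y in data if w == "A"])
    model_b = fit_arm([(m, y) for w, m, y in data if w == "B"])
    return {m: model_a[m] - model_b[m] for m in (0, 1)}


# Made-up toy records: (treatment, mutation, response).
toy = (
    [("A", 1, y) for y in (1, 1, 1, 0)]
    + [("B", 1, y) for y in (0, 0, 1, 0)]
    + [("A", 0, y) for y in (0, 1, 0, 0)]
    + [("B", 0, y) for y in (1, 1, 0, 1)]
)
cate = t_learner(toy)
print(cate)
```

An S-Learner would instead fit a single model with treatment as an extra feature. I am unsure which of the two is less prone to overfitting at this sample size.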

Thank you very much in advance for your help!
