
Question about cate_intercept interpretation when include_bias=True and fit_cate_intercept=False 🆚 include_bias=False and fit_cate_intercept=True #986

@xjtusjtu

Description


Background

I'm working with a binary treatment variable T and a single outcome, and I'm confused about how to interpret the cate_intercept in the summary results when include_bias is set to True vs False.

Issue Description

When I run the same analysis with different include_bias settings, the summary results show different coefficients, and I'm unclear about the relationship between cate_intercept and the coefficient results.

Case 1: include_bias=True

When include_bias=True, I observe that:

  • The cate_intercept point estimate, confidence interval, and other statistics are identical to those reported for the "1" coefficient in the Coefficient Results section

Questions:

  1. What is the meaning of cate_intercept in this case?
  2. What does the "1" coefficient in the Coefficient Results represent?
  3. For treatment effect calculation, is it: 1.092 - 2.579 + ... + 1.092?
  4. Does this mean the 1.092 value gets added twice in the treatment effect calculation?

Case 2: include_bias=False

When include_bias=False, I observe that:

  • There is no "1" coefficient in the Coefficient Results section
  • The cate_intercept is still reported in the summary

Questions:

  1. What is the meaning of cate_intercept when include_bias=False?
  2. Is the treatment effect calculated as: sum of all point estimates in Coefficient Results + cate_intercept point estimate?
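
If I understand the linear final stage correctly, the effect at a point x is the featurized x dotted with the coefficients, plus cate_intercept — simply summing the raw point estimates only corresponds to the one point where every featurized column equals 1. A toy numpy sketch (the coefficient values below are made up for illustration, not taken from my summary):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical fitted values -- NOT from the summary above, just for illustration
theta = np.array([0.8, -0.3, 0.1, 0.05, -0.2])   # one per featurized column
cate_intercept = 1.1

featurizer = PolynomialFeatures(degree=2, include_bias=False)
x = np.array([[0.4, 0.7]])
phi = featurizer.fit_transform(x)                # [x0, x1, x0^2, x0*x1, x1^2]

# Effect at x: featurized x dotted with the coefficients, plus the intercept
effect = phi @ theta + cate_intercept
print(effect)

# Summing all point estimates only matches the effect where every feature
# equals 1, i.e. at x = (1, 1) for this featurizer:
phi_ones = featurizer.transform(np.array([[1.0, 1.0]]))
print(phi_ones @ theta + cate_intercept, theta.sum() + cate_intercept)
```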

Any help would be greatly appreciated!


The following is my code:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.preprocessing import PolynomialFeatures
from econml.dml import LinearDML

# Treatment effect function
def exp_te(x):
    return np.exp(2 * x[0])

# DGP constants

np.random.seed(123)
n = 1000
n_w = 30
support_size = 5
n_x = 4
# Outcome support
support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
coefs_Y = np.random.uniform(0, 1, size=support_size)
def epsilon_sample(n):
    return np.random.uniform(-1, 1, size=n)
# Treatment support
support_T = support_Y
coefs_T = np.random.uniform(0, 1, size=support_size)
def eta_sample(n):
    return np.random.uniform(-1, 1, size=n)

# Generate controls, covariates, treatments and outcomes
W = np.random.normal(0, 1, size=(n, n_w))
X = np.random.uniform(0, 1, size=(n, n_x))
# Heterogeneous treatment effects
TE = np.array([exp_te(x_i) for x_i in X])
# Define treatment
log_odds = np.dot(W[:, support_T], coefs_T) + eta_sample(n)
T_sigmoid = 1/(1 + np.exp(-log_odds))
T = np.array([np.random.binomial(1, p) for p in T_sigmoid])
# Define the outcome
Y = TE * T + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)

# get testing data
X_test = np.random.uniform(0, 1, size=(n, n_x))
X_test[:, 0] = np.linspace(0, 1, n)

include_bias=False

est2 = LinearDML(model_y=RandomForestRegressor(),
                 model_t=RandomForestClassifier(min_samples_leaf=10),
                 discrete_treatment=True,
                 featurizer=PolynomialFeatures(degree=2, include_bias=False),
                 cv=6)
est2.fit(Y, T, X=X, W=W)
te_pred2 = est2.effect(X_test)
lb2, ub2 = est2.effect_interval(X_test, alpha=0.01)
est2.summary()
[screenshot: est2.summary() output]

include_bias=True

est2 = LinearDML(model_y=RandomForestRegressor(),
                 model_t=RandomForestClassifier(min_samples_leaf=10),
                 discrete_treatment=True,
                 featurizer=PolynomialFeatures(degree=2),
                 cv=6)
est2.fit(Y, T, X=X, W=W)
te_pred2 = est2.effect(X_test)
lb2, ub2 = est2.effect_interval(X_test, alpha=0.01)
est2.summary()
[screenshot: est2.summary() output]

2025-07-09 update: I now find that the results of "include_bias=True and fit_cate_intercept=False" are very similar to those of "include_bias=True and fit_cate_intercept=True".

The difference between the results of "include_bias=True and fit_cate_intercept=False" vs. "include_bias=False and fit_cate_intercept=True" also doesn't seem to be significant.

So the important question now is: which setting is recommended and worth using?
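
My current understanding of why only one of the two constant terms should be on: fit_cate_intercept adds an intercept to the final CATE regression, and include_bias=True makes the featurizer emit its own column of ones, so turning both on presumably gives two identical constant columns and a perfectly collinear design. A rough numpy sketch (the hstack below just mimics what I assume the final stage sees; it is not econml's actual internals):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.default_rng(0).uniform(0, 1, size=(100, 4))

# include_bias=True: the feature map already contains a column of ones
phi = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)

# fit_cate_intercept=True effectively adds another constant column
design = np.hstack([np.ones((len(X), 1)), phi])

# Two identical constant columns -> rank-deficient design (perfect collinearity)
print(np.linalg.matrix_rank(design), design.shape[1])
```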
