Description
Background
I'm working with a binary treatment variable `T` and a single outcome, and I'm confused about how to interpret the `cate_intercept` in the summary results when `include_bias` is set to `True` vs. `False`.
Issue Description
When I run the same analysis with different `include_bias` settings, the summary results show different coefficients, and I'm unclear about the relationship between `cate_intercept` and the coefficient results.
Case 1: `include_bias=True`
When `include_bias=True`, I observe that:
- The `cate_intercept` point estimate, confidence interval, and other statistics are identical to the coefficient for "1" in the Coefficient Results section
- Both show the same values (e.g., point estimate, CI, etc.)
Questions:
- What is the meaning of `cate_intercept` in this case?
- What does the "1" coefficient in the Coefficient Results represent?
- For the treatment effect calculation, is it `1.092 - 2.579 + ... + 1.092`?
- Does this mean the `1.092` value gets added twice in the treatment effect calculation? (See the check sketched after the `include_bias=True` code below.)
Case 2: `include_bias=False`
When `include_bias=False`, I observe that:
- There is no "1" coefficient in the Coefficient Results section
- The `cate_intercept` is still reported in the summary
Questions:
- What is the meaning of `cate_intercept` when `include_bias=False`?
- Is the treatment effect calculated as the sum of all point estimates in the Coefficient Results plus the `cate_intercept` point estimate? (See the check sketched after the `include_bias=False` code below.)
Any help would be greatly appreciated!
The following is my code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.preprocessing import PolynomialFeatures
from econml.dml import LinearDML

# Treatment effect function
def exp_te(x):
    return np.exp(2 * x[0])

# DGP constants
np.random.seed(123)
n = 1000
n_w = 30
support_size = 5
n_x = 4
# Outcome support
support_Y = np.random.choice(range(n_w), size=support_size, replace=False)
coefs_Y = np.random.uniform(0, 1, size=support_size)
def epsilon_sample(n):
    return np.random.uniform(-1, 1, size=n)
# Treatment support
support_T = support_Y
coefs_T = np.random.uniform(0, 1, size=support_size)
def eta_sample(n):
    return np.random.uniform(-1, 1, size=n)
# Generate controls, covariates, treatments and outcomes
W = np.random.normal(0, 1, size=(n, n_w))
X = np.random.uniform(0, 1, size=(n, n_x))
# Heterogeneous treatment effects
TE = np.array([exp_te(x_i) for x_i in X])
# Define treatment
log_odds = np.dot(W[:, support_T], coefs_T) + eta_sample(n)
T_sigmoid = 1/(1 + np.exp(-log_odds))
T = np.array([np.random.binomial(1, p) for p in T_sigmoid])
# Define the outcome
Y = TE * T + np.dot(W[:, support_Y], coefs_Y) + epsilon_sample(n)
# get testing data
X_test = np.random.uniform(0, 1, size=(n, n_x))
X_test[:, 0] = np.linspace(0, 1, n)
```

`include_bias=False`:

```python
est2 = LinearDML(model_y=RandomForestRegressor(),
                 model_t=RandomForestClassifier(min_samples_leaf=10),
                 discrete_treatment=True,
                 featurizer=PolynomialFeatures(degree=2, include_bias=False),
                 cv=6)
est2.fit(Y, T, X=X, W=W)
te_pred2 = est2.effect(X_test)
lb2, ub2 = est2.effect_interval(X_test, alpha=0.01)
est2.summary()
```
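
To make my Case 2 question concrete, here is the check I have in mind, using the objects defined above. I'm assuming `est2.intercept_` and `est2.coef_` expose the summary's `cate_intercept` and Coefficient Results, and that `effect` is linear in the featurized `X`; this is a sketch of my hypothesis, not a confirmed description of the library's internals:

```python
# Case 2 hypothesis: effect(x) = cate_intercept + phi(x) @ coefs (no "1" column).
# Assumes est2.intercept_ / est2.coef_ correspond to the summary's
# cate_intercept / Coefficient Results -- my assumption, not verified.
feat = PolynomialFeatures(degree=2, include_bias=False).fit(X)
phi_test = feat.transform(X_test)                   # degree-2 features, no constant
manual_te = est2.intercept_ + phi_test @ est2.coef_
print(np.allclose(manual_te, est2.effect(X_test)))  # True would confirm the formula
```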

`include_bias=True`:

```python
est2 = LinearDML(model_y=RandomForestRegressor(),
                 model_t=RandomForestClassifier(min_samples_leaf=10),
                 discrete_treatment=True,
                 featurizer=PolynomialFeatures(degree=2),
                 cv=6)
est2.fit(Y, T, X=X, W=W)
te_pred2 = est2.effect(X_test)
lb2, ub2 = est2.effect_interval(X_test, alpha=0.01)
est2.summary()
```
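
And the analogous check for Case 1. If `effect` were computed as `cate_intercept + phi(x) @ coefs` while `phi` contains the constant "1" column, the intercept value would enter twice (once directly and once through the "1" column), which is exactly what my Question 3 asks. Same assumptions as the sketch above:

```python
# Case 1 hypothesis: does the constant column double-count the intercept?
feat = PolynomialFeatures(degree=2).fit(X)          # include_bias=True by default
phi_test = feat.transform(X_test)                   # first column is the constant 1
manual_te = est2.intercept_ + phi_test @ est2.coef_
print(np.allclose(manual_te, est2.effect(X_test)))
```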

@2025-07-09 Update: I now find that the results of `include_bias=True, fit_cate_intercept=False` are very similar to those of `include_bias=True, fit_cate_intercept=True`.
The difference between the results of `include_bias=True, fit_cate_intercept=False` vs. `include_bias=False, fit_cate_intercept=True` also doesn't seem to be significant.
Now the important question is: which setting is more recommendable and worth using? (A sketch of my comparison follows below.)
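
For reference, this is how I'm comparing the two settings on the same data. `fit_cate_intercept` is a `LinearDML` constructor argument; the pairings below are just my experiment, not a documented recommendation:

```python
# Compare the two settings from the update above on the same data
for include_bias, fit_intercept in [(True, False), (False, True)]:
    est = LinearDML(model_y=RandomForestRegressor(),
                    model_t=RandomForestClassifier(min_samples_leaf=10),
                    discrete_treatment=True,
                    fit_cate_intercept=fit_intercept,
                    featurizer=PolynomialFeatures(degree=2, include_bias=include_bias),
                    cv=6)
    est.fit(Y, T, X=X, W=W)
    print(f"include_bias={include_bias}, fit_cate_intercept={fit_intercept}: "
          f"mean effect = {est.effect(X_test).mean():.4f}")
```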