Skip to content

Look-Ahead Bias in Generated Features #1074

@Nasser-Alkhulaifi

Description

@Nasser-Alkhulaifi

Hi,

I've noticed that some of the generated features exhibit look-ahead bias, which is critical and must be avoided in machine learning regression problems. Specifically, the features in X_train contain exact values that represent the same row in y_train, leading to data leakage?

Example:
In the attached screenshot, you can see that X_train (features) includes values that are present in the same row as y_train. This creates look-ahead bias. Such features (e.g., lags or rolling statistical window features etc.) should be shifted to ensure only available data at the forecasting time is used for prediction.

Questions:

Why does this look-ahead bias exist in the generated features?
Am I using the tool incorrectly?
Is there a specific setting or method I am missing to avoid this issue?

Thank you.

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions