Hi,
I've noticed that some of the generated features exhibit look-ahead bias, which is a serious problem in machine learning regression and must be avoided. Specifically, some features in X_train contain the exact target values from the same row of y_train. Isn't this data leakage?
Example:
In the attached screenshot, you can see that X_train (features) contains values that appear in the same row of y_train, which creates look-ahead bias. Such features (e.g., lags or rolling-window statistics) should be shifted so that only data available at forecast time is used for prediction.
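To illustrate what I mean, here is a minimal, tool-agnostic sketch in plain Python. It does not use the library's API, and the helper names `rolling_mean`, `lag`, and `leaks_current_target` are hypothetical. A rolling window that includes the current row leaks the target. Shifting the window back by one row uses only past values. The perturbation check marks a feature as leaky if changing y at row t changes the feature at row t.

```python
def rolling_mean(values, window, shift):
    """Trailing rolling mean (hypothetical helper).

    shift=0 includes the current row in the window (leaky for forecasting);
    shift=1 uses only rows strictly before the current one (no look-ahead).
    """
    out = []
    for t in range(len(values)):
        end = t + 1 - shift           # exclusive end of the window
        start = end - window
        if start < 0:
            out.append(None)          # not enough history yet
        else:
            w = values[start:end]
            out.append(sum(w) / len(w))
    return out


def lag(values, k):
    """Value from k rows earlier (k >= 1), None where no history exists."""
    return [values[t - k] if t - k >= 0 else None for t in range(len(values))]


def leaks_current_target(feature_fn, y, t):
    """True if perturbing y[t] changes the feature value at row t."""
    perturbed = list(y)
    perturbed[t] += 1000.0
    return feature_fn(y)[t] != feature_fn(perturbed)[t]


y = [10, 12, 11, 15, 14]
print(rolling_mean(y, 2, shift=0))  # [None, 11.0, 11.5, 13.0, 14.5]
print(rolling_mean(y, 2, shift=1))  # [None, None, 11.0, 11.5, 13.0]
print(leaks_current_target(lambda v: rolling_mean(v, 2, 0), y, 4))  # True
print(leaks_current_target(lambda v: rolling_mean(v, 2, 1), y, 4))  # False
```

The shifted version gives up one row of history at the start, but every value at row t is computable before y[t] is observed.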
Questions:
Why does this look-ahead bias exist in the generated features?
Am I using the tool incorrectly?
Is there a specific setting or method I am missing to avoid this issue?
Thank you.