Look-Ahead Bias in Generated Features

Hi,

I've noticed that some of the generated features exhibit look-ahead bias, which must be avoided in machine learning regression problems. Specifically, the features in X_train contain the exact values that appear in the same row of y_train, which leads to data leakage.

Example:
In the attached screenshot, you can see that X_train (features) includes values that also appear in the same row of y_train (target). This creates look-ahead bias. Such features (e.g., lags or rolling-window statistics) should be shifted so that only data available at forecasting time is used for prediction.
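To illustrate what I mean by shifting, here is a minimal plain-Python sketch. It is not tsfresh code, and the function names and toy series are made up for this example. It compares a rolling mean whose window includes the current target value with one that only uses values strictly before it.

```python
from statistics import mean


def leaky_rolling_mean(values, window):
    """Rolling mean over values[t-window+1 : t+1]; the window includes values[t]."""
    features = []
    for t in range(len(values)):
        current = values[max(0, t - window + 1):t + 1]
        # Emit None until a full window is available.
        features.append(mean(current) if len(current) == window else None)
    return features


def shifted_rolling_mean(values, window):
    """Rolling mean over values[t-window : t]; only values strictly before t."""
    features = []
    for t in range(len(values)):
        past = values[max(0, t - window):t]
        features.append(mean(past) if len(past) == window else None)
    return features


# Hypothetical target series; row t of the feature list is paired with y[t].
y = [10, 20, 30, 40, 50]

leaky = leaky_rolling_mean(y, window=2)     # row t is computed from y[t] itself
safe = shifted_rolling_mean(y, window=2)    # row t is computed from y[t-2], y[t-1]

for t, (target, bad, good) in enumerate(zip(y, leaky, safe)):
    print(t, target, bad, good)
```

In the leaky version, the feature in row t is a function of the very value it is supposed to predict, which is the behaviour I believe I am seeing in the screenshot. The shifted version is what I would expect for a forecasting setup.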

Questions:

Why does this look-ahead bias exist in the generated features?
Am I using the tool incorrectly?
Is there a specific setting or method I am missing to avoid this issue?

Thank you.

![image](https://github.com/blue-yonder/tsfresh/assets/107035204/36a5ec73-047d-435e-abb0-f2293a8b12b3)



Look-Ahead Bias in Generated Features #1074
