training data time_idx should be end-align #1857

Open
workhours opened this issue May 29, 2025 · 3 comments

Comments

@workhours

When I build a dataset from time series like:
ts1: year from 2001 to 2020 => time_idx=[0,19]
ts2: year from 2011 to 2020 => time_idx=[10,19]
with:
min_prediction_idx=6
min_prediction_length=5
ts2 is removed from the training data, since 5+6 > len(ts2).
But ts2 can actually be used for training, e.g. with time_idx[10:15] as input and time_idx[15:20] as target; in that case the first prediction index is 15 > 6.
I don't know whether the current implementation expects time_idx to be start-aligned, i.e.:
ts1=> time_idx=[0,19]
ts2=> time_idx=[0,9]
If that is the case, ts2 will expose future information to the model during training.
So TimeSeriesDataSet needs, at the very least, a configuration option stating whether time_idx is start-aligned or end-aligned.
In most cases, time series prediction means forecasting the latest n years, and some groups in the data always have fewer than n years of history.
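To make the example above concrete, here is a minimal sketch that enumerates the training windows that fit inside each series when time_idx stays on the shared (end-aligned) axis. It is plain Python, not pytorch-forecasting code: the helper `valid_windows` is hypothetical, and the fixed encoder length of 5 is an assumption taken from the `time_idx[10:15]` input slice in the example.

```python
def valid_windows(first_idx, last_idx, encoder_length, prediction_length,
                  min_prediction_idx):
    """List (encoder_start, first_prediction_idx) pairs that fit in one series.

    Hypothetical helper for illustration only. first_idx and last_idx are the
    inclusive time_idx bounds of the series on the shared axis.
    """
    windows = []
    for enc_start in range(first_idx, last_idx + 1):
        pred_start = enc_start + encoder_length
        pred_end = pred_start + prediction_length - 1  # inclusive
        if pred_end > last_idx:
            break  # later windows would run past the end of the series
        if pred_start >= min_prediction_idx:
            windows.append((enc_start, pred_start))
    return windows


# Values from the example: encoder length 5 is an assumption.
ts1_windows = valid_windows(0, 19, encoder_length=5, prediction_length=5,
                            min_prediction_idx=6)
ts2_windows = valid_windows(10, 19, encoder_length=5, prediction_length=5,
                            min_prediction_idx=6)

print(len(ts1_windows), ts1_windows[0], ts1_windows[-1])
print(ts2_windows)
```

Under these assumptions ts2 contributes exactly one window, with the encoder covering time_idx 10 to 14 and the prediction starting at 15, which satisfies min_prediction_idx=6.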

@workhours
Author

And if the software supports only start alignment, then min_prediction_idx will filter out most of the training series whose length is less than the maximum series length.
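A rough way to see this effect is to count how many windows a series of a given length could supply if every series were re-indexed to start at 0. The sketch below is a simplified model of the scenario described here, not the library's actual filtering logic; `count_start_aligned_windows` is a hypothetical helper, and a fixed encoder length of 5 is assumed.

```python
def count_start_aligned_windows(length, encoder_length, prediction_length,
                                min_prediction_idx):
    """Count windows for a series whose time_idx runs from 0 to length - 1.

    Hypothetical model: the first prediction index must leave room for a full
    encoder and must be at least min_prediction_idx.
    """
    first_pred = max(encoder_length, min_prediction_idx)
    last_pred = length - prediction_length  # last start that still fits
    return max(0, last_pred - first_pred + 1)


# Series lengths to compare; 10 and 20 correspond to ts2 and ts1.
counts = {
    n: count_start_aligned_windows(n, encoder_length=5, prediction_length=5,
                                   min_prediction_idx=6)
    for n in (8, 10, 11, 20)
}
print(counts)
```

In this model, any start-aligned series shorter than min_prediction_idx + prediction_length (11 here) yields no windows at all, so a length-10 series such as ts2 drops out entirely.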

@fkiraly
Collaborator

fkiraly commented Jun 5, 2025

could you post a full piece of code with all imports, and explain:

  • what happens
  • what you think should happen (e.g., "output should be ...")

@kentstone84

Add a time_idx_alignment parameter to TimeSeriesDataSet, supporting:

  • "start" (default, current behavior)
  • "end" (align prediction window to the end of each series)
  • "sliding" (allow prediction windows anywhere they fit; most flexible)

This would allow valid windows from shorter time series to contribute to training without leaking future information.
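As a rough illustration of how the "end" and "sliding" modes could differ, the sketch below picks prediction start indices for one series. Everything in it is hypothetical: `time_idx_alignment` is only a proposed parameter, `prediction_starts` is an invented helper, and a fixed encoder length is assumed. The "start" mode is left out because the thread does not pin down its window rule.

```python
def prediction_starts(first_idx, last_idx, encoder_length, prediction_length,
                      alignment):
    """Return first-prediction indices for one series (inclusive bounds).

    Hypothetical sketch of the proposed modes, not an existing API.
    """
    earliest = first_idx + encoder_length        # room for a full encoder
    latest = last_idx - prediction_length + 1    # window ends at last_idx
    if latest < earliest:
        return []  # series too short for even one window
    if alignment == "end":
        return [latest]
    if alignment == "sliding":
        return list(range(earliest, latest + 1))
    raise ValueError(f"unsupported alignment in this sketch: {alignment!r}")


# ts2 from the issue: time_idx 10..19, encoder length 5 assumed.
ts2_end = prediction_starts(10, 19, 5, 5, "end")
ts2_sliding = prediction_starts(10, 19, 5, 5, "sliding")
# ts1 from the issue: time_idx 0..19.
ts1_end = prediction_starts(0, 19, 5, 5, "end")
ts1_sliding = prediction_starts(0, 19, 5, 5, "sliding")
print(ts2_end, ts2_sliding, ts1_end, len(ts1_sliding))
```

Both modes keep every window on the shared time axis, so a short series like ts2 only ever sees indices from its own observed range.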
