
Conversation

@enssow (Contributor) commented Jan 29, 2026

Description

TimeWindowHandler doesn't produce enough available forecast initialisation times to sample from when running inference on a model trained with $n_{fstep}$ forecast steps and $n_{samples} \cdot dt \geq t_{end} - t_{start}$,
where $n_{fstep}=$ --forecast_steps, $n_{samples}=$ --samples, $dt=$ --step_hours, $t_{start}=$ --start, $t_{end}=$ --end.
(See #1438 and #1085 for more info.)

This PR provides this padding by working out how many individual initialisation times are available and extending the end of the time window to accommodate the requested number of samples, also accounting for the extra time needed to roll out the configured number of forecast steps.
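The padding logic described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual TimeWindowHandler implementation: the function name `pad_time_window` and its signature are invented for this example, and the exact rule used in the PR may differ.

```python
from datetime import datetime, timedelta

def pad_time_window(start, end, n_samples, step_hours, forecast_steps):
    """Hypothetical sketch: extend `end` so that `n_samples` initialisation
    times fit in the window, plus room to roll out `forecast_steps` steps
    from the last initialisation time."""
    dt = timedelta(hours=step_hours)
    # Initialisation times available in [start, end] at spacing dt.
    available = int((end - start) / dt) + 1
    missing = max(0, n_samples - available)
    if missing == 0:
        return end  # window already wide enough; leave it unchanged
    # Extend by the missing initialisation times plus the rollout horizon.
    return end + (missing + forecast_steps) * dt

# Example: a one-day hourly window has 25 init times; asking for 30 samples
# with a 5-step rollout pushes the window end out by (5 + 5) hours.
padded = pad_time_window(
    datetime(2020, 1, 1), datetime(2020, 1, 2),
    n_samples=30, step_hours=1, forecast_steps=5,
)
print(padded)  # 2020-01-02 10:00:00
```

Without some padding rule like this, the sampler would wrap around and reuse initialisation times, which is the source of the duplicate-sample warning in #1438.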

Issue Number

Closes #1438

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@enssow (Contributor, Author) commented Jan 29, 2026

Tested on SANTIS for:

  • uv run inference --from-run-id f4duf5ji --samples 254 --streams-output ERA5 --options training_config.forecast.num_steps=5
  • uv run inference --from-run-id f4duf5ji --samples 254 --streams-output ERA5

Both runs no longer return the duplication warning, and inference_id cixysv6l ran to completion successfully.



Development

Successfully merging this pull request may close these issues.

Duplicate samples during inference due to different length assumptions in MultiStreamDataReader