
Conversation

@enssow (Contributor) commented Jan 29, 2026

Description

TimeWindowHandler doesn't produce enough available forecast initialisation times to sample from when running inference on a model trained with $n_{fstep}$ forecast steps and $n_{samples} \cdot dt \geq t_{end} - t_{start}$,
where $n_{fstep}=$ --forecast_steps, $n_{samples}=$ --samples, $dt=$ --step_hours, $t_{start}=$ --start, $t_{end}=$ --end.
(See #1438 and #1085 for more info.)

This PR provides this padding by working out how many individual initialisation times are available and extending the end of the time window to accommodate the requested number of samples, also accounting for the extra time needed to roll out the configured number of forecast steps.
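The padding logic described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual TimeWindowHandler implementation: the function name `pad_time_window` and its signature are invented for this example, and the exact rule used in the PR may differ.

```python
from datetime import datetime, timedelta

def pad_time_window(start, end, n_samples, step_hours, forecast_steps):
    """Hypothetical sketch: extend `end` so that `n_samples` initialisation
    times fit in the window, plus room to roll out `forecast_steps` steps
    from the last initialisation time."""
    dt = timedelta(hours=step_hours)
    # Initialisation times available in [start, end] at spacing dt.
    available = int((end - start) / dt) + 1
    missing = max(0, n_samples - available)
    if missing == 0:
        return end  # window already wide enough; leave it unchanged
    # Extend by the missing initialisation times plus the rollout horizon.
    return end + (missing + forecast_steps) * dt

# Example: a one-day hourly window has 25 init times; asking for 30 samples
# with a 5-step rollout pushes the window end out by (5 + 5) hours.
padded = pad_time_window(
    datetime(2020, 1, 1), datetime(2020, 1, 2),
    n_samples=30, step_hours=1, forecast_steps=5,
)
print(padded)  # 2020-01-02 10:00:00
```

Without some padding rule like this, the sampler would wrap around and reuse initialisation times, which is the source of the duplicate-sample warning in #1438.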

Issue Number

Closes #1438

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@enssow (Contributor, Author) commented Jan 29, 2026

Tested on SANTIS for:

  • uv run inference --from-run-id f4duf5ji --samples 254 --streams-output ERA5 --options training_config.forecast.num_steps=5
  • uv run inference --from-run-id f4duf5ji --samples 254 --streams-output ERA5

Both runs no longer return the duplication warning, and inference_id cixysv6l ran to completion successfully.



Development

Successfully merging this pull request may close these issues.

Duplicate samples during inference due to different length assumptions in MultiStreamDataReader