Labels: bug, data, data:reading, model, model:inference
Description
What happened?
Setup
Run inference on a trained model, where the relevant options are --forecast_steps, --samples, --step_hours, --start, and --end.
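For illustration only, these options might be combined into a single invocation along the lines of the one below; the values are placeholders rather than the settings used for this ticket, and how the trained run is selected is omitted here:
uv run inference --samples <N_SAMPLES> --forecast_steps <N_STEPS> --step_hours <STEP_HOURS> --start <START_DATE> --end <END_DATE>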
Expected result:
No duplicate samples are generated
Actual result:
n+1 duplicate samples are generated
HedgeDoc link to logs and more information. This ticket is public; do not attach files directly.
As an example, inference can be run on w6khbe9g using:
uv run inference --samples 244 => will contain 4 duplicate samples
For the history of this issue and how to avoid triggering it, refer to #1085.
This code can be used to check if the duplicate samples persist:
from pathlib import Path

import numpy as np

from weathergen.common.io import ZarrIO

RUN_ID = "<MY_INFERENCE_RUN_ID>"
results = Path(f"results/{RUN_ID}/validation_chkpt00000_rank0000.zarr")

with ZarrIO(results) as zio:
    samples = zio.samples
    forecast_step = 0
    times = []
    for sample in samples:
        # one unique datetime per sample
        data = zio.get_data(sample, "ERA5", forecast_step).prediction.as_xarray().valid_time
        print("|", end="")
        times.append(np.unique(data).squeeze())
    print()

times = np.array(times)
n_duplication = times.size - np.unique(times).size
print(times.size, np.unique(times).size, n_duplication)

indexes = np.argsort(times)
duplicate_indexes = indexes[:n_duplication * 2]
print(f"duplicate indices: {duplicate_indexes}")  # randomly distributed
print(f"duplicate times: {times[duplicate_indexes]}")
print(f"min: {times.min()}, max: {times.max()}")