Description
What happened?
I am unable to run inference with a model trained after the most recent "multi dataset" updates to anemoi-core and anemoi-inference, even though I'm using a single dataset. The traceback shows the issue is in anemoi-models:
Traceback (most recent call last):
File "/global/u2/t/timothys/aneml/anemoi-inference/src/anemoi/inference/runner.py", line 630, in predict_step
return model.predict_step({self.checkpoint._metadata.name: input_tensor_torch}, **kwargs)[
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/interface/__init__.py", line 214, in predict_step
return self.model.predict_step(**predict_kwargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/models/base.py", line 313, in predict_step
y_hat = self.forward(x, model_comm_group=model_comm_group, grid_shard_shapes=grid_shard_shapes, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/models/ens_encoder_processor_decoder.py", line 181, in forward
x_data_latent, x_skip, shard_shapes_data = self._assemble_input(
^^^^^^^^^^^^^^^^^^^^^
File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/models/ens_encoder_processor_decoder.py", line 74, in _assemble_input
grid_shard_shapes = grid_shard_shapes[dataset_name]
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
Here I'm running inference on a single GPU, so there's no grid sharding and grid_shard_shapes is None. It looks like the lookup in _assemble_input needs a protective if statement, as there is in each of the other model classes (e.g. AnemoiModelEncProcDec). I made this modification locally and everything is working fine.
PR coming very soon...
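For illustration, the guard described above could look roughly like the following. This is a minimal standalone sketch, not the actual anemoi-models code: the helper name select_grid_shard_shapes is hypothetical, and it assumes grid_shard_shapes is either None (no sharding) or a mapping keyed by dataset name, as the traceback suggests.

```python
from typing import Any, Mapping, Optional


def select_grid_shard_shapes(
    grid_shard_shapes: Optional[Mapping[str, Any]],
    dataset_name: str,
) -> Optional[Any]:
    """Hypothetical helper: pick the shard shapes for one dataset.

    Returns None when no grid sharding is in use (e.g. single-GPU
    inference), instead of subscripting None and raising TypeError.
    """
    if grid_shard_shapes is None:
        # No sharding: nothing to look up for this dataset.
        return None
    return grid_shard_shapes[dataset_name]


# Single GPU, no sharding: the guard passes None through.
unsharded = select_grid_shard_shapes(None, "data")

# Sharded case: the per-dataset entry is returned unchanged.
# The key "data" and the shapes are made-up example values.
sharded = select_grid_shard_shapes({"data": [10, 12]}, "data")
```

In the model itself the same check would presumably sit inline in _assemble_input, just before the line that currently fails.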
What are the steps to reproduce the bug?
- Train an ensemble model with a single dataset, using the most recent versions in anemoi-core
- Run inference
Version
anemoi-training==0.9.0 anemoi-inference==0.9.0
Platform (OS and architecture)
Linux login28 5.14.21-150500.55.97_13.0.78-cray_shasta_c #1 SMP Thu Mar 13 20:09:44 UTC 2025 (330b47d) x86_64 x86_64 x86_64 GNU/Linux
Relevant log output
Accompanying data
No response
Organisation
NOAA Physical Sciences Laboratory