Unable to run anemoi inference with ensemble model #851

@timothyas

Description

What happened?

I am unable to run inference with a model trained after the most recent "multi-dataset" updates to anemoi-core and anemoi-inference, even though I'm using a single dataset. The traceback shows that the issue lies in anemoi-models:

Traceback (most recent call last):                                                                                            
  File "/global/u2/t/timothys/aneml/anemoi-inference/src/anemoi/inference/runner.py", line 630, in predict_step               
    return model.predict_step({self.checkpoint._metadata.name: input_tensor_torch}, **kwargs)[                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                 
  File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/interface/__init__.py", line 214, in predict_step    
    return self.model.predict_step(**predict_kwargs, **kwargs)                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                
  File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/models/base.py", line 313, in predict_step           
    y_hat = self.forward(x, model_comm_group=model_comm_group, grid_shard_shapes=grid_shard_shapes, **kwargs)                 
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                 
  File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/models/ens_encoder_processor_decoder.py", line 181, in forward
    x_data_latent, x_skip, shard_shapes_data = self._assemble_input(                                                          
                                               ^^^^^^^^^^^^^^^^^^^^^                                                          
  File "/global/u2/t/timothys/aneml/anemoi-core/models/src/anemoi/models/models/ens_encoder_processor_decoder.py", line 74, in _assemble_input
    grid_shard_shapes = grid_shard_shapes[dataset_name]                                                                       
                        ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^                                                                       
TypeError: 'NoneType' object is not subscriptable  

Here I'm running inference on a single GPU, so there's no grid sharding and grid_shard_shapes is None. It looks like this line needs a protective if statement (a check for None), as there is in each of the other model classes (e.g. AnemoiModelEncProcDec). I made this modification and inference now runs fine.
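A minimal sketch of the kind of guard described above, assuming grid_shard_shapes is either None (no grid sharding, e.g. single-GPU inference) or a dict keyed by dataset name, as the failing line grid_shard_shapes[dataset_name] suggests. The helper name select_grid_shard_shapes is hypothetical; this is not the actual anemoi-models code or the fix in the upcoming PR.

```python
from typing import Any, Optional


def select_grid_shard_shapes(
    grid_shard_shapes: Optional[dict[str, Any]], dataset_name: str
) -> Optional[Any]:
    """Hypothetical helper: pick one dataset's shard shapes, tolerating None.

    Assumes grid_shard_shapes is None when there is no grid sharding
    and otherwise a dict keyed by dataset name.
    """
    if grid_shard_shapes is None:
        # No grid sharding: nothing to index, so pass None through.
        return None
    return grid_shard_shapes[dataset_name]


# Without the guard, indexing None fails as in the traceback above.
shapes = None
try:
    shapes["data"]
except TypeError as err:
    print(err)  # 'NoneType' object is not subscriptable

print(select_grid_shard_shapes(None, "data"))  # None
print(select_grid_shard_shapes({"data": [(10, 4)]}, "data"))  # [(10, 4)]
```

With the guard, the single-GPU path simply forwards None instead of raising TypeError, while the sharded path still looks up the per-dataset entry.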

PR coming very soon...

What are the steps to reproduce the bug?

  • Train an ensemble model on a single dataset using the most recent version of anemoi-core
  • Run inference with anemoi-inference on a single GPU

Version

anemoi-training==0.9.0 anemoi-inference==0.9.0

Platform (OS and architecture)

Linux login28 5.14.21-150500.55.97_13.0.78-cray_shasta_c #1 SMP Thu Mar 13 20:09:44 UTC 2025 (330b47d) x86_64 x86_64 x86_64 GNU/Linux

Relevant log output

Accompanying data

No response

Organisation

NOAA Physical Sciences Laboratory

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: Status Now In Progress
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests