Bug in geoarches.evaluation.eval_multistep #50

@ltsabadz

Hi, I was running geoarches.evaluation.eval_multistep as described in the documentation here. The script automatically sets the device to 'cuda' on line 136 when CUDA is available in the environment, and then fails with the error below.

Traceback (most recent call last):
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/core/variable.py", line 150, in as_variable
    obj = Variable(dims_, data_, *attrs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/core/variable.py", line 380, in __init__
    dims=dims, data=as_compatible_data(data, fastpath=fastpath), attrs=attrs
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/core/variable.py", line 295, in as_compatible_data
    data = np.asarray(data)
           ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_tensor.py", line 1194, in __array__
    return self.numpy()
           ^^^^^^^^^^^^
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/geoarches/geoarches/evaluation/eval_multistep.py", line 301, in <module>
    main()
  File "/workspace/geoarches/geoarches/evaluation/eval_multistep.py", line 270, in main
    labelled_metric_output = metric.compute()
                             ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torchmetrics/metric.py", line 699, in wrapped_func
    value = _squeeze_if_scalar(compute(*args, **kwargs))
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/geoarches/geoarches/metrics/metric_base.py", line 153, in compute
    outputs = metric.compute()
              ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torchmetrics/metric.py", line 699, in wrapped_func
    value = _squeeze_if_scalar(compute(*args, **kwargs))
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/geoarches/geoarches/metrics/label_wrapper.py", line 232, in compute
    return self._convert(self.metric.compute())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/geoarches/geoarches/metrics/label_wrapper.py", line 220, in _convert
    ds = xr.Dataset(
         ^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/core/dataset.py", line 389, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
                                               ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/structure/merge.py", line 1082, in merge_data_and_coords
    return merge_core(
           ^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/structure/merge.py", line 707, in merge_core
    collected = collect_variables_and_indexes(aligned, indexes=indexes)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/structure/merge.py", line 370, in collect_variables_and_indexes
    variable = as_variable(variable, name=name, auto_convert=False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/xarray/core/variable.py", line 152, in as_variable
    raise error.__class__(
TypeError: Variable 'rankhist': Could not convert tuple of form (dims, data[, attrs, encoding]): (['prediction_timedelta', 'variable', 'rank'], tensor([[[2.7105, 1.1833, 0.9830,  ..., 0.9851, 1.1592, 2.5510],
         [2.4572, 1.0356, 0.8853,  ..., 1.0872, 1.2597, 2.2862],
         [4.0011, 1.3853, 1.0745,  ..., 0.8121, 0.9961, 2.5702],
         [2.8899, 1.1743, 0.9555,  ..., 1.1012, 1.4019, 3.4411]],
        ...,

        [[2.2738, 1.3107, 1.0814,  ..., 1.1106, 1.3861, 2.6040],
         [2.1716, 1.2771, 1.0447,  ..., 1.1664, 1.4339, 2.5566],
         [2.5498, 1.1100, 0.8741,  ..., 1.2165, 1.7302, 4.7467],
         [2.7417, 1.3088, 1.0228,  ..., 1.1436, 1.5601, 4.1013]]],
       device='cuda:0')) to Variable.

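Changing the device from 'cuda' to 'cpu' solved the issue and did not hinder performance significantly. It may still be worth looking into, because the script fails when run as documented on a machine where CUDA is available. Judging by the traceback, the failure happens in _convert in geoarches/metrics/label_wrapper.py, where xr.Dataset is built from the 'rankhist' tensor while it is still on cuda:0.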
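
A possible fix that keeps the computation on the GPU would be to copy the metric outputs to host memory just before they are handed to xarray. The sketch below is only an illustration, not the actual geoarches code: to_host_array and to_host_data_vars are hypothetical helper names, and I am assuming (from the traceback) that _convert passes xr.Dataset a dict of (dims, data[, attrs]) tuples. The only torch calls it relies on are Tensor.detach(), Tensor.cpu() and Tensor.numpy().

```python
def to_host_array(value):
    """Return something xarray can convert with np.asarray.

    Objects that look like a torch.Tensor (they have detach() and cpu())
    are detached from the graph, copied to host memory and converted to
    NumPy. Anything else is returned unchanged.
    """
    if hasattr(value, "detach") and hasattr(value, "cpu"):
        return value.detach().cpu().numpy()
    return value


def to_host_data_vars(data_vars):
    """Apply to_host_array to the data slot of each (dims, data, ...) tuple.

    Hypothetical helper: assumes data_vars maps a variable name to a tuple
    of the form (dims, data[, attrs, encoding]), as in the error message.
    """
    converted = {}
    for name, spec in data_vars.items():
        dims, data, *rest = spec
        converted[name] = (dims, to_host_array(data), *rest)
    return converted
```

With something like this, the call in _convert could become xr.Dataset(to_host_data_vars(data_vars), ...), where data_vars stands for whatever dict is currently passed. Tensors that are already on the CPU go through the same path, so the device choice in eval_multistep.py would not need to change.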
