Skip to content

Iterable[Dataset | DataArray] issue with xarray.core.combine.combine_by_coords #10114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
leaver2000 opened this issue Mar 11, 2025 · 3 comments
Labels
topic-combine combine/concat/merge topic-typing

Comments

@leaver2000
Copy link

What is your issue?

I wanted to highlight a typing issue with xarray.core.combine.combine_by_coords where the first argument is data_objects: Iterable[Dataset | DataArray] = []. Specifically the type issue exists when a Generator which is an Iterable is passed as the data_objects argument. See the example below...

from typing import Iterable
import xarray as xr

def dataset_maker(key:str) -> xr.Dataset:
    return xr.Dataset(
        data_vars={
            key: xr.DataArray(data=[1, 2, 3], dims=["x"])
        },
        coords=dict(
            x=xr.DataArray(data=[1, 2, 3], dims=["x"]),
        ),
    )


data_list = [dataset_maker('a'), dataset_maker('b')]
assert isinstance(data_list, Iterable)
print("... { data_list } ...")
print(xr.combine_by_coords(data_list))

data_tuple = dataset_maker('a'), dataset_maker('b')
assert isinstance(data_tuple, Iterable)
print("... { data_tuple } ...")
print(xr.combine_by_coords(data_tuple))

data_iterator = (dataset_maker(k) for k in ['a', 'b']) 
assert isinstance(data_iterator, Iterable)
print("... { data_iterator } ...")
print(xr.combine_by_coords(data_iterator))

The results.

... { data_list } ...
<xarray.Dataset> Size: 72B
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 24B 1 2 3
Data variables:
    a        (x) int64 24B 1 2 3
    b        (x) int64 24B 1 2 3
... { data_tuple } ...
<xarray.Dataset> Size: 72B
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 24B 1 2 3
Data variables:
    a        (x) int64 24B 1 2 3
    b        (x) int64 24B 1 2 3
... { data_iterator } ...
<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

The issue arises at the head of the function where the generator is exhausted.

    objs_are_unnamed_dataarrays = [
        isinstance(data_object, DataArray) and data_object.name is None
        for data_object in data_objects
    ]

A simple solution would be to change the type annotations to either that of a Sequence or memoize the data_objects variable as...

data_objects = list(data_objects)
@leaver2000 leaver2000 added the needs triage Issue that has not been reviewed by xarray team member label Mar 11, 2025
Copy link

welcome bot commented Mar 11, 2025

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty
Copy link
Collaborator

That makes sense — thanks for the issue.

I think materializing it as a list would make sense. Or if it's possible to move the check closer to the code that's operating and avoiding the additional loop, that would work too.

Any chance you'd like to put a small PR in?

@leaver2000
Copy link
Author

Any chance you'd like to put a small PR in?

Sure, I'll try and get something submitted tomorrow.

@TomNicholas TomNicholas added topic-combine combine/concat/merge topic-typing and removed needs triage Issue that has not been reviewed by xarray team member labels Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-combine combine/concat/merge topic-typing
Projects
None yet
Development

No branches or pull requests

3 participants