Can we make dask an optional dependency of rechunker?
IIUC conceptually rechunker doesn't explicitly require dask - rechunking plans can be executed with a variety of executors.
This has been discussed in #85 and #87 but there still seem to be hard imports of both `dask` and `dask.array`.

Context: In pydata/xarray#7019 I'm working on generalizing xarray to wrap other chunked parallel arrays, in particular cubed. We would like to be able to test xarray's handling of chunked arrays via cubed without ever importing dask. @tomwhite was able to remove the explicit dependency of cubed on dask, but cubed fundamentally depends on rechunker, which currently depends explicitly on dask. I think cubed only imports the parts of rechunker that don't explicitly depend on dask (i.e. the executors, see #87), but at the moment dask will still be installed when rechunker is installed.
Looking at the code, it seems that the dask dependence is already mostly compartmentalized into `executors/dask.py`. Dask is also imported in `api.py`, but it's mostly for `isinstance()` checks against `dask.array.Array`. We could handle those in a similar way to how xarray handles checking array types against optionally imported modules.

The only imports I'm not immediately sure how to handle are:

- A call to `dask.array.asarray` when an xarray dataset is detected, which seems like unnecessary hard-coding to me now? (rechunker/rechunker/api.py, line 446 in d625b6d)
- `dask.utils.parse_bytes`, but that could be vendored like Tom vendored it within cubed. (rechunker/rechunker/api.py, line 547 in d625b6d)
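
To make the two ideas above concrete, here is a rough sketch of what I have in mind. It is not taken from rechunker, xarray, or cubed: `is_dask_array` and this `parse_bytes` are hypothetical helper names, and the byte parser is a deliberately simplified stand-in rather than a copy of `dask.utils.parse_bytes` (for example, it does not handle scientific notation such as `"1e6"`).

```python
import re


def is_dask_array(obj) -> bool:
    """Return True only if dask is importable and obj is a dask array.

    Hypothetical helper: the import is deferred into the function, so
    importing this module never requires dask to be installed.
    """
    try:
        import dask.array as da
    except ImportError:
        # Without dask installed, nothing can be a dask array.
        return False
    return isinstance(obj, da.Array)


# Simplified unit table (assumption: decimal and binary prefixes only).
_BYTE_UNITS = {
    "": 1, "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}


def parse_bytes(value) -> int:
    """Parse a size such as "100 MB" or "1GiB" into a number of bytes.

    Simplified stand-in for a vendored parser, not dask's implementation.
    """
    if isinstance(value, (int, float)):
        return int(value)
    match = re.fullmatch(r"\s*([0-9.]*)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"Could not interpret {value!r} as a byte size")
    number, unit = match.groups()
    try:
        factor = _BYTE_UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"Unknown byte unit in {value!r}") from None
    # A bare unit such as "MB" is read as one of that unit.
    return int((float(number) if number else 1.0) * factor)
```

With helpers along these lines, `api.py` could call `is_dask_array(source)` instead of importing dask at module level, and the memory limit could be parsed with the local `parse_bytes("100 MB")`.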
cc @rabernat