
v3.1.0 Delayed Dask Readers

@evamaxfield evamaxfield released this 03 Feb 18:33

To support reading large files and parallel image processing, we have converted all image readers to use dask for data management.

What this means for the user:

  • No breaking changes when using the 3.0.* API: AICSImage.data, AICSImage.get_image_data, and imread all still read the full image file and return it as a numpy.ndarray.
  • New properties and functions for dask-specific handling (AICSImage.dask_data, AICSImage.get_image_dask_data, imread_dask) return delayed dask arrays (dask.array.core.Array).
  • When using the dask properties or functions, data will not be read until requested. If you want just the first channel of an image, AICSImage.get_image_dask_data("STZYX", C=0) will read only that channel and return it as a five-dimensional dask array, instead of reading the entire image and then selecting the data down.
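The delayed behavior described above can be illustrated with plain dask, independent of aicsimageio. The following is a minimal sketch, not the library's implementation: `read_channel`, `read_log`, and the array shapes are hypothetical stand-ins for a real file reader. It builds a lazy "STCZYX" array from one delayed read per channel, then selects the first channel, so that only that channel's read should run when the result is computed.

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical bookkeeping: records which channels were actually loaded.
read_log = []

# Assumed shape of a single channel in (S, T, Z, Y, X) order.
CHANNEL_SHAPE = (1, 1, 4, 8, 8)


def read_channel(c):
    """Hypothetical stand-in for reading one channel from disk."""
    read_log.append(c)
    return np.full(CHANNEL_SHAPE, c, dtype=np.uint16)


# Wrap each channel read in a delayed call; nothing is read at this point.
channels = [
    da.from_delayed(
        dask.delayed(read_channel)(c), shape=CHANNEL_SHAPE, dtype=np.uint16
    )
    for c in range(3)
]

# Stack along a new channel axis to get a lazy (S, T, C, Z, Y, X) array.
lazy_image = da.stack(channels, axis=2)

# Selecting channel 0 is still lazy and drops the channel axis.
first_channel = lazy_image[:, :, 0]

# Only now is data requested; dask runs just the tasks this selection needs.
result = first_channel.compute()

print("channels read:", read_log)
print("result shape:", result.shape)
```

In aicsimageio itself, the equivalent selection is the AICSImage.get_image_dask_data("STZYX", C=0) call mentioned above, followed by .compute() on the returned dask array when the data is actually needed.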

A single breaking change:

  • We no longer support passing in file pointers or buffers.

If you want multiple workers to read or process the image, use AICSImage or any Reader class as a context manager: it now spawns or connects to a Dask cluster and client for the duration of the context. To keep the cluster and client open for longer than a single image, use the context manager exposed by dask_utils.cluster_and_client.

Extras:
napari has been added directly as an "interactive" dependency. If it is installed, the AICSImage.view_napari function is available for use. This function launches a napari viewer with default settings that we find work well for the data aicsimageio generally handles.