Dask Must Be Delayed
AICSImageIO 3.3.0
We are happy to announce the release of AICSImageIO 3.3.0!
AICSImageIO is a library for delayed parallel image reading, metadata parsing, and image writing for microscopy formats in pure Python. It is built on top of Dask so that an image of any size can act as a normal array and be read in parallel, distributed across your local machine or an HPC cluster.
Highlights
Non-Dask Functions and Properties are Fully In-Memory
The only change made in this release is to the internal behavior of our API. Users were understandably confused about why certain operations were incredibly slow while closely related ones were incredibly fast.
Specifically, why did the following:
from aicsimageio import AICSImage

img = AICSImage("my_file.ome.tiff")
img.data  # preloads the entire image into memory
my_chunk = img.get_image_data("ZYX", C=1)  # the actual data we want to retrieve
complete faster than this:
img = AICSImage("my_file.ome.tiff")
my_chunk = img.get_image_data("ZYX", C=1) # the actual data we want to retrieve
(The difference is that the first example preloads the entire image into memory, while in the second example get_image_data simply uses the delayed array.)
To resolve this, we have made an internal change to the behavior of the library that we will hold consistent moving forward.
If the word dask is not found in the function or property name when dealing with image data, the entire image will be read into memory in full before the function or property completes its operation.
* In essence, this simply moves that preload into each of the related functions and properties.
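As a rough illustration of the distinction this rule draws, the sketch below uses plain Dask and NumPy rather than AICSImageIO itself. The array names and the (C, Z, Y, X) layout are assumptions made for the example, and the in-memory NumPy array only stands in for a file on disk.

```python
import dask.array as da
import numpy as np

# Stand-in for an image with dimensions (C, Z, Y, X). A real reader would
# build the delayed array from a file rather than from an in-memory array.
full_image = np.arange(2 * 4 * 8 * 8, dtype=np.uint16).reshape(2, 4, 8, 8)
lazy_image = da.from_array(full_image, chunks=(1, 1, 8, 8))

# Delayed style: slicing only describes the work; nothing is computed yet.
lazy_chunk = lazy_image[1]  # channel 1, still a dask array
in_memory_chunk = lazy_chunk.compute()  # now a NumPy array

# In-memory style: materialize the whole array first, then slice it.
eager_chunk = lazy_image.compute()[1]
```

Both styles return the same values. What differs is when the data is materialized and how much of it is held in memory at once.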
The end result is that the user should see much faster read times when using get_image_data.
If the user was using this function on an image too large for memory, they will need to switch to get_image_dask_data and call .compute() on the returned value.
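A minimal sketch of that switch might look like the following. The helper name read_channel is hypothetical and not part of the library; it only assumes an AICSImage-like object that exposes get_image_dask_data as described above.

```python
def read_channel(img, channel=1):
    """Return one channel as an in-memory array without preloading the image.

    `img` is assumed to be an AICSImage-like object; `read_channel` itself
    is a hypothetical helper written for this example.
    """
    # Delayed selection: no pixel data is read at this point.
    lazy_chunk = img.get_image_dask_data("ZYX", C=channel)
    # Calling compute() triggers the read for this selection only.
    return lazy_chunk.compute()


# Example usage (requires aicsimageio and a real file):
# from aicsimageio import AICSImage
# img = AICSImage("my_file.ome.tiff")
# my_chunk = read_channel(img, channel=1)
```

Keeping the selection delayed until the final .compute() call means the full image never has to fit in memory.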
Contributors and Reviewers this Release (alphabetical)
Madison Bowden (@AetherUnbound)
Jackson Maxfield Brown (@JacksonMaxfield)
Jamie Sherman (@heeler)
Dan Toloudis (@toloudis)