Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Nov 5, 2025

Add ImageStack.from_xarray() method to enable ImageStack construction from xarray DataArrays:

  • Analyze existing ImageStack constructors (from_numpy, from_tileset, etc.)
  • Understand xarray DataArray structure used by ImageStack
  • Create XarrayData class (similar to NumpyData) to implement TileCollectionData interface
  • Implement from_xarray classmethod in ImageStack
  • Add validation for required dimensions and coordinates
  • Create comprehensive tests for the new method (9 tests total)
  • Test concatenation use case mentioned in the issue
  • Run linting and type checking - all pass
  • Address code review feedback:
    • Fixed type conversion logic to use data.copy() to avoid modifying caller's data
    • Added test to verify original DataArray is not modified
    • Confirmed local import pattern follows existing convention
  • Fix mypy type error on line 505 by explicitly typing actual_coords as Set[str]
  • Run security scanning - no vulnerabilities detected

Implementation details:

  • Created new parser module at starfish/core/imagestack/parser/xarray/
  • Added XarrayData and XarrayImageTile classes to handle tile extraction from xarray
  • Added from_xarray() classmethod to ImageStack with comprehensive validation
  • All existing tests pass, new tests cover basic usage, concatenation, coordinate preservation, error handling, and immutability
Original prompt

This section details on the original issue you should resolve

<issue_title>ImageStack from_xarray</issue_title>
<issue_description>Multiple groups have discussed processing image data by round to reduce memory usage, likely converging at the IntensityTable stage. In some cases, these datasets are stored in separate experiments. If we want to enable this mode of SpaceTx data formatting, users identified that it was useful to be able to concatenate ImageStack objects in memory.

This is relatively simple with xarray, but this and other array manipulation tasks are made difficult by our multiprocessing array backing, since there isn't a clear from_xarray constructor.

Adding a from_xarray constructor would simplify concatenation, and more generally, ImageStack construction.</issue_description>

Comments on the Issue (you are @copilot in this section)

@berl @iimog it's fun seeing these issues from back in the day. I think this a reasonable route. @shachafl thoughts?

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI changed the title [WIP] Add from_xarray constructor for ImageStack Add ImageStack.from_xarray() constructor for memory-efficient round processing Nov 5, 2025
Copilot finished work on behalf of shachafl November 5, 2025 19:48
Copilot AI requested a review from shachafl November 5, 2025 19:48
Copilot AI requested a review from shachafl November 6, 2025 08:06
Copilot finished work on behalf of shachafl November 6, 2025 08:06
@shachafl
Copy link
Collaborator

@iimog, @berl can you test this Copilot Agent's proposed solution for your specific cases?

@iimog
Copy link
Contributor

iimog commented Nov 11, 2025

I can confirm, that it works as expected for my use case (creating ImageStacks form xarrays I loaded from nc files, see here).

This code works with the starfish version from this pr:

import xarray as xr
from starfish.core.imagestack.imagestack import ImageStack

image_xr = xr.open_dataset("imgs.nc").__xarray_dataarray_variable__
image_stack = ImageStack.from_xarray(image_xr)

# just to check
(image_xr == image_stack.xarray).values.all()
# returns np.True_

@berl
Copy link
Collaborator

berl commented Nov 12, 2025

I like this feature. I don't have any data sitting around, but @iimog's confirmation as current user is worth more anyway. @shachafl before merging this, can you add comments here and on the corresponding issue #2155 noting that the final implementation here is different from the GPT-generated code in the issue? like none of the dask stuff in the issue comment is in this PR

@shachafl shachafl marked this pull request as ready for review November 12, 2025 07:19
@shachafl
Copy link
Collaborator

As @berl mentioned, this PR is modest in scope, and doesn't add dask-backed DataArray functionality (although recommended by Copilot #1285 (comment)).

If warranted by users, we can add in the near future better handling for large-stacks (so you can process without loading everything into memory.):

  1. xarray.to_zarr
  2. Dask-aware constructor - adding a DaskData wrapper (NumpyData analog) so from_xarray can accept dask-backed DataArray directly.

@shachafl shachafl requested a review from berl November 12, 2025 07:36
Copy link
Collaborator

@shachafl shachafl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding was done purely by Copilot agent.
Luckily this builds upon existing from_numpy constructor, so risk is low (without mentioning the extensive unit tests added, also by the agent).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ImageStack from_xarray

4 participants