@will-moore
Member

@will-moore will-moore commented Aug 19, 2025

As discussed at #474 (comment), we have various issues with parse_url() as it incompletely wraps zarr.open_group().
See #376, #445, #471, #498.

This PR updates docs and tests to avoid using parse_url().
Instead of multiple lines for writing, we can now just do:

write_image(data, path_or_group, axes="zyx")

NB: this is not a breaking change as the group argument can be a path (str) or a group.

For reading, this is currently less concise than with parse_url(), since we don't create an iterable Reader() - see discussion below...

@codecov

codecov bot commented Aug 19, 2025

Codecov Report

❌ Patch coverage is 78.12500% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.13%. Comparing base (b49986e) to head (ab6ae7e).
⚠️ Report is 19 commits behind head on master.

Files with missing lines   Patch %   Lines
ome_zarr/csv.py            16.66%    5 Missing ⚠️
ome_zarr/scale.py          33.33%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #476      +/-   ##
==========================================
+ Coverage   87.02%   87.13%   +0.11%     
==========================================
  Files          13       13              
  Lines        1772     1772              
==========================================
+ Hits         1542     1544       +2     
+ Misses        230      228       -2     


@will-moore will-moore changed the title test_writer() doesn't use parse_url() parse_url_deprecation Aug 25, 2025
@will-moore will-moore marked this pull request as ready for review August 25, 2025 14:07
@joshmoore
Member

I wonder if it wouldn't be safer to mark this method deprecated to warn users clearly and then create another method strictly for reading.

@will-moore
Member Author

The reading API is what I'm undecided about.
Doing it completely manually doesn't feel too bad (as in the updated docs):

    root = zarr.open_group(url)
    zattrs = root.attrs.asdict()
    # Handle v0.5+ - unwrap 'ome' namespace
    if "ome" in zattrs:
        zattrs = zattrs["ome"]
    paths = [ds["path"] for ds in zattrs["multiscales"][0]["datasets"]]
    dask_data = [da.from_zarr(root[path]) for path in paths]
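The unwrapping step in the manual snippet above can be pulled out into a small helper that works on plain dictionaries, so it can be exercised without opening a store. This is only a minimal sketch and not part of ome-zarr-py: the names `unwrap_ome` and `dataset_paths` are hypothetical, and, like the snippet, it only looks at the first `multiscales` entry.

```python
from typing import Any, Mapping


def unwrap_ome(zattrs: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the OME metadata, unwrapping the v0.5+ 'ome' namespace if present."""
    # Hypothetical helper: v0.4 keeps 'multiscales' at the top level,
    # while v0.5+ nests it under an 'ome' key.
    return zattrs["ome"] if "ome" in zattrs else zattrs


def dataset_paths(zattrs: Mapping[str, Any]) -> list[str]:
    """List the dataset paths of the first multiscales entry."""
    attrs = unwrap_ome(zattrs)
    return [ds["path"] for ds in attrs["multiscales"][0]["datasets"]]


# The same multiscales block, laid out v0.4-style and v0.5-style.
v04 = {"multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}]}]}
v05 = {"ome": {"version": "0.5", "multiscales": v04["multiscales"]}}

print(dataset_paths(v04))
print(dataset_paths(v05))
```

With a real store, the result of `root.attrs.asdict()` could be passed to `dataset_paths()` and each returned path handed to `da.from_zarr(root[path])`, as in the snippet above.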

Whereas this is currently:

    reader = Reader(parse_url(url))
    nodes = list(reader())
    # first node will be the image pixel data
    image_node = nodes[0]
    dask_data = image_node.data

We could do something like:

    image = open_ome_zarr(url)
    dask_data = image.data
    zattrs = image.attrs

And something similar for labels via names.

@will-moore
Member Author

With something simple like this:

from typing import Any, Dict

import dask.array as da
from zarr import Group


class Image:
    def __init__(self, group: Group) -> None:
        self.group = group

    @staticmethod
    def get_attrs(group: Group) -> Dict[str, Any]:
        zattrs = dict(group.attrs)
        # Handle v0.5+ - unwrap 'ome' namespace
        if "ome" in zattrs:
            return zattrs["ome"]
        return zattrs

    @staticmethod
    def matches(group: Group) -> bool:
        # A group is treated as an image if it has 'multiscales' metadata
        return "multiscales" in Image.get_attrs(group)

    def attrs(self) -> Dict[str, Any]:
        return Image.get_attrs(self.group)

    def data(self) -> list[da.core.Array]:
        # One dask array per resolution level of the first multiscales entry
        attrs = Image.get_attrs(self.group)
        paths = [ds["path"] for ds in attrs["multiscales"][0]["datasets"]]
        return [da.from_zarr(self.group[path]) for path in paths]

    def labels(self) -> Dict[str, "Image"]:
        labels: Dict[str, "Image"] = {}
        # No 'labels' group: return an empty dict rather than raising KeyError
        if "labels" not in self.group:
            return labels
        grp = self.group["labels"]
        attrs = Image.get_attrs(grp)
        if "labels" in attrs:
            for name in attrs["labels"]:
                g = grp[name]
                if Image.matches(g):
                    labels[name] = Image(g)
        return labels

We can do this...

>>> g = zarr.open_group("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr")
>>> Image.matches(g)
True
>>> i = Image(g)
>>> i.attrs()
{'_creator': {'name': 'omero-zarr', 'version': '0.4.0'}, 'multiscales': [{'axes': [{'name': 'c', 'type': 'channel'}, {'name': 'z', 'type': 'space', 'unit': 'micrometer'}, {'name': 'y', 'type': 'space', 'unit': 'micrometer'}, {'name': 'x', 'type': 'space', 'unit': 'micrometer'}], 'datasets': [{'coordinateTransformations': [{'scale': [1.0, 0.5002025531914894, 0.3603981534640209, 0.3603981534640209], 'type': 'scale'}], 'path': '0'}, {'coordinateTransformations': [{'scale': [1.0, 0.5002025531914894, 0.7207963069280418, 0.7207963069280418], 'type': 'scale'}], 'path': '1'}, {'coordinateTransformations': [{'scale': [1.0, 0.5002025531914894, 1.4415926138560835, 1.4415926138560835], 'type': 'scale'}], 'path': '2'}], 'version': '0.4'}], 'omero': {'channels': [{'active': True, 'coefficient': 1.0, 'color': '0000FF', 'family': 'linear', 'inverted': False, 'label': 'LaminB1', 'window': {'end': 1500.0, 'max': 65535.0, 'min': 0.0, 'start': 0.0}}, {'active': True, 'coefficient': 1.0, 'color': 'FFFF00', 'family': 'linear', 'inverted': False, 'label': 'Dapi', 'window': {'end': 1500.0, 'max': 65535.0, 'min': 0.0, 'start': 0.0}}], 'id': 1, 'rdefs': {'defaultT': 0, 'defaultZ': 118, 'model': 'color'}, 'version': '0.4'}}

>>> i.data()
[dask.array<from-zarr, shape=(2, 236, 275, 271), dtype=uint16, chunksize=(1, 1, 275, 271), chunktype=numpy.ndarray>, dask.array<from-zarr, shape=(2, 236, 137, 135), dtype=uint16, chunksize=(1, 1, 137, 135), chunktype=numpy.ndarray>, dask.array<from-zarr, shape=(2, 236, 68, 67), dtype=uint16, chunksize=(1, 1, 68, 67), chunktype=numpy.ndarray>]

>>> for name, label in i.labels().items():
...     print(name, label.data()[0].shape)
... 
0 (1, 236, 275, 271)
