Skip to content

Zarr support #190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 39 commits into
base: main
Choose a base branch
from
Open

Zarr support #190

wants to merge 39 commits into from

Conversation

keller-mark
Copy link

@keller-mark keller-mark commented Nov 5, 2024

Fixes #91

These changes are from both me and @Artur-man

The main public-facing changes here are:

  • The ZarrAnnData class
  • read_zarr and write_zarr top-level functions
  • Support for from_Seurat(output_class="ZarrAnnData")
  • Support for from_SingleCellExperiment(output_class="ZarrAnnData")

Internally:

  • read_zarr_helpers.R is the zarr analog of read_h5ad_helpers.R
  • write_zarr_helpers.R is the zarr analog of write_h5ad_helpers.R
  • Test fixtures within inst/extdata/example.zarr (this makes the diff noisy, apologies)
  • Lots of tests:
    • test-Zarr-read.R (35 new tests)
    • test-Zarr-write.R (70)
    • test-ZarrAnnData.R (26)
    • test-h5ad-zarr.R (17)

A number of these functions generate warnings in the R console that are intended to be followed up on to improve the code (and should probably be resolved as end users may not appreciate them), but the tests still pass despite these warnings.

Known things that are not implemented here:

  • support for recarrays
  • usage of mode = c("r", "r+", "a", "w", "w-", "x") parameter value

Copy link
Collaborator

@rcannood rcannood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work @keller-mark and @Artur-man !

I went through the PR for a first time and left some minor comments. I will review the code by running it a couple of times next :)

attrs <- g$get_attrs()$to_list()

if (!all(c("encoding-type", "encoding-version") %in% names(attrs))) {
path <- name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are a lot of linting issues in this file -- could you run lintr::lint_package() and fix any issues that pop up?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done ... did a full lint_package check and corrected some R check issues too!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should probably be removed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Artur-man
Copy link

@rcannood @keller-mark should we add pizzarr only under Suggests in DESCRIPTION and no need to add a Remotes list here ?

Suggests:
    ...
    ...
    pizzarr
Remotes:
    keller-mark/pizzarr

@keller-mark
Copy link
Author

Good point, at minimum we should add a note to the Github README here https://github.com/scverse/anndataR/tree/main?tab=readme-ov-file#installation

@Artur-man
Copy link

Artur-man commented Nov 6, 2024

Ah, this is already added here, nevermind then only Suggests should work.
https://github.com/keller-mark/anndataR/blob/d192e68ed312f212476a5490c973619f08e7c8de/R/ZarrAnnData.R#L240-L246.

but will add this to README as well!!

@lazappi
Copy link
Collaborator

lazappi commented Nov 7, 2024

Not a proper review of the code but I wanted to raise that we plan to submit {anndataR} to Bioconductor (or maybe CRAN) at some point (hopefully soonish). This means we won't be able to depend on GitHub only packages. Do you have plans to submit {pizzarr} somewhere or could you rewrite this to use one of the public Zarr packages?

@Artur-man
Copy link

@lazappi thanks for the notification, pizzarr could be submitted to CRAN soon. keller-mark/pizzarr#93

@Artur-man
Copy link

@keller-mark any opinions on this ? perhaps we can make Rarr/pizzarr optional with options() depending on the speed.

Regardless it would be good to have zarr support for AnnDataR. We are currently using reticulate/basilisk env in SpatialData and causes problems already (too many dependancies, multiple env etc.). https://github.com/HelenaLC/SpatialData

That being said, any dependancy should either be in CRAN or BioC. If there are missing utilities in Rarr we can also contact the maintainer and/or send PRs there (I did some, which I can continue too).

@rcannood rcannood added this to the 1.1.0 milestone Mar 10, 2025
@Artur-man
Copy link

Artur-man commented Apr 12, 2025

I have quickly checked if pizzarr utilities could be replaced with https://github.com/grimbough/Rarr, unfortunately there exists a set of limitations to the BioC native package, which

Will be in touch to see if these are resolved in the future, otherwise no zarr R package is currently both in CRAN/BioC and functionally complete yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement zarr backend
4 participants