Skip to content

Support scalars in Zarr #3109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Support scalars in Zarr #3109

wants to merge 3 commits into from

Conversation

mannreis
Copy link
Contributor

@mannreis mannreis commented Mar 11, 2025

Quick patch for #3108.

This PR relies on .zarray having chunk and shape keys that are empty arrays, also checking if .zattrs contains an empty array for _ARRAY_DIMENSIONS. Under this assumptions we can treat the variable as a scalar instead of ignoring as it is being done.

@mannreis mannreis changed the title Quick patch for https://github.com/Unidata/netcdf-c/issues/3108 Quick patch for #3108 Mar 11, 2025
@DWesl
Copy link
Contributor

DWesl commented Mar 11, 2025

Could you indent the contents of the new conditional, and un-dedent the else line and closing curly brace? The current indentation makes parsing the meaning more involved.

}
if(NC_NOERR == (stat = NCJdictget(jvar,"chunks",&jvalue)) && \
jvalue && NCJsort(jvalue) == NCJ_ARRAY && NCJlength(jvalue) == 0 && \
xarray && nclistlength(zvar->xarray) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this xarray be zvar->xarray as in the following function call? I don't see xarray defined here, though the compiler's not complaining, so it must be declared somewhere.

If so, you might be able to make this if an else-if, using the previous `if (zvar->xarray == NULL) condition.

Otherwise, the code looks good, though I'm not familiar with Zarr or netCDF internals. Could you add a test for the desired behavior to ensure this change fixes the problem and that the problem stays fixed?

I'd also appreciate a more descriptive PR title, since I won't remember what issue #3108 was about while on the PR tab. Perhaps "Handle scalars in Zarr" or similar?

Also, what are your thoughts on the comment on line 1598 suggesting Zarr doesn't support scalars, is this still true? Do you think that check should be before this one? Would you still need to set shapes?

Copy link
Contributor Author

@mannreis mannreis Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xarray is declared here:

int xarray = 0;

and is set here:
if(zinfo->controls.flags & FLAG_XARRAYDIMS) {xarray = 1;}

One could also check fot zvar->xarray != NULL but nclistlength(NULL) returns 0, so there's no need.

Adding a test could be nice indeed! Will try to get that in (also contemplating the cases bellow).

Regarding Zarr not supporting scalars. I am not familiar with the specs, @DennisHeimbigner has a much better insight on it. To me it seems that if there is no dimensions for shapes or chunks in dataset.zarr/var/.zarray (or they are 0-length), then it could be a scalar. To get it's value, we check if there's at least one physical chunk on disk (dataset.zarr/var/0) or fallback into the default "FillValue".

Using zarr 2.18.2 here is an example of the 2 cases where "scalars" (length 1 arrays) are stored:

With file chunk With fill value -> no chunk

import zarr
z = zarr.open('z.zarr',mode='w')
z['pi'] = 3.1415
print(open('z.zarr/pi/.zarray','r').read())
print(len(open('z.zarr/pi/0','rb').read()))

import zarr
z = zarr.open('z-fill.zarr',mode='w')
z.ones('pi',shape=())
z.pi.fill_value =3.1415
print(open('z-fill.zarr/pi/.zarray','r').read())
print(len(open('z-fill.zarr/pi/0','rb').read()))

{
    "chunks": [],
    "compressor": null,
    "dtype": "<i8",
    "fill_value": 0,
    "filters": null,
    "order": "C",
    "shape": [],
    "zarr_format": 2
}
8

{
    "chunks": [],
    "compressor": null,
    "dimension_separator": ".",
    "dtype": "<f8",
    "fill_value": 3.1415,
    "filters": null,
    "order": "C",
    "shape": [],
    "zarr_format": 2
}
FileNotFoundError: [Errno 2] No such file or directory: 'z.zarr/pi/0'

For the 2nd case, I think even this patch is not properly filling the scalar value:

$ diff -y <(ncdump file://${PWD}/z.zarr#mode=zarr) <(ncdump file://${PWD}/z-fill.zarr#mode=zarr)
netcdf z {						      |	netcdf z-fill {
variables:							variables:
	double pi ;							double pi ;
		pi:_FillValue = 0. ;			      |			pi:_FillValue = 3.1415 ;
data:								data:

 pi = 3.1415 ;						      |	 pi = _ ;

@mannreis mannreis changed the title Quick patch for #3108 Support scalars in Zarr Mar 14, 2025
WardF
WardF previously approved these changes Mar 19, 2025
@WardF
Copy link
Member

WardF commented Mar 19, 2025

Looks great! I am getting #3086 merged in now that I've figured out the Fortran issue, and after I pull that in I will get this.

@WardF
Copy link
Member

WardF commented Apr 2, 2025

Thanks for your patience; now that netCDF-Fortran v4.6.2 is out, and the docker work for regression testing has been integrated, I'm happy to say we're starting to get the backlog cleared!

@WardF WardF self-assigned this Apr 2, 2025
@mannreis
Copy link
Contributor Author

mannreis commented Apr 3, 2025

Thanks for the updates! Tests are still missing here so I've started to work on it #3118

@WardF
Copy link
Member

WardF commented Apr 16, 2025

Thanks for the updates! Tests are still missing here so I've started to work on it #3118

Sounds good! I've got this pinned for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants