-
Notifications
You must be signed in to change notification settings - Fork 279
Support scalars in Zarr #3109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Support scalars in Zarr #3109
Conversation
Could you indent the contents of the new conditional, and un-dedent the else line and closing curly brace? The current indentation makes parsing the meaning more involved. |
} | ||
if(NC_NOERR == (stat = NCJdictget(jvar,"chunks",&jvalue)) && \ | ||
jvalue && NCJsort(jvalue) == NCJ_ARRAY && NCJlength(jvalue) == 0 && \ | ||
xarray && nclistlength(zvar->xarray) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this xarray
be zvar->xarray
as in the following function call? I don't see xarray
defined here, though the compiler's not complaining, so it must be declared somewhere.
If so, you might be able to make this if an else-if, using the previous `if (zvar->xarray == NULL) condition.
Otherwise, the code looks good, though I'm not familiar with Zarr or netCDF internals. Could you add a test for the desired behavior to ensure this change fixes the problem and that the problem stays fixed?
I'd also appreciate a more descriptive PR title, since I won't remember what issue #3108 was about while on the PR tab. Perhaps "Handle scalars in Zarr" or similar?
Also, what are your thoughts on the comment on line 1598 suggesting Zarr doesn't support scalars, is this still true? Do you think that check should be before this one? Would you still need to set shapes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xarray
is declared here:
Line 1398 in c4f84ee
int xarray = 0; |
and is set here:
Line 1427 in c4f84ee
if(zinfo->controls.flags & FLAG_XARRAYDIMS) {xarray = 1;} |
One could also check fot
zvar->xarray != NULL
but nclistlength(NULL)
returns 0, so there's no need.
Adding a test could be nice indeed! Will try to get that in (also contemplating the cases bellow).
Regarding Zarr not supporting scalars. I am not familiar with the specs, @DennisHeimbigner has a much better insight on it. To me it seems that if there is no dimensions for shapes
or chunks
in dataset.zarr/var/.zarray
(or they are 0-length), then it could be a scalar. To get it's value, we check if there's at least one physical chunk on disk (dataset.zarr/var/0
) or fallback into the default "FillValue".
Using zarr 2.18.2
here is an example of the 2 cases where "scalars" (length 1 arrays) are stored:
With file chunk | With fill value -> no chunk |
---|---|
import zarr
z = zarr.open('z.zarr',mode='w')
z['pi'] = 3.1415
print(open('z.zarr/pi/.zarray','r').read())
print(len(open('z.zarr/pi/0','rb').read())) |
import zarr
z = zarr.open('z-fill.zarr',mode='w')
z.ones('pi',shape=())
z.pi.fill_value =3.1415
print(open('z-fill.zarr/pi/.zarray','r').read())
print(len(open('z-fill.zarr/pi/0','rb').read())) |
{
"chunks": [],
"compressor": null,
"dtype": "<i8",
"fill_value": 0,
"filters": null,
"order": "C",
"shape": [],
"zarr_format": 2
}
8 |
{
"chunks": [],
"compressor": null,
"dimension_separator": ".",
"dtype": "<f8",
"fill_value": 3.1415,
"filters": null,
"order": "C",
"shape": [],
"zarr_format": 2
}
FileNotFoundError: [Errno 2] No such file or directory: 'z.zarr/pi/0' |
For the 2nd case, I think even this patch is not properly filling the scalar value:
$ diff -y <(ncdump file://${PWD}/z.zarr#mode=zarr) <(ncdump file://${PWD}/z-fill.zarr#mode=zarr)
netcdf z { | netcdf z-fill {
variables: variables:
double pi ; double pi ;
pi:_FillValue = 0. ; | pi:_FillValue = 3.1415 ;
data: data:
pi = 3.1415 ; | pi = _ ;
Looks great! I am getting #3086 merged in now that I've figured out the Fortran issue, and after I pull that in I will get this. |
Thanks for your patience; now that netCDF-Fortran v4.6.2 is out, and the |
Thanks for the updates! Tests are still missing here so I've started to work on it #3118 |
Sounds good! I've got this pinned for review. |
Quick patch for #3108.
This PR relies on
.zarray
havingchunk
andshape
keys that are empty arrays, also checking if.zattrs
contains an empty array for_ARRAY_DIMENSIONS
. Under this assumptions we can treat the variable as a scalar instead of ignoring as it is being done.