current versions of pyfive (0.5-0.9) assume a v1 btree for chunked data and never check #137

@bnlawrence

In DatasetID where we build the chunk index for chunked data:

pyfive/pyfive/h5d.py, lines 308 to 309 in 29b2b59:

    chunk_btree = BTreeV1RawDataChunks(
        dataobject.fh, dataobject._chunk_address, dataobject._chunk_dims)

We just assume a V1 btree, and do nothing to check it. This is probably a regression that I introduced because we had no data with any other kind of chunk layout.

I've labelled this as a bug even though we don't yet have an exemplar of this failing in anger, because I can't believe we won't get one soon.

As a bare minimum we need to check what kind of chunk index is in place and do something sensible. The check probably needs to be on a per-variable basis: even if the b-tree type is fixed for a file with V1 and V2 layouts, it is not for V3. Open questions: can we use V2 b-trees for data chunks, can we raise an error for anything else right now, or should we just fail over to metadata only?
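To make the "check and do something sensible" option concrete, here is a minimal sketch of a guard that could run before the chunk index is built. It is not pyfive code: `ChunkIndexType`, `SUPPORTED`, `chunk_index_kind` and `require_supported` are hypothetical names, and the sketch assumes the caller has already parsed the layout version and (for version 4) the chunk indexing type from the dataset's data layout message. The numeric values follow my reading of the HDF5 file format specification, where chunked layouts in layout message versions 1 to 3 use a v1 b-tree and version 4 carries an explicit chunk indexing type; they should be checked against the spec before use.

```python
from enum import IntEnum


class ChunkIndexType(IntEnum):
    """Chunk indexing types carried by a version 4 data layout message."""
    SINGLE_CHUNK = 1
    IMPLICIT = 2
    FIXED_ARRAY = 3
    EXTENSIBLE_ARRAY = 4
    BTREE_V2 = 5


# Hypothetical: the index kinds the reader can currently walk.
SUPPORTED = {"btree_v1"}


def chunk_index_kind(layout_version, chunk_index_type=None):
    """Name the chunk index implied by a chunked dataset's layout message."""
    if layout_version in (1, 2, 3):
        # Older layout messages always point at a v1 b-tree.
        return "btree_v1"
    if layout_version == 4:
        # Raises ValueError if the type is missing or not a known value.
        return ChunkIndexType(chunk_index_type).name.lower()
    raise ValueError(f"unknown data layout version {layout_version}")


def require_supported(layout_version, chunk_index_type=None):
    """Return the index kind, or fail loudly instead of assuming a v1 b-tree."""
    kind = chunk_index_kind(layout_version, chunk_index_type)
    if kind not in SUPPORTED:
        raise NotImplementedError(f"chunk index '{kind}' is not supported yet")
    return kind
```

A caller could catch `NotImplementedError` and fall back to metadata-only access for that variable, which would cover the third option above without affecting variables that do use a v1 b-tree.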

We need a test file with other kinds of chunk layout (and there are quite a few, as appendix C suggests). A priority (for me and many of us) would be anything that NetCDF could create, even if it is rare.

(Note that the actual way we handle building the chunk index will change in a pull request incoming in the next couple of days which addresses #134 and #135, but as it stands that pull request will not fix this issue.)
