Skip to content

Conversation

@bnlawrence
Copy link
Collaborator

@bnlawrence bnlawrence commented Oct 27, 2025

Description

This pull request addresses three specific issues:

  • Closes Over agressive optimisation of b-tree reading. #135: what used to happen was that the b-tree had to be loaded as soon as you wanted to inspect any dataset attributes. This is still the default behaviour, but now there is an additional option to get a lazy view of the variable which doesn't load the b-tree, this lazy support is propagated into visititems as well.
  • Closes Exposing b-tree layout information #130: the issue here was that we want to make information about the b-tree location available for inspection (to help diagnose performance problems when accessing files). This is a simple fix (new attribute on DatsetID).
  • Closes A pyfive "ncdump" #134: Introducing a p5dump so that we can inspect a file without any c-code (and potentially inspect many files in parallel).

@zequihg50 Could you please have a look at this one too? Especially the b-tree range stuff and the related tests.

Checklist

  • This pull request has a descriptive title and labels
  • This pull request has a minimal description (most was discussed in the issue, but a two-liner description is still desirable)
  • Unit tests have been added (if codecov test fails)
  • Any changed dependencies have been added or removed correctly (if need be)
  • If you are working on the documentation, please ensure the current build passes
  • All tests pass

@bnlawrence
Copy link
Collaborator Author

I have not added any documentation yet, I figured I'd do that once we had agreed the p5dump API and functionality, and agreed on the btree range.

@codecov
Copy link

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 78.92157% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.10%. Comparing base (a2fa21b) to head (3040c3d).

Files with missing lines Patch % Lines
pyfive/inspect.py 78.57% 16 Missing and 11 partials ⚠️
pyfive/h5d.py 70.45% 6 Missing and 7 partials ⚠️
pyfive/p5dump.py 90.00% 1 Missing and 1 partial ⚠️
pyfive/btree.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #138      +/-   ##
==========================================
+ Coverage   74.66%   76.10%   +1.43%     
==========================================
  Files          12       14       +2     
  Lines        2712     2862     +150     
  Branches      407      450      +43     
==========================================
+ Hits         2025     2178     +153     
+ Misses        576      561      -15     
- Partials      111      123      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zequihg50
Copy link
Contributor

zequihg50 commented Oct 27, 2025

This could serve to validate using external tools, although I think I'm more in favor of checking with hardcoded values here, just to avoid a false positive if the external tool is wrong.

def test_get_chunk_info_chunked():
    # start lazy, then go real

    with pyfive.File(DATASET_CHUNKED_HDF5_FILE) as hfile, \
            h5py.File(DATASET_CHUNKED_HDF5_FILE) as h5f, \
            open(DATASET_CHUNKED_HDF5_FILE, "rb") as f:

        ds = hfile.get_lazy_view('dataset1')
        assert ds.id._DatasetID__index_built == False

        si = StoreInfo((0, 0), 0, 4016, 16)
        info = ds.id.get_chunk_info(0)
        assert info == si

        assert ds.id.get_num_chunks() == 88
        assert h5f["dataset1"].id.get_num_chunks() == 88
        assert h5f["dataset1"].id.get_chunk_info(0) == si

        assert ds.id.btree_range == (1072, 8680)
        f.seek(1072)
        assert f.read(4) == b"TREE"  # only v1 btrees
        f.seek(8680)
        assert f.read(4) == b"TREE"  # only v1 btrees

@bnlawrence
Copy link
Collaborator Author

@zequihg50 Thanks. That's an excellent addition and makes me far happier. I'll push that up in a minute!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Over agressive optimisation of b-tree reading. A pyfive "ncdump" Exposing b-tree layout information

4 participants