
Add open_files function to read GPM files from list of file paths#81

Merged
ghiggi merged 8 commits into main from add-open_mfdataset on Mar 6, 2026

ghiggi (Owner) commented Aug 23, 2025

Prework

What kind of change does this PR introduce? (check at least one)

  • Bugfix
  • Feature
  • Documentation
  • Tutorial
  • Code style update
  • Refactor
  • Build-related changes
  • Other, please describe:

Does this PR introduce a breaking change? (check one)

  • Yes
  • No

If yes, please describe the impact and communicate accordingly:

The PR fulfills these requirements:

  • It's submitted to the branch named as follows:
    • Fix a bug: bugfix-<some_key>-<word>
    • Improve the doc: doc-<some_key>-<word>
    • Improve a tutorial: tutorial-<some_key>-<word>
    • Add a new feature: feature-<some_key>-<word>
    • Refactor some code: refactor-<some_key>-<word>
    • Optimize some code: optimize-<some_key>-<word>
  • When resolving a specific issue, it's referenced in the PR's title (e.g. fix #xxx[,#xxx], where "xxx" is the issue number)
  • Don't forget to link the PR to the issue if you are solving one.
  • All tests are passing.
  • New/updated tests are included

Summary

This PR adds the gpm.open_files function, which reads a list of GPM files from the specified file paths.
This PR addresses #80.
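For illustration only, the general pattern of reading a list of file paths, optionally in parallel, can be sketched with the standard library. `open_files_sketch` and the `reader` callable are hypothetical stand-ins, not the gpm API: the real gpm.open_files decodes GPM granules, and its signature and internals may differ. Combining the per-file results into a single dataset is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar, Union

T = TypeVar("T")


def open_files_sketch(
    filepaths: Union[str, Iterable[str]],
    reader: Callable[[str], T],
    parallel: bool = False,
    max_workers: int = 4,
) -> List[T]:
    """Hypothetical sketch: apply `reader` to every path, preserving input order."""
    # Accept a single path as well as a list of paths
    if isinstance(filepaths, str):
        filepaths = [filepaths]
    paths = list(filepaths)
    if not paths:
        raise ValueError("Expected at least one file path.")
    if parallel:
        # executor.map yields results in the same order as the input paths
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            return list(executor.map(reader, paths))
    # Sequential fallback
    return [reader(path) for path in paths]
```

For example, `open_files_sketch(["a.HDF5", "b.HDF5"], reader=len, parallel=True)` returns `[6, 6]`. Keeping the output order equal to the input order matters when granules are later concatenated in time.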

ghiggi (Owner, Author) commented Aug 23, 2025

Hi @kmuehlbauer! I had this in mind the entire week, so I spent 2 hours this morning implementing it.

Can you check whether it works well for your use case and report possible improvements?
In particular, try it with the parallel=True argument to see whether you run into any problems.

To avoid netCDF locking issues, I typically run it after initializing a dask client as follows:

import os

# Disable HDF5 file locking before any HDF5/netCDF library is imported
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=20,
    threads_per_worker=1,  # important: set to 1 to avoid netCDF locking!
    processes=True,        # one process per worker
)
client = Client(cluster)

FYI: The PR tests fail because of a minor problem related to an update of the polars library, but this affects a specific functionality of the software that should not concern you. I will fix it as soon as I have time.

codecov bot commented Aug 25, 2025

Codecov Report

❌ Patch coverage is 90.75426% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.83%. Comparing base (c5d3d25) to head (6a17590).
⚠️ Report is 1 commit behind head on main.

Files with missing lines            Patch %   Lines
gpm/dataset/granule.py              48.38%    16 Missing ⚠️
gpm/dataset/dataset.py              81.08%    7 Missing ⚠️
gpm/utils/directories.py            86.04%    6 Missing ⚠️
gpm/dataset/coords.py               66.66%    4 Missing ⚠️
gpm/io/download.py                  85.71%    3 Missing ⚠️
gpm/visualization/cross_section.py  0.00%     1 Missing ⚠️
gpm/visualization/eda.py            0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
- Coverage   91.18%   90.83%   -0.36%     
==========================================
  Files         135      135              
  Lines       17214    17537     +323     
==========================================
+ Hits        15696    15929     +233     
- Misses       1518     1608      +90     
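The headline percentages follow directly from the line counts in the coverage diff: coverage is hits divided by tracked lines. A quick check in Python, using only the counts reported above:

```python
# Line counts copied from the Codecov coverage diff (main vs. PR head)
base = {"lines": 17214, "hits": 15696, "misses": 1518}
head = {"lines": 17537, "hits": 15929, "misses": 1608}


def coverage_percent(counts: dict) -> float:
    """Coverage = hits / tracked lines, expressed as a percentage."""
    return 100.0 * counts["hits"] / counts["lines"]


# Every tracked line is either a hit or a miss
for counts in (base, head):
    assert counts["hits"] + counts["misses"] == counts["lines"]

base_cov = coverage_percent(base)
head_cov = coverage_percent(head)
delta = head_cov - base_cov  # change in percentage points
print(f"main: {base_cov:.2f}%  head: {head_cov:.2f}%  change: {delta:+.2f}")
```

This prints `main: 91.18%  head: 90.83%  change: -0.35`, and rounding to three decimals reproduces the Coveralls figures (91.182% and 90.831%). Codecov's displayed -0.36% differs in the last digit, presumably because Codecov rounds down by default.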


coveralls commented Aug 25, 2025

Coverage Status

coverage: 90.831% (-0.4%) from 91.182% when pulling 6a17590 on add-open_mfdataset into c5d3d25 on main.

Copilot AI review requested due to automatic review settings March 6, 2026 09:34

Copilot AI left a comment


Pull request overview

Adds a new gpm.open_files entry point intended to open arbitrary GPM files from explicit file paths (rather than relying on filename-based parsing), addressing Issue #80 by inferring the product from file metadata when possible.

Changes:

  • Add open_files API to open one/many filepaths and infer product attributes for decoding.
  • Make decoding/coordinate logic more tolerant of unknown product and add gpm_api_product attributes during finalization.
  • Misc maintenance updates across tests, CI, tooling, docs, and some visualization/utilities.
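The overview says the product is inferred from file metadata through a new products_attributes.yaml mapping, that decoding tolerates an unknown product, and that a gpm_api_product attribute is added during finalization. A minimal sketch of that idea follows, assuming the mapping associates each product name with global attribute values that identify it and that those values can be parsed from a `Key=Value;` header string. The mapping contents and the functions `parse_header`, `infer_product`, and `tag_product` are hypothetical and do not reproduce the actual YAML schema or gpm code.

```python
from typing import Dict, Optional

# Hypothetical mapping in the spirit of gpm/etc/products_attributes.yaml:
# each product name maps to the attribute values that identify it.
PRODUCTS_ATTRIBUTES: Dict[str, Dict[str, str]] = {
    "2A-DPR": {"AlgorithmID": "2ADPR"},
    "1B-Ku": {"AlgorithmID": "1BKu"},
}


def parse_header(header: str) -> Dict[str, str]:
    """Parse 'Key=Value;' lines of a header string into a dict."""
    attrs = {}
    for line in header.splitlines():
        line = line.strip().rstrip(";")
        if "=" in line:
            key, value = line.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def infer_product(attrs: Dict[str, str]) -> Optional[str]:
    """Return the first product whose identifying attributes all match, else None."""
    for product, required in PRODUCTS_ATTRIBUTES.items():
        if all(attrs.get(key) == value for key, value in required.items()):
            return product
    # Unknown product: callers should skip product-specific decoding
    return None


def tag_product(attrs: Dict[str, str], product: Optional[str]) -> Dict[str, str]:
    """Hypothetical finalization step: record the inferred product when known."""
    if product is None:
        return dict(attrs)
    return {**attrs, "gpm_api_product": product}
```

Returning None instead of raising lets the caller open files of unrecognized products and simply skip the product-specific decoding steps.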

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.

File                                           Description
tutorials/tutorial_03_SR_GR_Matching.ipynb     Minor text/metadata updates in tutorial notebook.
tutorials/tutorial_03_SR_GR_Calibration.ipynb  Minor text/metadata updates in tutorial notebook.
pyproject.toml                                 Packaging metadata + Ruff ignore list adjustments.
gpm/visualization/eda.py                       Add Ruff suppression on plt.subplots() assignment.
gpm/visualization/cross_section.py             Add Ruff suppression on variable reassignment.
gpm/utils/pyresample.py                        Remove unused local variables in remap routine.
gpm/utils/collocation.py                       Adjust xr.concat options for compatibility.
gpm/tests/test_io/test_download.py             Tighten warning assertions to per-call scope.
gpm/tests/test_dataset/test_granule_files.py   Add basic test coverage for open_files.
gpm/tests/test_bucket/test_routines.py         Add Ruff suppression comment on regex match.
gpm/retrievals/retrieval_2a_radar.py           Add Ruff suppression on assignment.
gpm/retrievals/retrieval_1b_c_pmw.py           Add MWCC-H hail probability retrieval.
gpm/io/products.py                             Add cached loader for products_attributes.yaml.
gpm/io/download.py                             Add Ruff suppression on tuple unpacking.
gpm/io/checks.py                               Make check_product accept optional product_type.
gpm/gv/routines.py                             Add Ruff suppression on assignments.
gpm/etc/products_attributes.yaml               New metadata mapping for product inference.
gpm/dataset/granule.py                         Allow scan_modes=None and attempt scan-mode autodetection.
gpm/dataset/decoding/dataarray_attrs.py        Remove per-variable product tagging from attr standardization.
gpm/dataset/decoding/coordinates.py            Skip product-specific coordinate logic when product is None.
gpm/dataset/dataset.py                         Add open_files + product inference and scan-mode handling changes.
gpm/dataset/coords.py                          Support 1D geolocation arrays (along-track only).
gpm/dataset/conventions.py                     Add add_gpm_api_product and adjust finalization ordering/guards.
gpm/bucket/dataframe.py                        Adjust Polars casting behavior in pl_cut.
gpm/__init__.py                                Export open_files and set a global xarray option.
docs/source/07_maintainers_guidelines.rst      Remove CodeBeat mention.
README.md                                      Badge table and stated supported Python versions updated.
CONTRIBUTING.rst                               Remove CodeBeat mention.
.pre-commit-config.yaml                        Update hook versions and tweak enabled hooks.
.github/workflows/tests_windows.yaml           Update Windows CI matrix/schedule and actions versions.
.github/workflows/tests.yaml                   Update CI matrix/schedule and actions versions.
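The granule.py change lets scan_modes=None trigger scan-mode autodetection. One plausible heuristic, shown here only as an assumption about how such detection could work, is to treat every top-level HDF5 group that holds both Latitude and Longitude datasets as a swath scan mode (GPM swath groups such as FS or HS carry these datasets). The file layout is modeled as a plain dict so the sketch runs without h5py; `detect_scan_modes` and `resolve_scan_modes` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Set, Union

# Hypothetical in-memory view of an HDF5 file: top-level group -> dataset names
FileLayout = Dict[str, Set[str]]


def detect_scan_modes(layout: FileLayout) -> List[str]:
    """Return top-level groups containing both Latitude and Longitude (sorted)."""
    required = {"Latitude", "Longitude"}
    return sorted(group for group, names in layout.items() if required <= names)


def resolve_scan_modes(
    scan_modes: Optional[Union[str, Iterable[str]]], layout: FileLayout
) -> List[str]:
    """Use the requested scan modes, or autodetect them when scan_modes is None."""
    if scan_modes is None:
        detected = detect_scan_modes(layout)
        if not detected:
            raise ValueError("No group with Latitude/Longitude found.")
        return detected
    if isinstance(scan_modes, str):
        scan_modes = [scan_modes]
    modes = list(scan_modes)
    # Validate explicitly requested scan modes against the file layout
    missing = [mode for mode in modes if mode not in layout]
    if missing:
        raise ValueError(f"Scan modes not present in file: {missing}")
    return modes
```

Explicit scan modes are still validated, so autodetection only changes the behavior when the caller passes None.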


ghiggi merged commit fbc988f into main on Mar 6, 2026
26 of 34 checks passed