Fold fre pp rename-split into fre pp split-netcdf --rename #783

Open
Copilot wants to merge 12 commits into rename-split from copilot/noaa-gfdl-717-fold-rename-split

Contributor

Copilot AI commented Mar 18, 2026

Describe your changes

Takes the fre pp rename-split functionality from the rename-split branch and folds it into fre pp split-netcdf via a --rename flag. Without --rename, split-netcdf behaves as before.

fre/pp/split_netcdf_script.py

  • Added rename and diag_manifest parameters to split_file_xarray(): when rename=True, each split file is written directly to its final nested component/freq/duration/ path (no intermediate flat file, no copy, no delete)
  • Added _compute_renamed_path() helper that uses an in-memory time-decoded dataset to determine frequency, duration, and date range before the write, enabling a single file touch per variable
  • Includes try/finally for proper resource cleanup of the decoded dataset
  • Converted split_file_xarray to 4-space indentation (PEP 8)
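
The single-touch idea in the bullets above can be sketched as follows. This is a minimal illustration, assuming the frequency, duration, and date range have already been derived from the decoded dataset. The names `nested_output_path` and `write_once` and their parameters are hypothetical and are not taken from split_netcdf_script.py; the directory layout follows the example output path given later in this description.

```python
import tempfile
from pathlib import Path


def nested_output_path(outputdir, component, freq, duration,
                       start, end, variable, tile=None):
    """Build outputdir/component/freq/duration/component.start-end.variable[.tile].nc.

    Hypothetical helper; the real _compute_renamed_path() may differ.
    """
    parts = [component, f"{start}-{end}", variable]
    if tile:
        parts.append(tile)
    filename = ".".join(parts) + ".nc"
    return Path(outputdir) / component / freq / duration / filename


def write_once(path, payload):
    """Create parent directories, then write the file a single time."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)  # stand-in for the real dataset write
    return path


target = nested_output_path("output", "atmos_daily", "P1D", "P6M",
                            "00010101", "00010630", "temp", "tile6")
print(target.as_posix())

with tempfile.TemporaryDirectory() as tmp:
    written = write_once(
        nested_output_path(tmp, "atmos_daily", "P1D", "P6M",
                           "00010101", "00010630", "temp", "tile6"),
        b"placeholder",
    )
    # Only the nested file exists; nothing was written at the top level.
    top_level_files = [p for p in Path(tmp).iterdir() if p.is_file()]
    print(written.is_file(), top_level_files)
```

Because the destination is computed before any write happens, there is no flat intermediate file to copy or delete afterwards.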

fre/pp/frepp.py

  • Added --rename (-r) flag and --diag-manifest (-d) option to the split-netcdf click command, passing them through to split_file_xarray()
  • Removed standalone fre pp rename-split command

# Without --rename (unchanged behavior)
fre pp split-netcdf -f 00010101.atmos_daily.tile6.nc -o output/ -v all

# With --rename (new)
fre pp split-netcdf -f 00010101.atmos_daily.tile6.nc -o output/ -v all --rename
# produces: output/atmos_daily/P1D/P6M/atmos_daily.00010101-00010630.temp.tile6.nc
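
For readers unfamiliar with click, the option wiring described above might look roughly like the sketch below. It is not the code in fre/pp/frepp.py: the short flags come from this description, while the long option names `--file`, `--outputdir`, and `--variables`, the help strings, and the echo-only body are assumptions made for illustration.

```python
import click
from click.testing import CliRunner


@click.command(name="split-netcdf")
@click.option("-f", "--file", "file", required=True, help="Input netCDF file.")
@click.option("-o", "--outputdir", required=True, help="Output directory.")
@click.option("-v", "--variables", default="all", help="Variables to split out.")
@click.option("-r", "--rename", is_flag=True, default=False,
              help="Write split files into a nested directory structure.")
@click.option("-d", "--diag-manifest", default=None,
              help="Optional diag manifest path.")
def split_netcdf(file, outputdir, variables, rename, diag_manifest):
    # Stand-in body: the real command passes these values to split_file_xarray().
    layout = "nested" if rename else "flat"
    click.echo(f"{file} -> {outputdir} ({variables}, {layout}, manifest={diag_manifest})")


runner = CliRunner()
flat = runner.invoke(split_netcdf, ["-f", "in.nc", "-o", "out", "-v", "all"])
nested = runner.invoke(split_netcdf, ["-f", "in.nc", "-o", "out", "-v", "all", "--rename"])
print(flat.output, end="")
print(nested.output, end="")
```

Invoking the command through CliRunner with and without `--rename` exercises the flag's effect on behavior, which goes further than checking that the option appears in the help text.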

fre/tests/test_fre_pp_cli.py — CLI tests via CliRunner:

  • Verifying --rename and --diag-manifest appear in help output
  • Functional tests for split-netcdf --rename (parametrized for timeseries and static cases)
  • Backward compatibility test (no --rename → flat output)
  • Uses a split_rename_ncgen fixture and tmp_path for output directories

fre/pp/tests/test_split_netcdf.py — Unit tests via standard import (merged into existing test file):

  • Direct import tests calling split_file_xarray with rename=True for timeseries and static data
  • Backward compatibility test calling split_file_xarray without rename
  • Directory structure validation (nested dirs, no flat files at root, freq/duration depth)
  • Uses an ncgen_setup pytest fixture and tmp_path for output directories
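
The directory structure validation listed above could be expressed with a small helper like the one below. This is a sketch only: `check_nested_layout` is a hypothetical name, and it assumes every output file sits exactly at component/freq/duration/file.nc as in the timeseries example; static output may use a different depth in the real code.

```python
import tempfile
from pathlib import Path


def check_nested_layout(root):
    """Return a list of problems found under root (empty list means OK)."""
    root = Path(root)
    problems = []
    for nc_file in sorted(root.rglob("*.nc")):
        depth = len(nc_file.relative_to(root).parts)
        if depth == 1:
            problems.append(f"flat file at root: {nc_file.name}")
        elif depth != 4:
            problems.append(f"unexpected depth {depth}: {nc_file.name}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = (Path(tmp) / "atmos_daily" / "P1D" / "P6M"
            / "atmos_daily.00010101-00010630.temp.tile6.nc")
    good.parent.mkdir(parents=True)
    good.touch()
    print(check_nested_layout(tmp))   # nested file only

    (Path(tmp) / "stray.nc").touch()  # simulate a leftover flat file
    print(check_nested_layout(tmp))
```

In a pytest suite the temporary directory would normally come from the tmp_path fixture instead of tempfile.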

Issue ticket number and link (if applicable)

Checklist before requesting a review

  • I ran my code
  • I tried to make my code readable
  • I tried to comment my code
  • I wrote a new test, if applicable
  • I wrote new instructions/documentation, if applicable
  • I ran pytest and inspected its output
  • I ran pylint and attempted to implement some of its feedback
  • No print statements; all user-facing info uses logging module

Original prompt

This section details the original issue you should resolve

<issue_title>fre pp split-netcdf --rename calls fre pp rename-split functionality</issue_title>
<issue_description>solve with a PR to #717, starting from the rename-split branch

  • take the functionality of fre pp rename-split and fold it into fre pp split-netcdf when there's a --rename flag.
  • without --rename, fre pp split-netcdf functions as it currently does
  • corresponding unit tests with click CliRunner in fre/tests
  • corresponding unit tests via standard import in fre/pp/tests</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] [NOAA-GFDL-717] Add rename functionality to split-netcdf" to "Fold fre pp rename-split into fre pp split-netcdf --rename" Mar 18, 2026
Copilot AI requested a review from ilaflott March 18, 2026 14:54
@ilaflott
Member

ilaflott commented Mar 18, 2026

Oh, this PR should be targeting rename-split in #717 but i can't seem to correct it

edit: got it

@ilaflott ilaflott changed the base branch from main to rename-split March 18, 2026 15:04
Member

@ilaflott ilaflott left a comment

@copilot in the right direction, address the feedback with further code edits. respond to the feedback with a simple sentence or two with what you did to address it

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 64.58333% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.40%. Comparing base (0f8c37e) to head (73a872d).

Files with missing lines         Patch %   Lines
fre/pp/split_netcdf_script.py    63.57%    51 Missing ⚠️

Additional details and impacted files
@@               Coverage Diff                @@
##           rename-split     #783      +/-   ##
================================================
- Coverage         84.04%   83.40%   -0.64%     
================================================
  Files                71       71              
  Lines              4969     5057      +88     
================================================
+ Hits               4176     4218      +42     
- Misses              793      839      +46     
Flag        Coverage Δ
unittests   83.40% <64.58%> (-0.64%) ⬇️

Files with missing lines         Coverage Δ
fre/pp/frepp.py                  86.84% <100.00%> (+0.17%) ⬆️
fre/pp/split_netcdf_script.py    63.79% <63.57%> (-9.40%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Contributor Author

Copilot AI commented Mar 18, 2026

@copilot in the right direction, address the feedback with further code edits. respond to the feedback with a simple sentence or two with what you did to address it

Addressed all four review comments: moved rename logic into split_file_xarray, converted setup to a fixture, moved CLI tests to test_fre_pp_cli.py, and replaced manual cleanup with tmp_path. See commit e7b8c47.

Copilot AI requested a review from ilaflott March 18, 2026 15:20
@ilaflott ilaflott marked this pull request as ready for review March 18, 2026 15:27
@ilaflott
Member

@copilot 4 spaces for indentation, not 2

Contributor Author

Copilot AI commented Mar 18, 2026

@copilot 4 spaces for indentation, not 2

Converted split_file_xarray in split_netcdf_script.py from 2-space to 4-space indentation. All tests pass. Commit 1d80fe5.

@cwhitlock-NOAA
Contributor

These comments aren't tied to a specific line and tend to be more about design decisions, so I'm putting them here, rather than in the main review section:

  1. In the meeting that preceded this, one of the advantages of this approach that came up was doing the renaming + rewriting as a single file touch. That's not what the copilot rewrites are doing - it's splitting as the default and then copying. I can see that making sense from a code legibility standpoint, but it should be a decision by humans. It's also going to take a more comprehensive rewrite, since rename_file() in rename_script.py assumes that we have a preexisting file from split-netcdf to work with - it's splitting the filename and opening the file to get dates to parse.

  2. We still have two separate files for tests of renaming and splitting: fre/pp/tests/test_split_netcdf_rename.py and fre/pp/tests/test_split_netcdf.py . That decision could make sense if we decided that certain options never get exercised with each other (e.g. --rename and --vars), but if that's the case we need code to make sure those options are mutually exclusive. I am also pretty sure we don't want that - the var list options + --rename is needed for adding variable filtering in the yamls.

  3. Tests for get_freq_and_format_from_two_dates and get_duration_from_two_dates in rename_script.py need both examples that let you verify failed cases (these are not valid dates) and a slightly wider selection of tests (more calendars, documenting behavior on some edge cases). I'm happy to take on that bit. We should also consider renaming that file to something like "split and rename helper functions" if the main renaming functionality is moving to a different file.
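
The kind of coverage requested in item 3 (valid cases plus inputs that must fail) can be illustrated with a stand-in. The sketch below does not call get_duration_from_two_dates, whose signature is not shown in this thread; `iso_duration` is a hypothetical helper that only handles whole-month ranges in the proleptic Gregorian calendar via the standard library, so it says nothing about other calendars.

```python
from datetime import date, timedelta


def parse_yyyymmdd(text):
    """Parse 'YYYYMMDD'; raises ValueError for malformed or impossible dates."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {text!r}")
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def iso_duration(start_text, end_text):
    """Return an ISO 8601 style duration such as 'P6M' or 'P1Y' (stand-in)."""
    start = parse_yyyymmdd(start_text)
    end_exclusive = parse_yyyymmdd(end_text) + timedelta(days=1)
    if end_exclusive <= start:
        raise ValueError("end date precedes start date")
    if end_exclusive.day != start.day:
        raise ValueError("range is not a whole number of months")
    months = ((end_exclusive.year - start.year) * 12
              + (end_exclusive.month - start.month))
    return f"P{months // 12}Y" if months % 12 == 0 else f"P{months}M"


valid_cases = [
    ("00010101", "00010630", "P6M"),
    ("00010101", "00011231", "P1Y"),
    ("00010101", "00051231", "P5Y"),
]
invalid_cases = [
    ("00010101", "00010231"),    # February 31 does not exist
    ("00010630", "00010101"),    # end before start
    ("0001-01-01", "00010630"),  # wrong format
]

for start, end, expected in valid_cases:
    assert iso_duration(start, end) == expected

for start, end in invalid_cases:
    try:
        iso_duration(start, end)
    except ValueError:
        pass
    else:
        raise AssertionError(f"{start}..{end} should have been rejected")
```

In the actual test suite these tables would map naturally onto pytest.mark.parametrize, with additional rows per calendar.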

Contributor

@cwhitlock-NOAA cwhitlock-NOAA left a comment

This duplicates my comment from the main PR, but there are some design decisions we should talk about:

  1. In the meeting that preceded this, one of the advantages of this approach that came up was doing the renaming + rewriting as a single file touch. That's not what the copilot rewrites are doing - it's splitting as the default and then copying. I can see that making sense from a code legibility standpoint, but it should be a decision by humans. It's also going to take a more comprehensive rewrite, since rename_file() in rename_script.py assumes that we have a preexisting file from split-netcdf to work with - it's splitting the filename and opening the file to get dates to parse.

  2. We still have two separate files for tests of renaming and splitting: fre/pp/tests/test_split_netcdf_rename.py and fre/pp/tests/test_split_netcdf.py . That decision could make sense if we decided that certain options never get exercised with each other (e.g. --rename and --vars), but if that's the case we need code to make sure those options are mutually exclusive. I am also pretty sure we don't want that - the var list options + --rename is needed for adding variable filtering in the yamls.

  3. Tests for get_freq_and_format_from_two_dates and get_duration_from_two_dates in rename_script.py need both examples that let you verify failed cases (these are not valid dates) and a slightly wider selection of tests (more calendars, documenting behavior on some edge cases). I'm happy to take on that bit. We should also consider renaming that file to something like "split and rename helper functions" if the main renaming functionality is moving to a different file.

  4. Copilot has eliminated a LOT of tests from fre/pp/tests/test_split_netcdf_rename.py . I'm not as familiar with that part of the code, but it looks like the total cases for testing/determining frequency were reduced a lot. @ceblanton could you check on the reductions there?

Clarified the example of the nested directory structure for output files.
Added a comment to clarify the purpose of the fixture.
@ilaflott
Copy link
Member

This duplicates my comment from the main PR, but there are some design decisions we should talk about:

  1. In the meeting that preceded this, one of the advantages of this approach that came up was doing the renaming + rewriting as a single file touch. That's not what the copilot rewrites are doing - it's splitting as the default and then copying. I can see that making sense from a code legibility standpoint, but it should be a decision by humans.

we're humans and we are here to make those decisions! and good catch, we'd rather it just write, rather than write + copy + delete an old copy.

  2. We still have two separate files for tests of renaming and splitting: fre/pp/tests/test_split_netcdf_rename.py and fre/pp/tests/test_split_netcdf.py . That decision could make sense if we decided that certain options never get exercised with each other (e.g. --rename and --vars),

agreed, should be one file here.

  3. Tests for get_freq_and_format_from_two_dates and get_duration_from_two_dates in rename_script.py need both examples that let you verify failed cases (these are not valid dates) and a slightly wider selection of tests (more calendars, documenting behavior on some edge cases). I'm happy to take on that bit. We should also consider renaming that file to something like "split and rename helper functions" if the main renaming functionality is moving to a different file.

this is out of scope of this PR, those exceptions should get covered in #717, which this PR is targeting. i.e. this PR is a sub-PR of #717

  4. Copilot has eliminated a LOT of tests from fre/pp/tests/test_split_netcdf_rename.py . I'm not as familiar with that part of the code, but it looks like the total cases for testing/determining frequency were reduced a lot. @ceblanton could you check on the reductions there?

no this is not the right take- this file did not exist before, so these tests were not taken away. i did shuffle some tests out of that file and into fre/tests/test_fre_pp_cli.py, but i have not removed any previously-existing tests

@ilaflott
Member

@copilot respond to feedback, it's paramount that the --rename option not trigger a write + copy-to-new-name.

Contributor Author

Copilot AI commented Mar 18, 2026

@copilot respond to feedback, it's paramount that the --rename option not trigger a write + copy-to-new-name.

Refactored split_file_xarray to write directly to the final nested path when --rename is set. Added _compute_renamed_path helper that uses an in-memory decoded dataset to determine freq/duration/dates before the write, so there's no intermediate flat file, no copy, and no delete. Also merged test_split_netcdf_rename.py into test_split_netcdf.py per the one-file feedback. Commit 73a872d.

Contributor

@ceblanton ceblanton left a comment

Wow. I think this is what we want, yes!

def split_netcdf(file, outputdir, variables):
@click.option('-r', '--rename', is_flag=True, default=False,
              help='After splitting, rename output files into a nested directory structure '
                   'organized by frequency and duration under $outputdir.')
Contributor

"During splitting", not "after splitting", to make it clear it's a single operation not two.

''' fre pp split-netcdf --help includes --diag-manifest option '''
result = runner.invoke(fre.fre, args=["pp", "split-netcdf", "--help"])
assert result.exit_code == 0
assert "--diag-manifest" in result.output
Contributor

This and the new test above it do not test anything other than the existence of the click options!

@cwhitlock-NOAA
Contributor

  2. We still have two separate files for tests of renaming and splitting: fre/pp/tests/test_split_netcdf_rename.py and fre/pp/tests/test_split_netcdf.py . That decision could make sense if we decided that certain options never get exercised with each other (e.g. --rename and --vars),

agreed, should be one file here.

  3. Tests for get_freq_and_format_from_two_dates and get_duration_from_two_dates in rename_script.py need both examples that let you verify failed cases (these are not valid dates) and a slightly wider selection of tests (more calendars, documenting behavior on some edge cases). I'm happy to take on that bit. We should also consider renaming that file to something like "split and rename helper functions" if the main renaming functionality is moving to a different file.

this is out of scope of this PR, those exceptions should get covered in #717, which this PR is targeting. i.e. this PR is a sub-PR of #717

Fair enough! I'll add a comment there instead

  4. Copilot has eliminated a LOT of tests from fre/pp/tests/test_split_netcdf_rename.py . I'm not as familiar with that part of the code, but it looks like the total cases for testing/determining frequency were reduced a lot. @ceblanton could you check on the reductions there?

no this is not the right take- this file did not exist before, so these tests were not taken away. i did shuffle some tests out of that file and into fre/tests/test_fre_pp_cli.py, but i have not removed any previously-existing tests

Lines 104-117 of https://github.com/NOAA-GFDL/fre-cli/blob/rename-split/fre/pp/tests/test_rename_split_to_pp.py (rename-split) contain a set of parameterized tests that test a bunch of frequency/duration pairings. I am not seeing anything similar in https://github.com/NOAA-GFDL/fre-cli/blob/73a872d2c5632e2747d78f51e60abfba42275a96/fre/tests/test_fre_pp_cli.py .

@cwhitlock-NOAA
Contributor

cwhitlock-NOAA commented Mar 18, 2026

Lines 104-117 of https://github.com/NOAA-GFDL/fre-cli/blob/rename-split/fre/pp/tests/test_rename_split_to_pp.py (rename-split) contain a set of parameterized tests that test a bunch of frequency/duration pairings. I am not seeing anything similar in https://github.com/NOAA-GFDL/fre-cli/blob/73a872d2c5632e2747d78f51e60abfba42275a96/fre/tests/test_fre_pp_cli.py .

I take that back - the tests are still in lines 104-117 of https://github.com/NOAA-GFDL/fre-cli/blob/73a872d2c5632e2747d78f51e60abfba42275a96/fre/pp/tests/test_rename_split_to_pp.py . I got thrown off because that file wasn't showing as part of the changed files...which it wouldn't, if it was being left alone.

My concern is now that we've currently got 3 different files with tests for split-netcdf. Can we consolidate those into a single file?

Development

Successfully merging this pull request may close these issues.

fre pp split-netcdf --rename calls fre pp rename-split functionality

4 participants