Skip to content

Conversation

hkershaw-brown
Copy link
Member

Description:

Removes the initialization of close_ind array to -99 in get_close for 3D location_mods.
See #910 for the cost of this initialization.
The output array of close_ind is only valid from 1:num_close

Fixes issue

fixes #910

Types of changes

  • Bug (Performance) fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Please describe any tests you ran to verify your changes.
MPAS case from profiling in #910, and get_close test harness also given in #910

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

@mgharamti
Copy link
Contributor

Performance check (ocean models)

Experiment Application Type State Vector Size Assimilated Obs MPI Processes Time (sec)
ROMS (original code) Regional 645,759,164 571 8000 54
ROMS (no initialization of close_ind) Regional 645,759,164 571 8000 57
MOM6 (original code) Global 77,760,000 258,669 1280 87
MOM6 (no initialization of close_ind) Global 77,760,000 258,669 1280 85

Interpretation: Omitting close_ind initialization shows negligible runtime impact in both regional and global ocean configurations. ROMS (regional): +3 s (~5%). MOM6 (global): −2 s (~2%).

@hkershaw-brown
Copy link
Member Author

Thanks! I wouldn’t necessarily expect to see run time improvements in these cases, with most of the time being in IO (obs sequence or writing the large state) -> the things that don't scale to the large core count.
Did you happen to run with output_timestamps = .true? It would be interesting to see if these timings are mostly IO.

The gotcha for this pull request is if someone is relying on testing for the -99 value in their code.

@mgharamti
Copy link
Contributor

output_timestamps

Helen, unfortunately I didn't turn on output_timestamps. I could rerun filter if you think it's important.

@mgharamti
Copy link
Contributor

Thanks! I wouldn’t necessarily expect to see run time improvements in these cases, with most of the time being in IO (obs sequence or writing the large state) -> the things that don't scale to the large core count. Did you happen to run with output_timestamps = .true? It would be interesting to see if these timings are mostly IO.

The gotcha for this pull request is if someone is relying on testing for the -99 value in their code.

I did run with timestamps turned on. Here is a summary:

  • MOM6: The localization part accounted for ~35% of the total run time with the original code. For the code without initialization of close_ind, the localization accounted for ~31% of the run time. So, there is a small gain.
  • ROMS-Rutgers: Localization in both runs accounted for less than 2% of the run time. IO is the major player in this case.

For future purposes, I placed the 2 ocean DA test cases, with the necessary files, here:

  • MOM6: /glade/campaign/cisl/dares/large_dart_tests/ocean/mom6_global
  • ROMS-Rutgers: /glade/campaign/cisl/dares/large_dart_tests/ocean/roms_rutgers_regional

@hkershaw-brown
Copy link
Member Author

awesome thanks Moha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_close initialization cost

2 participants