
[1pt] PR: Ripple updates inc multiproc for Ripple FIM 100 data downloads #1590


Open
wants to merge 5 commits into
base: dev
from


RobHanna-NOAA
Contributor

@RobHanna-NOAA RobHanna-NOAA commented Jul 10, 2025

Updated the FIM 30 scripts to handle the much larger ripple FIM 100 data volume.

The upgrades focus on:

  • Add multi-proc to reduce processing time
  • Add flexibility for re-use (if the need arises)
  • Change the output files and columns for new requirements for HydroVIS services.

The outputs also change: the hecras_processing.ipynb file now produces three output files. It takes in all of the data created by the other three scripts, then creates:

  • A metrics file
  • A hecras boundary file which has the hucs included and the wbd HUC8 geometry. This is used by a HV static service.
  • A hecras huc file, which is very simple and has no geometry. HV uses it during flow processing to decide which data source to start with: ripple ble, ripple mip, or ras2fim, with HAND as the fallback. The flow processing for HV:
    • Gets a feature for flow processing and looks up its huc8 number.
    • Goes to the hecras huc table and looks for that huc number. If the huc does not exist, the feature is processed by HAND.
    • If the huc8 is included, it looks at a column called "selected_source", whose value is mip, ble, or ras2fim. The "selected_source" column holds whichever of the three sources has the most models.
    • It uses that "selected_source", the huc, and the feature to determine which ripple model collection folder should contain the feature and how to find it.
    • If it can't find that feature in the selected source's S3 folder, it falls back to HAND.
    • If it does find that feature, it uses it for FIM processing.
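
The lookup steps above can be sketched in Python. This is only an illustration under stated assumptions, not the HV implementation: the function names, the dictionary standing in for the hecras huc table, the `feature_exists` callback standing in for the S3 check, the tie-breaking order, and the sample HUC and feature values are all hypothetical.

```python
from typing import Callable, Mapping, Optional

# Assumption: ties in model count are broken in this order. The PR does not
# say how ties are handled; this only mirrors the order the sources are listed.
SOURCE_PRIORITY = ("ble", "mip", "ras2fim")


def pick_selected_source(model_counts: Mapping[str, int]) -> Optional[str]:
    """Return the source with the most models, or None if no source has any."""
    best: Optional[str] = None
    best_count = 0
    for source in SOURCE_PRIORITY:
        count = model_counts.get(source, 0)
        if count > best_count:
            best, best_count = source, count
    return best


def choose_processing_source(
    huc8: str,
    feature_id: int,
    huc_table: Mapping[str, str],
    feature_exists: Callable[[str, str, int], bool],
) -> str:
    """Decide which data source processes a feature, falling back to 'hand'."""
    selected = huc_table.get(huc8)
    if selected is None:
        # The huc is not in the hecras huc table.
        return "hand"
    if feature_exists(selected, huc8, feature_id):
        # The feature was found in the selected source's folder.
        return selected
    # The huc is listed, but this feature is missing from the selected source.
    return "hand"


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    table = {"12090301": pick_selected_source({"ble": 40, "mip": 12})}
    available = {("ble", "12090301", 1001)}

    def exists(source: str, huc8: str, feature_id: int) -> bool:
        return (source, huc8, feature_id) in available

    print(choose_processing_source("12090301", 1001, table, exists))
    print(choose_processing_source("12090301", 2002, table, exists))
    print(choose_processing_source("99999999", 1001, table, exists))
```

Passing the existence check in as a callback keeps the decision logic testable without any S3 access.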

Note: Although this was built for ripple FIM 100, I made it generic enough for future ripple runs or even for other similar data sources.

Additions

  • data/ripple/ripple_shared_tools.sh: holds functions and values shared by the two ripple data processing scripts.

Changes

  • data/ripple
    • get_s3_folder.sh: Updated from the earlier version. Downloads from the ripple source, calculates some metrics, and re-uploads the data to our FIM S3 buckets, one MC (model collection) folder at a time.
    • get_s3_folders_from_list.sh: A wrapper around get_s3_folder.sh to download in bulk. It now has multi-processing capability, which speeds it up significantly; it is now limited only by network speed.
    • hecras_processing.ipynb: Upgraded to make the three output files. Note: Renamed from hecras_boundaries.ipynb
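
For readers unfamiliar with the pattern, the bulk wrapper idea (run the single-folder job over a list of folders, several at a time) can be sketched as follows. This is not the code in get_s3_folders_from_list.sh, which is a shell script; it is a Python illustration that uses a thread pool rather than separate processes, since the work is network-bound. `process_folders` and `process_one` are hypothetical names, and `process_one` stands in for whatever handles one model collection folder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def process_folders(
    folders: Iterable[str],
    process_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Run process_one on each folder concurrently; map folder -> success."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every folder, remembering which future belongs to which name.
        futures = {pool.submit(process_one, name): name for name in folders}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from process_one
                results[name] = True
            except Exception:
                # One failed folder should not stop the rest of the batch.
                results[name] = False
    return results
```

Recording success per folder makes it straightforward to re-run only the folders that failed.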

Renaming

  • Was: hecras_boundaries.ipynb, now hecras_processing.ipynb

Testing

  • It has already been run at scale and its outputs validated; the three output files are already in full production use.

Deployment Plan (For developer use)

How do the changes affect the product?
(not applicable to the FIM product and its outputs)

  • Code only?
  • If applicable, has a deployment plan been created with the deployment person/team?
  • Require new or adjusted data inputs? Does it have start, end and duration code (in UTC)?
  • If new or updated data sets, has the FIM code been updated and tested with the new/adjusted data (subset is fine, but must be a subset of the new data)?
  • Require new pre-clip set?
  • Has new or updated python packages?

Issuer Checklist (For developer use)

You may update this checklist before and/or after creating the PR. If you're unsure about any of them, please ask, we're here to help! These items are what we are going to look for before merging your code.

  • Informative and human-readable title, using the format: [_pt] PR: <description>
  • n/a - Links are provided if this PR resolves an issue, or depends on another PR
  • If submitting a PR to the dev branch (the default branch), you have a descriptive Feature Branch name using the format: dev-<description-of-change> (e.g. dev-revise-levee-masking)
  • Changes are limited to a single goal (no scope creep)
  • The feature branch you're submitting as a PR is up to date (merged) with the latest dev branch
  • n/a - pre-commit hooks were run locally
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • CHANGELOG updated with template version number, e.g. 4.x.x.x
  • Add yourself as an assignee in the PR as well as the FIM Technical Lead

Merge Checklist (For Technical Lead use only)

  • Update CHANGELOG with latest version number and merge date
  • Update the Citation.cff file to reflect the latest version number in the CHANGELOG
  • If applicable, update README with major alterations

@RobHanna-NOAA RobHanna-NOAA self-assigned this Jul 10, 2025
@RobHanna-NOAA RobHanna-NOAA added the enhancement New feature or request label Jul 10, 2025
@RobHanna-NOAA RobHanna-NOAA changed the title from "WIP [1pt] PR: Ripple updates inc multiproc" to "1pt] PR: Ripple updates inc multiproc for Ripple FIM 100 data downloads" Jul 11, 2025
@RobHanna-NOAA RobHanna-NOAA changed the title from "1pt] PR: Ripple updates inc multiproc for Ripple FIM 100 data downloads" to "[1pt] PR: Ripple updates inc multiproc for Ripple FIM 100 data downloads" Jul 11, 2025
@RobHanna-NOAA RobHanna-NOAA marked this pull request as ready for review July 11, 2025 21:12
Labels
enhancement New feature or request
1 participant