Conversation

ejsimley
Collaborator

@ejsimley ejsimley commented Jun 4, 2024

This pull request adds two new functions to the utils/downloader module to download hourly reanalysis data: get_era5_hourly and get_merra2_hourly. These functions are modified from get_era5_monthly and get_merra2_monthly contributed by @charlie9578, and similarly download data from the CDS and GES DISC services.

Note that downloading historical data can take a long time (~1 day for a 20-year time series). Downloading ERA5 data seems to be faster than MERRA2 for me, though.

@ejsimley ejsimley requested a review from RHammond2 June 4, 2024 16:02
@codecov-commenter

codecov-commenter commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 148 lines in your changes missing coverage. Please review.

Project coverage is 69.90%. Comparing base (a53308e) to head (2024b43).
Report is 15 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| openoa/utils/downloader.py | 0.00% | 148 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #296      +/-   ##
===========================================
- Coverage    72.49%   69.90%   -2.59%     
===========================================
  Files           29       29              
  Lines         3690     3828     +138     
  Branches       796      571     -225     
===========================================
+ Hits          2675     2676       +1     
- Misses         826      966     +140     
+ Partials       189      186       -3     


Collaborator

@RHammond2 RHammond2 left a comment

Thanks for putting this together, it's a great addition to the toolkit! The only comment holding me back from an approval is the one about the monthly files from ERA5 and all the base files from MERRA2. I'm thinking we don't really need them and should ditch them, but I could be swayed towards leaving it as-is as well.

@ejsimley
Collaborator Author

@RHammond2, thanks for your ideas for improving this code. I think I addressed all your comments, so this is ready for a re-review. In addition to the code style improvements you pointed out, here are some of the main changes I made:

  • The API request and documentation for ERA5 have been updated to reflect the recent CDS API changes.
  • The API requests for both MERRA2 and ERA5 now only get data from the nearest grid point to the specified coordinates. I found that the ERA5 data returned for an arbitrary coordinate is linearly interpolated from the nearest grid points (which is not what we intended when first writing the functions), so I updated the code to explicitly request data for the nearest grid point.
  • The intermediate NetCDF files that get downloaded are now deleted after the csv files get saved.
  • The above changes were also made to the existing monthly reanalysis downloading functions.
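
For reference, the nearest-grid-point snapping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `nearest_grid_point` is hypothetical, it uses the built-in `round` where the diff uses `np.round`, and it assumes grid nodes sit at integer multiples of the spacing (0.25° for ERA5; 0.5° latitude by 0.625° longitude for MERRA2).

```python
def nearest_grid_point(lat, lon, lat_spacing, lon_spacing):
    """Snap a coordinate to the nearest node of a regular lat/lon grid.

    Hypothetical helper: assumes nodes lie at integer multiples of the spacing.
    """
    lat_nearest = lat_spacing * round(lat / lat_spacing)
    lon_nearest = lon_spacing * round(lon / lon_spacing)
    return lat_nearest, lon_nearest


# ERA5-style grid (assumed 0.25 x 0.25 degrees)
era5_node = nearest_grid_point(39.91, -105.23, 0.25, 0.25)

# MERRA2-style grid (assumed 0.5 degrees latitude x 0.625 degrees longitude)
merra2_node = nearest_grid_point(39.91, -105.23, 0.5, 0.625)

print(era5_node, merra2_node)
```

Requesting the snapped coordinate rather than the raw site coordinate is what avoids getting interpolated values back.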

@ejsimley ejsimley requested a review from RHammond2 March 17, 2025 23:00
lon_nearest = node_spacing * np.round(lon / node_spacing)

# See: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means?tab=form
# Get data for 9 nearest grid points
Collaborator

This comment can get removed.

"time": [f"{i:02d}:00" for i in range(24)],
"product_type": "reanalysis",
"area": [
lat_nearest + 0.01,
Collaborator

What is the 0.01 buffer for? Unless something has changed, I believe a single point can be requested by repeating the latitude and longitude.

Collaborator Author

You're right, the buffer doesn't seem to be needed.
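
A rough sketch of what a buffer-free request could look like, assuming the CDS `area` entry is ordered north, west, south, east. The helper `single_point_area` is hypothetical, and only the request keys visible in the diff are shown; a real request needs more fields.

```python
def single_point_area(lat_nearest, lon_nearest):
    """Build a zero-size bounding box by repeating the node coordinates.

    Assumed ordering: [north, west, south, east].
    """
    return [lat_nearest, lon_nearest, lat_nearest, lon_nearest]


# Partial request body for illustration only (not a complete CDS request)
request = {
    "product_type": "reanalysis",
    "time": [f"{i:02d}:00" for i in range(24)],
    "area": single_point_area(40.0, -105.25),
}

print(request["area"])
```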

logger.error(e)

# get the saved data
ds_nc = xr.open_mfdataset(f"{save_pathname / f'{save_filename}*.nc'}")
Collaborator

This works, but just a note that xarray, pandas, and pretty much any major package will handle Path object file inputs at this point.

Collaborator Author

@ejsimley ejsimley Mar 31, 2025

It looks like that xarray function will only accept glob-like file paths if they are strings. I get an error when leaving it as a Path object. I cleaned it up a little by just wrapping str() around the Path object.
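
To illustrate the `str()` wrapping, here is a small sketch that builds the glob pattern from a `Path` and checks it with the standard-library `glob` module instead of xarray. The helper name `nc_glob_pattern` and the file names are made up for the example; the resulting string is the kind of value that would be passed to `xr.open_mfdataset`.

```python
import glob
import tempfile
from pathlib import Path


def nc_glob_pattern(save_pathname, save_filename):
    """Return the NetCDF glob pattern as a plain string (hypothetical helper)."""
    return str(Path(save_pathname) / f"{save_filename}*.nc")


with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files standing in for downloaded data
    for name in ("era5_hourly_2020.nc", "era5_hourly_2021.nc", "era5_hourly.csv"):
        (Path(tmp) / name).touch()

    pattern = nc_glob_pattern(tmp, "era5_hourly")
    # Only the .nc files should match; the .csv file is ignored
    matched = sorted(Path(p).name for p in glob.glob(pattern))

print(matched)
```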

Collaborator

@RHammond2 RHammond2 left a comment

A couple of really minor comments, otherwise thanks for cleaning this up @ejsimley! I think this is good to go whenever you're comfortable with it.

@charlie9578
Contributor

Don't hate me... but I just found this was released last week: era5-timeseries, which significantly speeds up ERA5 downloads (from hours to <30s), and in (zipped) csv format too... although at the moment the signals are more limited and it's in a beta stage... so perhaps something to consider later on!

Note too that some consultants/analysts use the 4/9 nearest nodes, because sometimes the nearest node doesn't correlate as well as one further away. For example, with a coastal wind farm, the node offshore/onshore may correlate better than the one nearest to the wind farm in a different environment. I never applied this in the OpenOA code, though; it was a legacy from some internal code where I did.

@RHammond2
Collaborator

@charlie9578 that's amazing news for ERA5! It seems like the functionality and data may change over time, so it's something we should keep watching for when it becomes stable. The download times alone are a huge selling point, though it looks like there is only 10m u/v wind, and not 100m, so we wouldn't be able to extrapolate wind speeds.

@ejsimley
Collaborator Author

Hi @charlie9578, thanks also for explaining why the 9 nearest nodes were getting requested in the monthly download functions. I saw that only the nearest node was getting used, so simplified the request in the hourly version to only get the nearest node. We could consider downloading the 4/9 nearest nodes as an option in the future though.
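
If the multi-node option is picked up later, one possible reading of "9 nearest nodes" is the 3x3 block centred on the nearest node. The sketch below assumes that reading and a uniform grid spacing; `surrounding_nodes` is a hypothetical name, and the 4-node case (the corners of the enclosing grid cell) is not covered.

```python
def surrounding_nodes(lat, lon, spacing, half_width=1):
    """Return the block of grid nodes centred on the node nearest to (lat, lon).

    Hypothetical helper: half_width=1 gives a 3x3 block (9 nodes), assuming
    nodes lie at integer multiples of a uniform spacing.
    """
    lat0 = spacing * round(lat / spacing)
    lon0 = spacing * round(lon / spacing)
    offsets = range(-half_width, half_width + 1)
    return [(lat0 + i * spacing, lon0 + j * spacing) for i in offsets for j in offsets]


nodes = surrounding_nodes(39.91, -105.23, 0.25)
print(len(nodes), nodes[4])  # the middle entry is the nearest node itself
```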

@charlie9578
Contributor

> @charlie9578 that's amazing news for ERA5! It seems like the functionality and data may change over time, so it's something we should keep watching for when it becomes stable. The download times alone are a huge selling point, though it looks like there is only 10m u/v wind, and not 100m, so we wouldn't be able to extrapolate wind speeds.

Also, for reference, and if you want to add your voice, there's a forum post regarding the time-series here. Thanks, Charlie

@ejsimley
Collaborator Author

@RHammond2, thanks for taking another look. I tried to address your remaining comments as best I could. Note that I also changed the ERA5 hourly downloading function so that it requests data in month-long chunks from the CDS API instead of year-long requests. I started getting error messages saying the requests were too large, but found that 1-month (or up to around 4 months) was fine. This doesn't affect performance much, so I'm going to go ahead and merge.
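
The month-long chunking can be sketched roughly as below. This is only an illustration of splitting a date range into calendar months, one request per month; `month_chunks` is a hypothetical name and the merged code may organize this differently.

```python
from datetime import date


def month_chunks(start, end):
    """Yield (year, month) for each calendar month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            # Roll over into January of the next year
            year, month = year + 1, 1


chunks = list(month_chunks(date(2023, 11, 15), date(2024, 2, 3)))
print(chunks)
```

Each `(year, month)` pair would then drive one API request, which keeps every request well under the size limit mentioned above.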

@ejsimley ejsimley merged commit b761f64 into NREL:develop Mar 31, 2025
5 checks passed