Hourly reanalysis downloading #296
Conversation
Codecov Report

Attention: Patch coverage is

@@           Coverage Diff            @@
##           develop     #296   +/-  ##
===========================================
- Coverage    72.49%   69.90%   -2.59%
===========================================
  Files           29       29
  Lines         3690     3828    +138
  Branches       796      571    -225
===========================================
+ Hits          2675     2676      +1
- Misses         826      966    +140
+ Partials       189      186      -3
Thanks for putting this together, it's a great addition to the toolkit! The only comment holding me back from an approval was the one regarding the monthly files from ERA5, and then all the base files from the MERRA2. I'm thinking we don't really need them, and should ditch them, but I could be swayed towards leaving it as-is as well.
…fication options; deleting downloaded NetCDF files after saving reanalysis outputs
…hly era5 reanalysis downloading function, including using nearest grid point
@RHammond2, thanks for your ideas for improving this code. I think I addressed all your comments, so this is ready for a re-review. In addition to the code style improvements you pointed out, here are some of the main changes I made:
lon_nearest = node_spacing * np.round(lon / node_spacing)

# See: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means?tab=form
# Get data for 9 nearest grid points
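The nearest-node rounding used in this snippet can be illustrated standalone (a minimal sketch; the 0.25° spacing matches the ERA5 grid, and the site coordinates here are hypothetical):

```python
import numpy as np

# ERA5 single-levels grid nodes are spaced 0.25 degrees apart
node_spacing = 0.25

# Hypothetical wind farm coordinates
lat, lon = 51.38, 3.11

# Snap to the nearest grid node by rounding to the node spacing
lat_nearest = node_spacing * np.round(lat / node_spacing)
lon_nearest = node_spacing * np.round(lon / node_spacing)

print(lat_nearest, lon_nearest)  # → 51.5 3.0
```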
This comment can get removed.
openoa/utils/downloader.py
Outdated
"time": [f"{i:02d}:00" for i in range(24)],
"product_type": "reanalysis",
"area": [
    lat_nearest + 0.01,
What is the 0.1 buffer for? Unless something has changed, I believe a single point can be requested by repeating the latitude and longitude.
You're right, the buffer doesn't seem to be needed.
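Following this resolution, a single-node request can be sketched by repeating the latitude and longitude in the `area` field (a hedged example; the keys mirror the diff above and the CDS API convention of `[north, west, south, east]`, while the coordinate values are hypothetical):

```python
lat_nearest, lon_nearest = 51.5, 3.0  # hypothetical nearest grid node

# CDS API "area" is [north, west, south, east]; repeating the single
# point requests just that grid node, with no buffer needed
request = {
    "product_type": "reanalysis",
    "time": [f"{i:02d}:00" for i in range(24)],  # all 24 hours of each day
    "area": [lat_nearest, lon_nearest, lat_nearest, lon_nearest],
}
print(request["area"])  # → [51.5, 3.0, 51.5, 3.0]
```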
openoa/utils/downloader.py
Outdated
logger.error(e)

# get the saved data
ds_nc = xr.open_mfdataset(f"{save_pathname / f'{save_filename}*.nc'}")
This works, but just a note that xarray, pandas, and pretty much any major package will handle `Path` object file inputs at this point.
It looks like that xarray function will only accept glob-like file paths if they are strings. I get an error when leaving it as a `Path` object. I cleaned it up a little by just wrapping `str()` around the `Path` object.
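The `str()` fix described here can be sketched as follows (the directory and filename are hypothetical, and the `xr.open_mfdataset` call is commented out since it needs actual NetCDF files on disk):

```python
from pathlib import Path

save_pathname = Path("data") / "era5"  # hypothetical download directory
save_filename = "era5_hourly"

# open_mfdataset only expands glob patterns given as strings, so the
# Path-based pattern is converted explicitly with str()
pattern = str(save_pathname / f"{save_filename}*.nc")
# ds_nc = xr.open_mfdataset(pattern)  # would open all matching files
```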
A couple of really minor comments, otherwise thanks for cleaning this up @ejsimley! I think this is good to go whenever you're comfortable with it.
Don't hate me... but I just found this was released last week: era5-timeseries, which significantly speeds up ERA5 downloads (from hours to <30 s), and in (zipped) CSV format too... although at the moment the signals are more limited and it's in a beta stage... so perhaps something to consider later on! Note too that some consultants/analysts use the 4/9 nearest nodes, because sometimes the nearest node doesn't correlate as well as one further away. For example, with a coastal wind farm, the node offshore/onshore may correlate better than the one nearest to the wind farm in a different environment. I never applied this in the OpenOA code, though; it was a legacy from some internal code where I did.
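The 4/9-nearest-nodes idea mentioned above can be sketched by taking the 3×3 block of grid nodes centered on the nearest node (a hypothetical illustration, again assuming the 0.25° ERA5 node spacing and made-up site coordinates):

```python
import numpy as np

node_spacing = 0.25  # ERA5 grid spacing in degrees
lat, lon = 51.38, 3.11  # hypothetical site coordinates

# Snap to the nearest node, then take one node in each direction
lat_nearest = node_spacing * np.round(lat / node_spacing)
lon_nearest = node_spacing * np.round(lon / node_spacing)
offsets = np.array([-1.0, 0.0, 1.0]) * node_spacing
nodes = [(lat_nearest + dy, lon_nearest + dx) for dy in offsets for dx in offsets]

print(len(nodes))  # → 9
```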
@charlie9578 that's amazing news for ERA5! It seems like the functionality and data may change over time, so it's something we should keep watching for when it becomes stable. The download times alone are a huge selling point, though. It looks like there is only 10 m u/v wind, and not 100 m, so we wouldn't be able to extrapolate wind speeds.
Hi @charlie9578, thanks also for explaining why the 9 nearest nodes were getting requested in the monthly download functions. I saw that only the nearest node was getting used, so I simplified the request in the hourly version to only get the nearest node. We could consider downloading the 4/9 nearest nodes as an option in the future, though.
Also, for reference, and if you want to add your voice, there's a forum post regarding the time-series here. Thanks, Charlie
@RHammond2, thanks for taking another look. I tried to address your remaining comments as best I could. Note that I also changed the ERA5 hourly downloading function so that it requests data in month-long chunks from the CDS API instead of year-long requests. I started getting error messages saying the requests were too large, but found that 1-month (or up to around 4 months) was fine. This doesn't affect performance much, so I'm going to go ahead and merge.
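The month-long request chunking described here can be sketched by enumerating (year, month) pairs over the requested period (a hedged illustration using pandas; the date range and variable names are hypothetical):

```python
import pandas as pd

start, end = "2000-01-01", "2000-04-30"  # hypothetical request period

# One (year, month) pair per CDS request keeps each request under the
# API's size limit
months = pd.date_range(start, end, freq="MS")  # month-start dates
chunks = [(d.year, d.month) for d in months]

print(chunks)  # → [(2000, 1), (2000, 2), (2000, 3), (2000, 4)]
```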
This pull request adds two new functions to the `utils/downloader` module to download hourly reanalysis data: `get_era5_hourly` and `get_merra2_hourly`. These functions are modified from `get_era5_monthly` and `get_merra2_monthly` contributed by @charlie9578, and similarly download data from the CDS and GES DISC services.

Note that it can take a long time to download historical data (~1 day for a 20-year time series). Downloading ERA5 data seems to be faster than MERRA2 for me, though.