Preprocessing pipeline for TDX Hydro Files #8
Comments
@rajadain, we have finalized our processing pipeline for the TDX Hydro stream network ('streamnet') and corresponding basins ('streamreach_basins') files. We are currently running the pipeline on the full set of files for the globe, which should be completed this afternoon. In the meantime, here is an example set of three GeoParquet files that will be produced for each of the 62 TDX Hydro Regions (provided in the tdx_regions.parquet):
These supersede the files we shared with you two weeks ago under #4 (comment). These files have been substantially compressed vs NGA's GeoPackage files. @rajadain, could you work with your team to:
We are getting close to delivering this to you:
All of the above will likely benefit from using a parallel set of simplified geometries, which we are also exploring. For now, read these files using the |
From @ptomasula's Oct 4 email to @rajadain: We have uploaded parquet files with the modified nested set index (MNSI) information for 61 of the 62 TDXHydro regions to that S3 bucket. The missing files (5020054880) are for a region in Australia and failed during our initial run of the processing pipeline. We still wanted to get the bulk of the data over to you, since it will likely take some time to download and integrate into the system. We’ll investigate that last file next week and get it over to you soon. Anthony outlined a fair bit of this under this issue when he provided you with an example set of files, but I think it’s worth repeating here. For each TDXHydro region there are 3 files:
In addition to the TDXHydro data fields, these files also each contain the MNSI fields. We’ll send a follow-up email with additional information and instructions on how to leverage the fields for delineation algorithms, but here is a brief explanation of the fields we have added.
For the basin files, there are also two additional fields to support pre-dissolving basin geometries and improve delineation performance.
Lastly, we have converted the index values in |
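To illustrate how nested set fields can support delineation, here is a minimal sketch, not the pipeline's actual code. It assumes each reach carries a discover value, a finish value, and the ID of its outlet (root); the key names `id`, `discover`, `finish`, and `root` are hypothetical stand-ins rather than the real column names in the delivered files. Under that assumption, everything upstream of a reach is whatever shares its root and has a discover value inside that reach's discover-to-finish range, so no network traversal is needed at query time.

```python
def upstream_ids(reaches, outlet_id):
    """Return the IDs of outlet_id and every reach upstream of it.

    reaches: list of dicts with hypothetical keys
    "id", "discover", "finish", and "root".
    """
    by_id = {r["id"]: r for r in reaches}
    target = by_id[outlet_id]
    return sorted(
        r["id"]
        for r in reaches
        # Same drainage network, and discovered within the target's range.
        if r["root"] == target["root"]
        and target["discover"] <= r["discover"] <= target["finish"]
    )


# Toy network: reaches 2 and 3 drain into 1; reaches 4 and 5 drain into 2.
reaches = [
    {"id": 1, "discover": 1, "finish": 5, "root": 1},
    {"id": 2, "discover": 2, "finish": 4, "root": 1},
    {"id": 4, "discover": 3, "finish": 3, "root": 1},
    {"id": 5, "discover": 4, "finish": 4, "root": 1},
    {"id": 3, "discover": 5, "finish": 5, "root": 1},
]

watershed = upstream_ids(reaches, 2)
```

The same range filter can be expressed as a single indexed `BETWEEN` condition in a database, which is the usual motivation for a nested set layout.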
Thanks for the info. I was able to ingest the GeoParquet files into PostGIS after some trial and error. I ingested the I ingested the Here's a couple questions I had:
@rajadain, that's great news.
The geometries in the We developed the |
This is a new delivery from LimnoTech, from WikiWatershed/global-hydrography#8
@rajadain, please see our new example notebook, In my last commit, 3d7c0c2, I also demonstrated how to use the Also, when using the |
To for rapid delineation of large watersheds. Last step is to wrap into a package function. #8
Summary
Much of the initial groundwork for processing the TDX Hydro files has been laid under issues #2, #3, #4 and with PRs #5 and #6. It's time to stitch that work together into a processing pipeline that modifies the raw TDX Hydro files by dropping and renaming fields, creating a global LINKNO/streamID, and adding the modified nested set index information.
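As a rough sketch of the indexing step, a nested set index can be built with a depth-first traversal that starts at each outlet and walks upstream. This is an illustration under stated assumptions, not the pipeline's implementation: the function name, the dictionary keys, and the use of -1 to mark an outlet are all hypothetical, and setting finish to the largest discover value in the upstream subtree is only one plausible reading of the "modified" variant.

```python
from collections import defaultdict


def nested_set_index(downstream):
    """Assign discover/finish values to every reach.

    downstream: dict mapping reach ID -> ID of the reach it drains into,
    with -1 (an assumed sentinel) marking an outlet.
    """
    upstream = defaultdict(list)
    roots = []
    for link, ds in downstream.items():
        if ds == -1:
            roots.append(link)
        else:
            upstream[ds].append(link)

    index = {}
    clock = 0
    for root in sorted(roots):
        # Iterative DFS avoids recursion limits on very long rivers.
        stack = [(root, False)]
        while stack:
            link, done = stack.pop()
            if done:
                # All upstream reaches are numbered; clock holds the
                # largest discover value in this reach's subtree.
                index[link]["finish"] = clock
                continue
            clock += 1
            index[link] = {"discover": clock, "finish": None, "root": root}
            stack.append((link, True))
            for up in sorted(upstream[link], reverse=True):
                stack.append((up, False))
    return index


# Toy network: reaches 2 and 3 drain into outlet 1; 4 and 5 drain into 2.
index = nested_set_index({1: -1, 2: 1, 3: 1, 4: 2, 5: 2})
```

In this toy network, reach 2 receives discover 2 and finish 4, and its upstream reaches 4 and 5 receive discover values 3 and 4, which fall inside that range.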
Closure Criteria