Demo dataset with UFS-Replay #378

Open
2 tasks
jhamman opened this issue Nov 6, 2024 · 1 comment
Labels
use case 🌎 Real-world use case virtual references 👻 Involves virtual kerchunk/virtualizarr chunk references

Comments

@jhamman
Member

jhamman commented Nov 6, 2024

NOAA's UFS Replay could be an interesting public dataset to demo Icechunk with. It's big: more than 1 PB!

It is available in two formats, both of which could be interesting to explore via virtual datasets:

  • A Zarr v2 dataset on Google Cloud Storage (gs://noaa-ufs-gefsv13replay/ufs-hr1)
  • A collection of NetCDF files on AWS S3 (s3://noaa-ufs-gefsv13replay-pds/)
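As a starting point for exploring the two locations above, here is a minimal sketch using fsspec and xarray. It assumes both buckets allow anonymous reads and that `gcsfs`/`s3fs` are installed. The helper names (`split_url`, `list_prefix`, `open_zarr_store`) are hypothetical, and the `gs://` path quoted above may be a parent prefix rather than the Zarr store itself, so listing it first is probably the safer move.

```python
from urllib.parse import urlparse

# Locations quoted in this issue.
ZARR_URL = "gs://noaa-ufs-gefsv13replay/ufs-hr1"
NETCDF_URL = "s3://noaa-ufs-gefsv13replay-pds/"

# Assumption: both buckets allow anonymous (unauthenticated) reads.
ANON_OPTIONS = {"gs": {"token": "anon"}, "s3": {"anon": True}}


def split_url(url):
    """Split a cloud URL into (protocol, bucket, key prefix)."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.netloc, parsed.path.strip("/")


def list_prefix(url):
    """List the entries directly under a bucket prefix (needs network access)."""
    import fsspec  # imported lazily; gcsfs or s3fs must also be installed

    protocol, bucket, prefix = split_url(url)
    fs = fsspec.filesystem(protocol, **ANON_OPTIONS[protocol])
    return fs.ls(f"{bucket}/{prefix}".rstrip("/"))


def open_zarr_store(store_url):
    """Lazily open a Zarr store with xarray (needs network access).

    store_url must point at an actual Zarr group, which may sit
    somewhere below the prefix quoted in the issue.
    """
    import xarray as xr

    protocol, _, _ = split_url(store_url)
    return xr.open_zarr(store_url, storage_options=ANON_OPTIONS[protocol])
```

Calling `list_prefix(ZARR_URL)` or `list_prefix(NETCDF_URL)` shows what sits under each location before anything is opened. None of this touches Icechunk or virtual references yet; it only covers locating the data.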

My thinking is that this dataset could be a good stress test for PB-scale Icechunk datasets and for virtual datasets at scale.

cc @TomNicholas and @timothyas


Known blockers:

@TomNicholas
Contributor

A Zarr v2 dataset

We'll need the Zarr reader that is being added to VirtualiZarr in zarr-developers/VirtualiZarr#271.

@TomNicholas TomNicholas added use case 🌎 Real-world use case virtual references 👻 Involves virtual kerchunk/virtualizarr chunk references labels Nov 6, 2024