
Resources on chunking #48

Open
dougiesquire opened this issue Feb 10, 2022 · 11 comments

@dougiesquire
Contributor

In our meeting yesterday, we started talking about providing some more resources on chunking. This is such an integral part of effective data processing. But my experience is that, while it's easy to understand the implications of chunking in simple/contrived cases, the real world is always more complex. We wondered whether maybe we could pull together some resources that are closer to real-world examples. As I'm writing this, I'm realising that it will be quite hard (for me at least) to separate the concepts of chunking and dask, but a chapter could look something like:

  • Chunking matters
  • Chunking in the real world
    • some "real-world" (geoscience) examples of where thoughtful chunking decisions had big performance implications. Ideally we could curate a few of these to each demonstrate a key concept. @ScottWales, do you have any examples from users you've helped that could help motivate these examples?
    • how to apply custom functions across chunks with xarray and dask. Geoscience-specific examples of using apply_ufunc with dask="allowed" (better) or dask="parallelized" (worse, but easier). Also, perhaps a more advanced case like @ScottWales's API calculation.
    • ...
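The apply_ufunc pattern mentioned above could be sketched roughly as follows (a minimal sketch assuming xarray and dask are installed; the `standardize` function and the array shapes are made up for illustration):

```python
import numpy as np
import xarray as xr

# Toy data, chunked in space so that "time" stays whole within each chunk
# (apply_ufunc requires core dimensions to be single chunks).
da = xr.DataArray(
    np.random.rand(120, 4, 4), dims=["time", "lat", "lon"]
).chunk({"time": -1, "lat": 2, "lon": 2})

def standardize(arr):
    # Plain numpy function applied to each chunk independently;
    # apply_ufunc moves the "time" core dimension to the last axis.
    return (arr - arr.mean(axis=-1, keepdims=True)) / arr.std(axis=-1, keepdims=True)

zscores = xr.apply_ufunc(
    standardize,
    da,
    input_core_dims=[["time"]],   # "time" becomes the last axis inside standardize
    output_core_dims=[["time"]],
    dask="parallelized",          # wrap the numpy function for dask automatically
    output_dtypes=[da.dtype],
)
```

With dask="allowed" the function itself would need to handle dask arrays; dask="parallelized" just maps a plain numpy function over the chunks, which is easier but gives dask less room to optimise.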

Interested to hear what others think

@hot007
Contributor

hot007 commented Feb 11, 2022

@dougiesquire I have a nice example - this dataset (public) contains monthly model output with default chunking, and it's not very performant.
On the internal network, we have this dataset, which is a tiled version of the same thing, chunked for timeseries analysis. The latter makes it possible to run operations across the full 40+ years of the dataset (even over space), e.g. extreme value analyses that were computationally intractable with the original dataset.
That's before we start looking at zarrification, which we've also done, with similar results.

There are also examples in the satellite space of spatially vs temporally optimised chunking, I think for publication most people choose a halfway option?

In terms of dask, I think the most important thing for us to document is ensuring dask chunks are integer multiples of the underlying file chunks (it may be necessary to rewrite the underlying data); otherwise you can get even worse performance with dask than without!

@dougiesquire
Contributor Author

That's great - thanks @hot007!

@paigem
Contributor

paigem commented Mar 9, 2022

This looks great @dougiesquire and @hot007! Thanks for starting this discussion.

I can add an example where I rechunked my data from temporal to spatial chunks so I could do frequency-domain analysis. But I had to use the package rechunker to physically rechunk the data because it was such a big dataset, which might add a level of complexity that we don't want for our examples here.
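For a dataset small enough to shuffle in memory, the temporal-to-spatial rechunk can be sketched lazily in xarray (a toy sketch assuming xarray and dask; rechunker performs the same transformation on disk when the data is too large for this):

```python
import numpy as np
import xarray as xr

# Toy data chunked "temporally": one time step per chunk, all of space.
da = xr.DataArray(
    np.random.rand(100, 8, 8), dims=["time", "lat", "lon"]
).chunk({"time": 1})

# Rechunk to "spatial" chunks: the full time series in one chunk per small
# spatial tile, which is what frequency-domain analysis along "time" needs.
spatial = da.chunk({"time": -1, "lat": 2, "lon": 2})
print(spatial.chunks)  # ((100,), (2, 2, 2, 2), (2, 2, 2, 2))
```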

@paigem
Contributor

paigem commented Mar 10, 2022

Points to address from our discussion:

  • point to rechunking paragraph
  • add new page under computations for chunking
  • also mention in data storage
  • there is a dataset guidelines working group, but chunking may make sense to add here

@paigem
Contributor

paigem commented Mar 10, 2022

@dougiesquire To follow up on my previous comment, during our meeting today, we discussed this and think it's a great idea to expand on the concept of and best practices for chunking.

  1. There is a short section about chunking at the bottom of the Computations page. We think it would be great to move all discussion of chunking to a new page under "Computations" e.g. called "Chunking".
  2. Chunking should probably also be addressed on the data storage page, with regards to chunking of newly created datasets on disk. (i.e. how datasets are chunked for storage, which is somewhat separate from how to use chunks in computations) The Dataset guidelines working group will probably link to our discussion of chunking for data storage.

@paigem paigem self-assigned this Apr 14, 2022
@paigem
Contributor

paigem commented Apr 14, 2022

I've started to work on this locally:

  • added a new page called "Data Chunking" under "Computations"
  • started writing some basic content starting from @dougiesquire 's first comment

@paigem
Contributor

paigem commented Apr 14, 2022

More content/links to add to the chunking section in this comment: #12 (comment)

@dougiesquire
Contributor Author

Apologies for dropping the ball here @paigem! The last few months have been pretty hectic and I imagine it'll stay that way until the new financial year. Happy to take a look through what you come up with, and I'll try to set aside some time to actually contribute something.

@paigem
Contributor

paigem commented Apr 14, 2022

No worries @dougiesquire - I'll create the new page and make a start and then tag you and others in this thread so you can add more detail as you see fit. (Also, I meant to tag you @dougiesquire above - I just updated the comment with the correct tag! Sorry about that!)

@paigem
Contributor

paigem commented Apr 22, 2022

@dougiesquire @hot007 A new chunking page has been added in PR #55. There is minimal content for now - just wanted to get the page added to make it easier to add content.

@paolap
Contributor

paolap commented May 10, 2022

This blog post gives a good introduction to chunks, covering enough to be comprehensive while still being easy to follow:
https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes.
Another thing we could cover in terms of real-life examples is time series analysis. We have lots of users who attempt a percentile calculation on data whose time dimension has a chunk size of 1.
It might be worth having a paragraph at the bottom on the zarr vs chunking approach.
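That percentile pitfall could be demonstrated in a few lines (a sketch assuming xarray and dask; the shapes are arbitrary). quantile cannot reduce across a chunked dimension, so data with time chunks of 1 must be rechunked first:

```python
import numpy as np
import xarray as xr

# Typical "one file per time step" layout: time chunk size of 1.
da = xr.DataArray(
    np.random.rand(365, 10, 10), dims=["time", "lat", "lon"]
).chunk({"time": 1})

# Rechunk "time" into a single chunk before reducing along it;
# da.quantile(0.9, dim="time") on the original chunking raises an error.
p90 = da.chunk({"time": -1}).quantile(0.9, dim="time")
```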

I've started adding to the chunking.md file. As it is a work in progress, I pushed my changes to a new branch, chunking-payola, but I haven't created a pull request yet. Feel free to add comments; I might work a bit more on it before the meeting on Thursday.
