
Resources on chunking #48

Open
@dougiesquire


In our meeting yesterday, we started talking about providing some more resources on chunking. Chunking is such an integral part of effective data processing. But my experience is that, while it's easy to understand the implications of chunking in simple or contrived cases, the real world is always more complex. We wondered whether we could pull together some resources that are closer to real-world examples. As I'm writing this, I'm realising that it will be quite hard (for me at least) to separate the concepts of chunking and dask, but a chapter could look something like:

  • Chunking matters
  • Chunking in the real world
    • some "real-world" (geoscience) examples where thoughtful chunking decisions had big performance implications. Ideally we could curate a few of these, each demonstrating a key concept. @ScottWales, do you have any cases from users you've helped that could motivate these?
    • how to apply custom functions across chunks with xarray and dask: geoscience-specific examples of using apply_ufunc with dask="allowed" (better) and dask="parallelized" (worse, but easier). Perhaps also a more advanced case like @ScottWales's API calculation.
    • ...
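
To make the apply_ufunc bullet concrete, here is a minimal sketch of the two dask modes side by side. It assumes a synthetic (time, lat, lon) array and a hypothetical `anomaly` function (subtract the time mean); neither comes from a real dataset or from @ScottWales's API calculation. With dask="allowed" the function receives dask arrays directly, so it must only use operations dask supports, but "time" can stay split across chunks. With dask="parallelized" xarray maps the function over blocks as plain NumPy arrays, which is simpler to write, but a core dimension must by default sit in a single chunk, so the input is rechunked first.

```python
import numpy as np
import xarray as xr

# Hypothetical synthetic (time, lat, lon) field standing in for a real dataset.
# Note that "time" is deliberately split into several chunks.
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.random((120, 40, 60)), dims=("time", "lat", "lon"), name="tas"
).chunk({"time": 30, "lat": 20, "lon": 30})


def anomaly(x):
    """Hypothetical example: subtract the mean along the last axis.

    apply_ufunc moves the core dimension ("time") to the last axis, so
    axis=-1 is time. Only array methods shared by NumPy and dask are used,
    which is what lets the same function work with dask="allowed".
    """
    return x - x.mean(axis=-1, keepdims=True)


# dask="allowed": the function is called on dask arrays and builds its own
# graph, so "time" can remain split across multiple chunks.
anom_allowed = xr.apply_ufunc(
    anomaly,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    dask="allowed",
)

# dask="parallelized": the function is applied block by block to NumPy arrays.
# A core dimension spread over several chunks is rejected by default, so put
# "time" into a single chunk before calling apply_ufunc.
anom_parallel = xr.apply_ufunc(
    anomaly,
    da.chunk({"time": -1}),
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    dask="parallelized",
    output_dtypes=[da.dtype],
)

# Both results are lazy; output core dims are appended after the other dims.
print(anom_allowed.dims)
print(anom_parallel.chunks)
```

The rechunk in the second call is itself a chunking decision: for a large dataset, making "time" a single chunk can produce very large blocks, which is one reason dask="allowed" is often preferable when the function can be written with dask-compatible operations.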

Interested to hear what others think.
