Recommendations about chunk sizes

We currently say nothing at all about chunk sizes, but I think we will need to provide some rules/guidance in order to make processing arrays efficient. For example, it really does help a lot if call-level arrays all have the same chunking (in the variants and samples dimensions), so that code can read in (say) genotypes and DP values chunk-by-chunk in the same loop.
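To make the "same loop" point concrete, here is a minimal sketch. It uses in-memory NumPy arrays as stand-ins for chunked on-disk arrays; the names ``call_genotype`` and ``call_DP``, the shapes, the chunk sizes and the helper ``iter_chunk_slices`` are all illustrative assumptions, not anything the spec defines.

```python
import numpy as np


def iter_chunk_slices(shape, chunks):
    """Yield (variant_slice, sample_slice) for each chunk of the 2D grid.

    Hypothetical helper: only the first two dimensions (variants, samples)
    are chunked here.
    """
    n_variants, n_samples = shape[:2]
    v_chunk, s_chunk = chunks[:2]
    for v_start in range(0, n_variants, v_chunk):
        for s_start in range(0, n_samples, s_chunk):
            yield (
                slice(v_start, min(v_start + v_chunk, n_variants)),
                slice(s_start, min(s_start + s_chunk, n_samples)),
            )


# Toy data; sizes are arbitrary and chosen so the last variants chunk is partial.
n_variants, n_samples, ploidy = 10, 6, 2
chunks = (4, 3)  # assumed shared (variants, samples) chunking
rng = np.random.default_rng(0)
call_genotype = rng.integers(0, 2, size=(n_variants, n_samples, ploidy))
call_DP = rng.integers(0, 50, size=(n_variants, n_samples))

n_chunks = 0
alt_count_high_depth = 0
for v, s in iter_chunk_slices(call_DP.shape, chunks):
    # Because both arrays share the same chunk grid, one pair of slices
    # selects exactly one chunk from each of them.
    gt = call_genotype[v, s]
    dp = call_DP[v, s]
    alt_count_high_depth += int(gt[dp >= 10].sum())
    n_chunks += 1
```

If the two arrays had different chunk grids, each iteration would straddle chunk boundaries in at least one of them, so some chunks would be read (and decompressed) more than once.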

Currently vcf2zarr enforces a uniform chunk size along each dimension, so that all arrays share one ``variants_chunk_size``. While this is a useful simplification, it does have some drawbacks, particularly when we want to read in all of a low-dimensional array at once (e.g., ``variant_position``). See #21 for discussion and some benchmarks on this point.
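As a rough back-of-the-envelope illustration of that drawback, the sketch below counts how many chunks a full read of a one-dimensional array touches. Every number here (variant count, chunk sizes, a 4-byte integer type) is a hypothetical assumption chosen for the arithmetic, not a measured or recommended value; the actual benchmarks are in #21.

```python
def chunk_count(n_items, chunk_size):
    """Number of chunks needed to cover n_items (integer ceiling division)."""
    return -(-n_items // chunk_size)


# Hypothetical values, for illustration only.
n_variants = 100_000_000
itemsize = 4  # bytes per value, assuming a 4-byte integer array

# Chunk size shared with the call-level arrays.
shared_chunk_size = 10_000
shared_chunks = chunk_count(n_variants, shared_chunk_size)
shared_chunk_bytes = shared_chunk_size * itemsize  # uncompressed size per chunk

# A larger chunk size used only for the 1D array.
larger_chunk_size = 1_000_000
larger_chunks = chunk_count(n_variants, larger_chunk_size)
larger_chunk_bytes = larger_chunk_size * itemsize

print(f"shared: {shared_chunks} chunks of {shared_chunk_bytes} bytes")
print(f"larger: {larger_chunks} chunks of {larger_chunk_bytes} bytes")
```

Under these assumptions, reading the whole array with the shared chunk size means fetching 10,000 small chunks rather than 100 larger ones, which is where the per-chunk overhead adds up.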

This would need some feedback from a variety of implementations and use-cases, I think.

Recommendations about chunk sizes #22
