Can it be made a more transparent drop-in for ndarray? #21

@cboulay

I'm trying to see how far I can take my ~50 GB HDF5 datasets through my processing pipeline before explicitly creating an ndarray. My pipeline uses a framework (Neuropype) that puts the ndarray in a container along with some metadata and makes extensive use of ndarray functions that return views. I think I could get a lot further in this framework with my HDF5 datasets if a wrapper class like DatasetViewh5py reimplemented some of those view-returning ndarray functions.

Are there any downsides to renaming lazy_transpose to transpose?
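To make the question concrete, here is a minimal sketch of what an ndarray-style transpose could look like on a lazy wrapper. The class name LazyTransposeView is hypothetical and is not the library's DatasetViewh5py; the source can be anything with a .shape that accepts slice indexing (an h5py dataset is assumed to qualify, a NumPy array stands in for it here). Only slice keys are handled.

```python
import numpy as np


class LazyTransposeView:
    """Hypothetical wrapper: records an axis permutation, reads only on indexing."""

    def __init__(self, source, axes=None):
        self._source = source
        ndim = len(source.shape)
        self._axes = tuple(range(ndim)) if axes is None else tuple(axes)

    @property
    def shape(self):
        # View axis i corresponds to source axis self._axes[i].
        return tuple(self._source.shape[a] for a in self._axes)

    def transpose(self, *axes):
        # Same call styles as ndarray.transpose(); no data is read here.
        if not axes:
            axes = tuple(reversed(range(len(self._axes))))
        elif len(axes) == 1 and not isinstance(axes[0], int):
            axes = tuple(axes[0])
        return LazyTransposeView(self._source, tuple(self._axes[a] for a in axes))

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if not all(isinstance(k, slice) for k in key):
            raise TypeError("this sketch only supports slice keys")
        key = key + (slice(None),) * (len(self._axes) - len(key))
        # Route each slice, given in view order, back to its source axis.
        src_key = [slice(None)] * len(self._axes)
        for view_axis, src_axis in enumerate(self._axes):
            src_key[src_axis] = key[view_axis]
        block = self._source[tuple(src_key)]  # the only read from the source
        return np.transpose(block, self._axes)


src = np.arange(24).reshape(2, 3, 4)  # stand-in for an on-disk dataset
view = LazyTransposeView(src).transpose(2, 0, 1)
print(view.shape)  # (4, 2, 3)
```

One thing the sketch makes visible: code that calls transpose and then expects an actual ndarray (for example, passing the result to a ufunc) would still need an explicit read such as view[:].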

Do you foresee any problems with a lazy implementation of reshape?
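One problem I can already see with reshape is that a simple slice of the reshaped array does not always map to one rectangular region of the source. The shapes below are made up purely for illustration.

```python
import numpy as np

src_shape = (4, 6)  # shape of the on-disk dataset (made up)
new_shape = (8, 3)  # shape a lazy reshape would present

# Rows 1 and 2 of the reshaped view cover these flat (C-order) offsets.
row_len = new_shape[1]
flat = np.arange(1 * row_len, 3 * row_len)

# Where those elements live in the source dataset.
rows, cols = np.unravel_index(flat, src_shape)
src_coords = list(zip(rows.tolist(), cols.tolist()))
print(src_coords)
# [(0, 3), (0, 4), (0, 5), (1, 0), (1, 1), (1, 2)]
```

The six requested elements sit at the end of source row 0 and the start of source row 1, so a lazy reshape would have to either issue two reads or read the 2 x 6 bounding box and discard half of it.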

I'm also considering a custom implementation of squeeze.
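A rough sketch of what I have in mind for squeeze follows. LazySqueezeView is a hypothetical name, not part of the library; it assumes non-negative axis values and keys made only of integers and slices, and a NumPy array again stands in for the on-disk dataset.

```python
import numpy as np


class LazySqueezeView:
    """Hypothetical wrapper: hides length-1 axes without reading any data."""

    def __init__(self, source, axis=None):
        shape = source.shape
        if axis is None:
            dropped = tuple(i for i, n in enumerate(shape) if n == 1)
        else:
            dropped = (axis,) if isinstance(axis, int) else tuple(axis)
            if any(shape[i] != 1 for i in dropped):
                raise ValueError("cannot squeeze an axis whose length is not 1")
        self._source = source
        self._kept = tuple(i for i in range(len(shape)) if i not in dropped)

    @property
    def shape(self):
        return tuple(self._source.shape[i] for i in self._kept)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > len(self._kept):
            raise IndexError("too many indices for squeezed view")
        key = key + (slice(None),) * (len(self._kept) - len(key))
        # Index 0 on every squeezed axis removes it from the result.
        src_key = [0] * len(self._source.shape)
        for k, src_axis in zip(key, self._kept):
            src_key[src_axis] = k
        return self._source[tuple(src_key)]


src = np.arange(12).reshape(1, 3, 1, 4)  # stand-in for an on-disk dataset
view = LazySqueezeView(src)
print(view.shape)  # (3, 4)
```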

NumPy users expect flatten() to return a copy, so probably not that one.

What about min, max, argmin, argmax, any and all when an axis is provided? Even though all of the data will have to be read eventually, it can be read sequentially, row by row (or column by column), so only one block needs to be in memory at a time, which may help avoid out-of-memory errors. I am fairly new to processing data cached on disk, so I'm hoping others with more experience can tell me if this is a bad idea from the outset.
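The sequential approach I am picturing looks roughly like this for min and argmin. The function name chunked_min_argmin and the chunk_len parameter are hypothetical, the data is assumed to contain no NaNs, and a NumPy array stands in for the on-disk dataset.

```python
import numpy as np


def chunked_min_argmin(source, axis, chunk_len=1024):
    """Min and argmin along `axis`, reading `chunk_len` positions of it at a time."""
    n = source.shape[axis]
    best = None
    best_idx = None
    for start in range(0, n, chunk_len):
        key = [slice(None)] * len(source.shape)
        key[axis] = slice(start, min(start + chunk_len, n))
        block = np.asarray(source[tuple(key)])  # only this block is in memory
        local_min = block.min(axis=axis)
        local_idx = block.argmin(axis=axis) + start  # offset to a global index
        if best is None:
            best, best_idx = local_min, local_idx
        else:
            # Strict comparison keeps the first occurrence, as ndarray.argmin does.
            better = local_min < best
            best = np.where(better, local_min, best)
            best_idx = np.where(better, local_idx, best_idx)
    return best, best_idx


rng = np.random.default_rng(0)
data = rng.integers(0, 50, size=(37, 5))  # stand-in for an on-disk dataset
mins, idxs = chunked_min_argmin(data, axis=0, chunk_len=8)
print(np.array_equal(mins, data.min(axis=0)))  # True
```

Peak memory is one block plus the running result instead of the whole dataset. How fast each block read is will presumably depend on how the dataset is chunked on disk relative to the reduction axis.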
