Description
I'm trying to see how far I can take my ~50 GB HDF5 datasets through my processing pipeline before explicitly creating an ndarray. My pipeline uses a framework (Neuropype) that puts the ndarray in a container along with some metadata and makes extensive use of ndarray methods that return views. I think I could get a lot further in this framework with my HDF5 datasets if a wrapper class like DatasetView around an h5py Dataset reimplemented some of those ndarray methods that return views.
Are there any downsides to renaming lazy_transpose to transpose?
Do you foresee any problems with a lazy implementation of reshape?
I'm also considering a custom implementation of squeeze.
NumPy users expect flatten() to return a copy, so probably not that one.
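To make the transpose/squeeze question concrete, here is a minimal sketch of how a metadata-only transpose could work. `LazyView`, its `axis_order` attribute, and the slices-only `__getitem__` are all hypothetical names and simplifications for illustration, not part of any existing API; the wrapped `dataset` only needs numpy-style slicing and a `.shape`, so an h5py Dataset qualifies and a plain ndarray stands in here:

```python
import numpy as np

class LazyView:
    """Hypothetical sketch: record an axis permutation without
    reading any data; apply it only when a slice is requested."""

    def __init__(self, dataset, axis_order=None):
        # `dataset` is any array-like with .shape and slice indexing
        # (e.g. an h5py Dataset; a plain ndarray works for testing).
        self.dataset = dataset
        self.axis_order = (tuple(axis_order) if axis_order is not None
                           else tuple(range(len(dataset.shape))))

    @property
    def shape(self):
        return tuple(self.dataset.shape[a] for a in self.axis_order)

    def transpose(self, *axes):
        # No data is read: only the recorded axis order changes.
        if not axes:
            axes = tuple(reversed(range(len(self.axis_order))))
        return LazyView(self.dataset,
                        tuple(self.axis_order[a] for a in axes))

    def __getitem__(self, key):
        # Simplification: supports slice indexing only (no ints).
        if not isinstance(key, tuple):
            key = (key,)
        key = key + (slice(None),) * (len(self.axis_order) - len(key))
        # Map each requested slice back to the dataset's native axis.
        native_key = [slice(None)] * len(self.axis_order)
        for view_axis, k in enumerate(key):
            native_key[self.axis_order[view_axis]] = k
        # Read only the requested region, then reorder axes in memory.
        return self.dataset[tuple(native_key)].transpose(self.axis_order)

data = np.arange(24).reshape(2, 3, 4)
v = LazyView(data).transpose(2, 0, 1)
print(v.shape)  # (4, 2, 3) -- computed from metadata, no read
```

A lazy squeeze could follow the same pattern by dropping length-1 axes from the recorded metadata; reshape seems harder, since an arbitrary reshape of a chunked dataset can't always be mapped back to a contiguous hyperslab read.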
What about min, max, argmin, argmax, any, and all when an axis is provided? Even though all of the data will eventually have to be loaded into memory, it can be done sequentially, row by row (or column by column), so this might help avoid out-of-memory errors. I am fairly new to processing disk-cached data, so I'm hoping others with more experience can tell me whether this is a bad idea from the outset.
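For the axis reductions, the row-by-row idea could look like the sketch below: stream blocks of rows along the first dimension and either fold partial results together (reducing over axis 0) or concatenate per-block results (reducing over a later axis). `streaming_max` and `chunk_rows` are hypothetical names, and a plain ndarray again stands in for an h5py Dataset:

```python
import numpy as np

def streaming_max(dataset, axis, chunk_rows=1024):
    """Hypothetical sketch: max along `axis` without loading the whole
    dataset, reading `chunk_rows` rows at a time along axis 0."""
    n = dataset.shape[0]
    running = None   # folded partial result, used when axis == 0
    pieces = []      # per-block results, used when axis != 0
    for start in range(0, n, chunk_rows):
        block = np.asarray(dataset[start:start + chunk_rows])
        if axis == 0:
            # Fold this block's column maxima into the running result.
            partial = block.max(axis=0)
            running = partial if running is None else np.maximum(running, partial)
        else:
            # Reducing a later axis is independent per row-block.
            pieces.append(block.max(axis=axis))
    return running if axis == 0 else np.concatenate(pieces, axis=0)
```

min, any, and all combine partials the same way (np.minimum, logical or, logical and); argmin/argmax over axis 0 would additionally need to carry running indices and offset each block's local indices by `start`, so they take a bit more bookkeeping.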