Yes, you can use either. As a precondition, you just need to be able to represent your data as a RecordBatch (or any type a write function accepts).
I have a use case where I want to save some information, which can consist of: numpy ndarrays of variable shape, numpy 1D arrays, objects like a PyTorch model state_dict or optimizer state_dict, scalars like floats, ints, and strings, and custom types like MetricsInfo. These can be encoded as various Arrow datatypes: this is straightforward for primitive types, tricks like #48099 work for variable-shape tensors, and the binary type covers other Python objects.
I only want to use a single file for all of these. The information will be generated periodically, e.g. after each epoch when training deep learning models, so at each period (say epoch end) I need to append it to the file. This matters because if the run is interrupted, I don't want to lose all information up to the current epoch. Nor do I want to put pressure on memory by buffering everything for a single write.
Can this be done with files in the arrow or parquet format?

EDIT: after adding the data, I would like random-access reads of the saved data.