
Support for chunksize #17

Open · mohr023 opened this issue Feb 21, 2020 · 2 comments


mohr023 commented Feb 21, 2020

Some of the SIA files, such as PA##.dbc, consume roughly 20 GB of RAM when loaded directly into a dataframe.

Should we support a chunksize parameter, like the one pandas readers such as read_csv expose, so these files can be processed in pieces? If so, do you see any caveats with this approach, @fccoelho?
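For reference, this is the pattern being asked for: pandas readers such as read_csv accept a chunksize argument and return an iterator of DataFrames instead of one large frame. A minimal illustration of that existing pandas behavior (the file name, chunk size, and per-chunk processing here are placeholders, not anything from this project):

    import pandas as pd

    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames,
    # so only one chunk is held in memory at a time.
    for chunk in pd.read_csv("PA_sample.csv", chunksize=100_000):
        total += len(chunk)  # stand-in for real per-chunk processing
    print(total)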

mohr023 (Author) commented Feb 21, 2020

One alternative I see for implementing this is returning a generator of DataFrames, such as:

    records = iter(testdbf.records)  # one shared iterator; re-creating it per element would always yield the first record
    chunks = (pd.DataFrame(list(islice(records, chunksize))) for i in range(0, len(testdbf), chunksize))

For this test, I'm using the DBF object from dbfread, with itertools.islice taking chunksize records at a time (the last chunk may be shorter).
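Fleshing that idea out into a self-contained sketch, assuming only the dbfread and pandas APIs (the function name, file name, and chunk sizes below are illustrative, not part of this project):

    from itertools import islice

    import pandas as pd
    from dbfread import DBF

    def dataframe_chunks(path, chunksize=100_000):
        """Yield DataFrames of at most `chunksize` records from a DBF file."""
        dbf = DBF(path)
        records = iter(dbf)  # single pass over the file
        # ceil(len(dbf) / chunksize) iterations; islice drains chunksize
        # records per iteration, with the final chunk possibly shorter.
        for _ in range(0, len(dbf), chunksize):
            yield pd.DataFrame(list(islice(records, chunksize)))

    # Example: process one chunk at a time instead of the whole file.
    for chunk in dataframe_chunks("PA_sample.dbf", chunksize=50_000):
        print(chunk.shape)

Each chunk is materialized only when requested, so peak memory is bounded by one chunk rather than the whole table.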

fccoelho (Collaborator) commented

This is a real problem, @mohr023. If we iterate over the DBF records as we read them, we would also need to iterate over them when saving the cache file, and we could no longer return the full dataframe after downloading.

If you have a good idea for solving this, feel free to submit a pull request.
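One possible shape for reconciling chunked reads with the cache file, purely as a sketch: stream chunks straight to disk as they are read, then let callers choose between loading the cache whole or iterating it lazily. The cache_chunks helper below is hypothetical, and CSV is used only because pandas supports appending to it natively; it is an assumption, not the project's actual cache format.

    import pandas as pd

    def cache_chunks(chunks, cache_path):
        """Write an iterable of DataFrames to a single CSV cache, chunk by chunk."""
        first = True
        for chunk in chunks:
            # Write the header once, then append subsequent chunks.
            chunk.to_csv(cache_path, mode="w" if first else "a",
                         header=first, index=False)
            first = False

    # Callers who still want the full dataframe can pay the memory cost explicitly:
    #     df = pd.read_csv(cache_path)
    # while memory-constrained callers iterate the cache lazily:
    #     for chunk in pd.read_csv(cache_path, chunksize=100_000): ...

This keeps the download-and-cache step at bounded memory, while "return the full dataframe" becomes an opt-in read of the cache rather than something held in RAM throughout.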
