I'm running a few fairly simple operations, listed below, on a large dataset:
1. Convert the CSV to HDF5 (df: 1.1 million rows by 6 columns).
2. Left join df with a smaller dataset (df1), allowing duplicates, to get df2.
3. Left join df2 with another smaller dataset (df3), again allowing duplicates, to get df4. df4 becomes very large: close to 7 billion rows and 12 columns.
4. Group by on df4 (a simple sum of one column over the unique combinations of two columns). This runs for over 10 hours and then fails with a memory error. (I have a reasonably good machine with 24 cores and 256 GB of memory.)
I reckon the dataset is pretty big. Is there any way I can accomplish this task? Thanks in advance!
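To make the steps above concrete, here is a minimal sketch of one possible way to avoid materializing the full 7-billion-row df4. It assumes pandas (the post does not say which library is in use), and the function name `chunked_join_sum`, the column names (`k1`, `k3`, `g1`, `g2`, `v`), and the `chunk_rows` parameter are all hypothetical. The idea is that a sum can be computed per chunk of df and the partial sums added together afterwards, so only one chunk's join result is in memory at a time. It is untested at the scale described, and each chunk's joined size still depends on how many duplicates the joins produce.

```python
import pandas as pd


def chunked_join_sum(df, df1, df3, key1, key3, group_cols, value_col, chunk_rows=100_000):
    """Left-join df with df1 and df3 one chunk at a time, summing value_col per group."""
    partials = []
    for start in range(0, len(df), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        joined = chunk.merge(df1, on=key1, how="left")   # this chunk's share of df2
        joined = joined.merge(df3, on=key3, how="left")  # this chunk's share of df4
        # Reduce immediately so the wide joined frame can be freed.
        partials.append(joined.groupby(group_cols, as_index=False)[value_col].sum())
    # Adding up the per-chunk sums gives the same totals as summing over the full join.
    combined = pd.concat(partials, ignore_index=True)
    return combined.groupby(group_cols, as_index=False)[value_col].sum()


# Toy data (hypothetical): df1 and df3 contain duplicate keys, as in the post.
df = pd.DataFrame({"k1": [1, 1, 2, 3], "k3": ["a", "b", "a", "b"], "v": [10, 20, 30, 40]})
df1 = pd.DataFrame({"k1": [1, 1, 2], "g1": ["x", "y", "x"]})
df3 = pd.DataFrame({"k3": ["a", "a", "b"], "g2": ["p", "q", "p"]})

result = chunked_join_sum(df, df1, df3, "k1", "k3", ["g1", "g2"], "v", chunk_rows=2)
print(result)
```

With pandas' default group-by behaviour, rows whose group columns are missing after the left joins are dropped in both the chunked and the direct version, so the two give the same totals.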