I am trying to train on a 34 GB dataset (as reported by df.info) across 8 GPUs with 396 GB of RAM. Currently I can only get away with training on half the dataset before OOM errors kill the process. Each GPU ends up loaded with ~10 GB of data. Does that mean the actual in-memory data size is 160 GB (8 GPUs * 10 GB * 2 halves of the data)?
Any advice on how to train on this much data with XGBoost-Ray would be helpful.
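For reference, a minimal sketch of the kind of setup I mean, assuming the data can be written out as parquet shards so each Ray actor loads only its own piece instead of one big in-memory DataFrame. The file paths, label column name, and resource numbers below are placeholders, not my actual setup:

```python
from xgboost_ray import RayDMatrix, RayParams, train

# Hypothetical parquet shards; passing a list of file paths instead of a
# single in-memory DataFrame lets each actor read only its own shard.
files = [f"/data/train/part-{i:04d}.parquet" for i in range(64)]

# "target" is an assumed label column name.
dtrain = RayDMatrix(files, label="target")

ray_params = RayParams(
    num_actors=8,      # one training actor per GPU
    gpus_per_actor=1,
    cpus_per_actor=4,  # assumed CPU budget per actor
)

bst = train(
    {"objective": "binary:logistic", "tree_method": "gpu_hist"},
    dtrain,
    num_boost_round=100,
    ray_params=ray_params,
)
bst.save_model("model.xgb")
```

My understanding is that with sharded loading each actor should only materialize roughly (dataset size / num_actors) plus XGBoost's working memory, rather than a full copy of the data, but I'd appreciate confirmation and any other guidance.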