Sequentially load / unload train datasets to GPU #20676
meilame-tayebjee asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hi,
I have 100 subsamples of a huge dataset, identified by an index idx.
Let's say I want to use 80 subsamples as the training set (idx 1 to 80), 10 for validation, and 10 for testing. Each subsample takes approximately 6 GB of GPU memory.
Note that I have two H100 GPUs, each with 95 GB of memory. My model is GPT-like, with 31 million parameters.
I want to use Lightning to train over several epochs, sequentially loading/unloading the datasets onto the GPUs, without ever having them all in memory at once. I do not even want to initialize the datasets beforehand (I also need to initialize them sequentially).
Basically, during one epoch, I want to load the first training subsample onto the GPU, train on it, unload it, and load the next one, and so on until the last training subsample; then restart for another epoch.
I started using the DataModule class, with something like the following.
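(This is a minimal sketch; load_subsample(idx) is a placeholder for my actual per-subsample loading code.)

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader


def load_subsample(idx):
    # Placeholder: in my real code this reads subsample `idx` from disk and
    # returns a torch.utils.data.Dataset (~6 GB once moved to GPU).
    ...


class SubsampleDataModule(pl.LightningDataModule):
    def __init__(self, train_ids, batch_size):
        super().__init__()
        self.train_ids = list(train_ids)  # e.g. range(1, 81)
        self.batch_size = batch_size
        self._pos = 0                     # position in the subsample cycle
        self.train_dataset = None

    def next_train_subsample(self):
        # Drop the current chunk first so it can be garbage-collected
        # before the next ~6 GB chunk is loaded.
        self.train_dataset = None
        idx = self.train_ids[self._pos]
        self._pos = (self._pos + 1) % len(self.train_ids)
        self.train_dataset = load_subsample(idx)

    def train_dataloader(self):
        if self.train_dataset is None:
            self.next_train_subsample()
        return DataLoader(self.train_dataset, batch_size=self.batch_size)
```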
However, when calling self.trainer.datamodule.next_train_subsample(), the dataset is indeed updated as I want, but I am not sure whether the data_loader takes that update into account.
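What I am unsure about is whether I also need something like the following when wiring things up, so that train_dataloader() is actually called again after each swap (reload_dataloaders_every_n_epochs is an existing Trainer argument, but I am not certain it is the right tool here; MyGPT stands in for my actual LightningModule):

```python
class MyGPT(pl.LightningModule):
    ...  # layers, training_step, configure_optimizers, etc.

    def on_train_epoch_end(self):
        # Swap in the next subsample once the current one has been consumed.
        self.trainer.datamodule.next_train_subsample()


trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    # Without this, I believe the DataLoader built by train_dataloader()
    # keeps pointing at the first subsample for the whole run:
    reload_dataloaders_every_n_epochs=1,
)
trainer.fit(MyGPT(), datamodule=SubsampleDataModule(range(1, 81), batch_size=32))
```

Happy to have any insights on how to do this the right way! Thank you very much.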