I used the command
python -m train experiment=wt103/h3 wandb=null trainer.devices=8 +trainer.strategy=ddp
to train the H3 model on wt103 with devices=8. The machine detects all eight GPUs, [0,1,2,3,4,5,6,7]. However, only GPU 0 is actually being used, which causes an out-of-memory (OOM) error. How can I solve this problem?
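To help narrow this down, below is a minimal diagnostic sketch for checking which GPUs the training process is allowed to see. It is not part of the repository: `visible_gpu_ids` is a hypothetical helper name, and the script only assumes that the standard `CUDA_VISIBLE_DEVICES` environment variable is what restricts device visibility. The PyTorch check is optional and is skipped if `torch` is not installed.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only.
    Returns None when the variable is unset (no restriction),
    and an empty list when it is set but empty (no GPUs visible).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset: all GPUs should be visible")
    else:
        print(f"CUDA_VISIBLE_DEVICES restricts the process to: {ids}")

    # Optional cross-check with PyTorch, if it is available.
    try:
        import torch
        print("torch.cuda.device_count() =", torch.cuda.device_count())
    except ImportError:
        print("torch is not installed; skipping the PyTorch check")
```

If this reports fewer than eight devices inside the training environment, the limit would be coming from the environment rather than from the `trainer.devices=8` override.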