Open
Description
Hello,
I'm currently re-training a Boltz model on the provided datasets. When I try to use DDP (Distributed Data Parallel) for multi-GPU training, I hit an error in the Featurizer. Specifically, the error occurs with devices=4 and a batch size of 1. The error message is:
"Featurizer failed on 6y9b with error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Skipping."
Interestingly, with devices=1 and the same batch size of 1, training proceeds without any issues. I suspect the problem is related to DDP or the DataLoader, but I'm not certain. Could you please share any insight into this?
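For reference, here is a minimal sketch of the two launch configurations being compared, assuming a standard PyTorch Lightning `Trainer`; `model` and `datamodule` are placeholders, and this is not the exact Boltz training entry point:

```python
import pytorch_lightning as pl

# Hypothetical sketch of the two configurations being compared;
# `model` and `datamodule` stand in for the actual Boltz training objects.

# Single-GPU run: trains without the Featurizer error.
trainer_single = pl.Trainer(devices=1, accelerator="gpu")
# trainer_single.fit(model, datamodule=datamodule)

# Multi-GPU DDP run: the Featurizer error above appears for some entries.
trainer_ddp = pl.Trainer(devices=4, accelerator="gpu", strategy="ddp")
# trainer_ddp.fit(model, datamodule=datamodule)
```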
Thank you in advance for your help!