re-training issues #184
Comments
Could you add a `raise` in the data processing code where that message is printed, so we can see the stack trace?
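A minimal sketch of what that could look like, assuming the featurizer call is wrapped in a try/except inside the dataset class (all names here are hypothetical, not the actual Boltz code):

```python
class TrainingDataset:
    """Toy stand-in for the real dataset; names are hypothetical."""

    def __init__(self, samples):
        self.samples = samples

    def featurize(self, sample):
        # Hypothetical featurizer that fails on bad input.
        if sample is None:
            raise ValueError("bad sample")
        return sample * 2

    def __getitem__(self, idx):
        try:
            return self.featurize(self.samples[idx])
        except Exception as exc:
            print(f"Featurizer failed on {idx} with error: {exc}.")
            # Re-raising instead of skipping surfaces the full stack
            # trace so the real failure site can be inspected.
            raise
```

With the `raise` in place, the next failure aborts with a full traceback instead of being silently skipped.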
GPU available: True (cuda), used: True
distributed_backend=nccl
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
| Name | Type | Params | Mode

From the attached image, it seems that training runs when set up with DDP for small-scale learning. However, a Featurizer error still occurs. Checking the code, I found that this is related to `data.module.training.py`, in the `TrainingDataset.__getitem__` method (around line 220). I also noticed that when a Featurizer error occurs, the code skips that sample and retrieves another one to continue training. If so, skipping dataset samples might affect the quality of training. Thank you so much for your contributions to the open source community.
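The skip-and-resample behavior described above roughly follows this pattern (a sketch with hypothetical names, not the actual implementation):

```python
import random

def get_item_with_retry(dataset, idx, max_retries=5):
    # On a featurizer failure, log the error, draw a different
    # random index, and try again instead of aborting training.
    for _ in range(max_retries):
        try:
            return dataset[idx]
        except Exception as exc:
            print(f"Featurizer failed on {idx} with error: {exc}. Skipping.")
            idx = random.randrange(len(dataset))
    raise RuntimeError("Too many consecutive featurizer failures.")
```

Because the failed sample is replaced by a random one, some samples are effectively dropped from the epoch, which is the training-quality concern noted above.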
Hello,
I'm currently re-training a Boltz model using the provided datasets. When I attempt to leverage DDP (Distributed Data Parallel) for multi-GPU training, I encounter an error related to the Featurizer. Specifically, the error occurs when using devices=4 with a batch size of 1. The error message is as follows:
"Featurizer failed on 6y9b with error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Skipping."
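For context, NumPy raises that error whenever a multi-element array is used where Python expects a single boolean, for example in an `if` condition on an elementwise comparison. A minimal reproduction, with the reductions the message suggests:

```python
import numpy as np

a = np.array([1, 2, 3])

# Using a multi-element comparison as a plain boolean raises the
# "truth value ... is ambiguous" ValueError seen in the log.
try:
    if a == 2:
        pass
except ValueError as exc:
    print(exc)

# Explicit reductions state the intent unambiguously:
any_match = (a == 2).any()   # does ANY element equal 2?
all_match = (a == 2).all()   # do ALL elements equal 2?
```

A plausible (unconfirmed) cause here is that some field compared inside the featurizer becomes an array under the multi-device data path where a scalar was expected with a single device.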
Interestingly, when I use devices=1 with the same batch size of 1, training proceeds without any issues. I suspect this issue might be related to DDP or the DataLoader, but I'm not certain. Could you please provide some insight into this?
Thank you in advance for your help!