Is it possible to use gradient accumulation to counter small GPU memory? #203
Description
Hi! I have a quick question I was hoping to pick your brain about:
I'm using SimCLR on very high-dimensional data, such that I max out at a batch size of 4. At that size it really isn't feasible to run SimCLR, since the contrastive loss depends on having many in-batch negatives. I was thinking about trying some sort of gradient accumulation, but my concern is that it might not mesh well with how the loss function works.

Say I want an effective batch size of 64 with a minibatch size of 4. Since the loss is built from dot products between the projections, instead of computing similarities across all 64 pairs as in normal SimCLR, accumulation would compute the loss over 8 separate groups of 4 pairs and average the resulting gradients. I'm not confident this has the same effect as a genuinely large batch, because the loss relies on contrasting each positive pair against a large number of negatives: with a minibatch of 4, each anchor sees only 2*4 - 2 = 6 negatives, instead of the 2*64 - 2 = 126 it would see in a true batch of 64 (see the sketch below).

Do you think there is a way to modify this framework to simulate large batch sizes under these memory constraints, or a way to get gradient accumulation to work the way I want?
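For concreteness, here is a minimal sketch of the naive accumulation scheme I mean (PyTorch; the encoder, augmentation, optimizer, and data below are toy placeholders I made up for illustration, not this repo's actual code):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent over one (micro-)batch of N pairs: each of the 2N projections
    is contrasted against the other 2N - 2 in-batch samples as negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    n = z1.size(0)
    eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))            # exclude self-similarity
    # Positives: row i of z1 pairs with row i of z2, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Toy stand-ins so the sketch runs end to end (hypothetical, not the repo's API).
encoder = torch.nn.Linear(32, 16)
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)
augment = lambda x: x + 0.01 * torch.randn_like(x)       # placeholder augmentation
micro_batches = [torch.randn(4, 32) for _ in range(8)]   # 8 micro-batches of 4

# Naive accumulation: 8 micro-batches of 4 -> "effective" batch of 64.
# Each micro-batch's loss only sees 2*4 - 2 = 6 negatives, so the summed
# gradient is NOT the gradient of the true batch-64 NT-Xent loss, which
# would contrast each anchor against 2*64 - 2 = 126 negatives.
optimizer.zero_grad()
for x in micro_batches:
    z1, z2 = encoder(augment(x)), encoder(augment(x))
    loss = nt_xent(z1, z2) / len(micro_batches)
    loss.backward()          # gradients accumulate across micro-batches
optimizer.step()
```

This produces the update you'd get from averaging 8 independent batch-4 losses, which is exactly what I'm worried isn't equivalent to one batch-64 loss.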