Batch size consistency #11985
-
I have been working on an extension (TextDescriptives w. @HLasse) where we wish to calculate surprise (pseudo-perplexity) using masked language models. However, we noticed that the current setup for creating batches does not seem to produce consistent batch sizes. As far as I understand, when you use the `transformer` component, a batch is made up of a fixed number of texts, and each text is then split into spans that are all passed to the model together.
This seems to be problematic, as it might explode the actual batch size (the one passed to the model) given very long documents, thus leading to an uneven load on GPU memory. I might be misunderstanding something or have missed a crucial step somewhere.
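(For context, by pseudo-perplexity I mean masking each token in turn and averaging the negative log-probability of the true token under the masked LM. A rough standalone sketch using Hugging Face `transformers` directly, not the actual TextDescriptives code; the model name is just an example:)

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Example model only; any masked LM works the same way.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()


def pseudo_perplexity(text: str) -> float:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    nlls = []
    # Skip the special tokens at the start and end ([CLS]/[SEP] for BERT-style models).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        nlls.append(-log_probs[input_ids[i]].item())
    # exp of the mean negative log-likelihood over all masked positions
    return float(torch.exp(torch.tensor(sum(nlls) / len(nlls))))


print(pseudo_perplexity("The quick brown fox jumps over the lazy dog."))
```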
-
Yes, the memory load can be uneven if the text lengths vary a lot.
Currently, the smallest unit that `nlp.pipe` uses is a single text and it only has a setting to make batches with the same number of texts, so the presence of one very long text can lead to OOM errors for the batch containing that text. If you want to batch texts differently, you'd currently have to do it outside of `nlp.pipe`.
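For illustration, a minimal sketch of what batching outside of `nlp.pipe` could look like: group texts by length yourself and pass each group to `nlp.pipe`. The pipeline name and batch size below are just placeholders.

```python
import spacy

# Placeholder pipeline; anything with a transformer component applies.
nlp = spacy.load("en_core_web_trf")

texts = ["a short text", "another short one", "a much longer document " * 500]


def length_sorted_batches(texts, batch_size):
    # Sort by character length so very long texts end up together and you can
    # see (and control) which batches are heavy. Note that this loses the
    # original order, so keep track of indices if you need it.
    ordered = sorted(texts, key=len)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i : i + batch_size]


docs = []
for batch in length_sorted_batches(texts, batch_size=8):
    docs.extend(nlp.pipe(batch))
```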
The `transformer` is the only built-in component that splits texts up into spans for processing, and all other components like `ner` process each text as a whole.
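(For reference, the spans mentioned above are controlled by the transformer component's span getter; in the stock trf pipelines this is `spacy-transformers.strided_spans.v1` with `window = 128` and `stride = 96`. A sketch of overriding those values when loading a pipeline, assuming a trf pipeline is installed; the values are just examples:)

```python
import spacy

# Override the strided-span settings of the transformer component when loading.
# The keys follow the [components.transformer.model.get_spans] section of the
# default transformer config.
nlp = spacy.load(
    "en_core_web_trf",
    config={
        "components.transformer.model.get_spans.window": 64,
        "components.transformer.model.get_spans.stride": 48,
    },
)
```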
If you want more even memory usage, our current advice is to split your input into similar-sized texts, or just to avoid OOM, implement a max text length and split very long texts if necessary. It gets a little tricky because the memory usage depends on the number of tokens, and the number of tokens can be wildly different for transformer tokenizers vs. spaCy tokenizers (and by language), and for speed we don't want to run the transformer tokenizer in advance to do anything more flexible with the span lengths. So if […] The overlapping strided spans approach is basically okay for […]

As a side note, there is also the setting […]
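To make the max text length advice above concrete, a naive sketch (the character limit and the hard character split are illustrative; in practice you'd want to split at sentence or paragraph boundaries):

```python
import spacy

nlp = spacy.load("en_core_web_trf")  # placeholder pipeline

MAX_CHARS = 5_000  # illustrative limit; tune for your GPU


def split_long_text(text, max_chars=MAX_CHARS):
    # Naive hard split on a character limit; in practice, prefer splitting at
    # sentence or paragraph boundaries so the pieces stay meaningful.
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


texts = ["a short text", "a very long document " * 2_000]
chunks = [chunk for text in texts for chunk in split_long_text(text)]
docs = list(nlp.pipe(chunks, batch_size=8))
```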