Batch size consistency #11985

Yes, the memory load can be uneven if the text lengths vary a lot.

Currently, the smallest unit that nlp.pipe works with is a single text, and its only batching setting controls the number of texts per batch, so a single very long text can lead to OOM errors for the batch that contains it. If you want to batch texts differently, you currently have to do it outside of nlp.pipe, as in the sketch below.
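
For illustration, here is a minimal sketch of one way to do that: group texts by a rough character budget and call nlp.pipe once per group. The budget, the pipeline choice, and the `batch_by_chars` helper are all hypothetical, not part of the spaCy API:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any pipeline

def batch_by_chars(texts, max_chars=20_000):
    """Yield batches of texts whose total length stays under a rough
    character budget (a stand-in for actual memory use)."""
    batch, n_chars = [], 0
    for text in texts:
        if batch and n_chars + len(text) > max_chars:
            yield batch
            batch, n_chars = [], 0
        batch.append(text)
        n_chars += len(text)
    if batch:
        yield batch

texts = ["a short text", "another short text", "a very long text... " * 2000]
docs = []
for batch in batch_by_chars(texts):
    # Pass batch_size=len(batch) so nlp.pipe processes each group as-is
    # instead of re-batching it internally.
    docs.extend(nlp.pipe(batch, batch_size=len(batch)))
```

Note that a single text longer than the budget still ends up in a batch by itself, since a text is the smallest unit.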

The transformer is the only built-in component that splits texts up into spans for processing; all other components, such as ner, process each text as a whole.
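
With a transformer pipeline you can inspect the span-splitting settings in the loaded config. A small example, assuming en_core_web_trf is installed (the exact window/stride values shown in the comment are typical defaults, not guaranteed):

```python
import spacy

nlp = spacy.load("en_core_web_trf")
# The span getter controls how each text is split into overlapping
# windows before being passed to the transformer.
print(nlp.config["components"]["transformer"]["model"]["get_spans"])
# e.g. {'@span_getters': 'spacy-transformers.strided_spans.v1',
#       'window': 128, 'stride': 96}
```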

If you want more even memory usage, our current advice is to split your input into similar-sized texts, or, just to avoid OOM, to implement a max text length for your input texts.
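
A minimal sketch of the max-length approach, continuing from the snippet above and assuming simple character-based truncation is acceptable for your data (MAX_CHARS is an arbitrary example value):

```python
MAX_CHARS = 5_000  # example budget; tune for your hardware

def cap_length(texts, max_chars=MAX_CHARS):
    """Truncate over-long texts so no single text can blow up a batch.
    A plain slice can cut mid-word; for real data you might prefer to
    split on paragraph or sentence boundaries instead."""
    for text in texts:
        yield text[:max_chars]

# nlp and texts as defined in the earlier sketch
docs = list(nlp.pipe(cap_length(texts)))
```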

Answer selected by KennethEnevoldsen