Make transcribe_speech_parakeet more resilient to resource limits and audio length #26

@geoffroy-noel-ddh

Description

Currently the amount of memory needed grows with both the length of the audio clip and the size of the model.

Shorter clips are transcribed much faster on a GPU, but longer clips can't be transcribed at all on smaller GPUs.

It's already possible to set CUDA_VISIBLE_DEVICES='' to force CPU use, but that's not very practical: it requires a second, CPU-only pass after the GPU pass has failed on some of the audio.

One idea would be to incorporate that second pass into the operator itself: for each audio file, it first tries the GPU, then falls back to the CPU.
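A minimal sketch of that per-file fallback, under stated assumptions: `transcribe` is a hypothetical callable taking `(audio_path, device)`, standing in for whatever the operator actually calls, and the device names `"cuda"` and `"cpu"` are illustrative. Detecting out-of-memory by looking for "out of memory" in a `RuntimeError` message is a heuristic, not a guaranteed contract of any library.

```python
from typing import Callable, Sequence, Tuple


def is_out_of_memory(error: Exception) -> bool:
    """Heuristic (assumption): MemoryError, or a RuntimeError whose
    message mentions 'out of memory', counts as an OOM failure."""
    if isinstance(error, MemoryError):
        return True
    return isinstance(error, RuntimeError) and "out of memory" in str(error).lower()


def transcribe_with_fallback(
    audio_path: str,
    transcribe: Callable[[str, str], str],  # hypothetical: (path, device) -> text
    devices: Sequence[str] = ("cuda", "cpu"),
) -> Tuple[str, str]:
    """Try each device in order; return (device_used, transcript)."""
    last_error = None
    for device in devices:
        try:
            return device, transcribe(audio_path, device)
        except Exception as error:
            # Only memory failures trigger the fallback; anything else is a real bug.
            if not is_out_of_memory(error):
                raise
            last_error = error
    raise RuntimeError(f"all devices ran out of memory for {audio_path}") from last_error
```

Errors unrelated to memory are re-raised immediately, so a corrupt file does not silently trigger a slow CPU retry.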

An even smarter system would keep track of the maximum audio length that works on each device, so it doesn't waste time trying a device when a new audio file exceeds that limit.
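One way this bookkeeping might look, as a sketch only. It swaps in a slightly different rule from the one described above: it records the shortest duration that failed on each device rather than the longest that succeeded, since exceeding the longest success so far does not by itself prove a clip will fail. `DeviceLimits`, `transcribe_tracked` and the `transcribe(audio_path, device)` callable are hypothetical names, and durations are assumed to be in seconds.

```python
from typing import Callable, Dict, Sequence, Tuple


class DeviceLimits:
    """Remembers, per device, the shortest audio duration that ran out of memory."""

    def __init__(self) -> None:
        self.smallest_failure: Dict[str, float] = {}

    def should_try(self, device: str, duration: float) -> bool:
        limit = self.smallest_failure.get(device)
        return limit is None or duration < limit

    def record_failure(self, device: str, duration: float) -> None:
        current = self.smallest_failure.get(device)
        if current is None or duration < current:
            self.smallest_failure[device] = duration


def transcribe_tracked(
    audio_path: str,
    duration: float,
    transcribe: Callable[[str, str], str],  # hypothetical: (path, device) -> text
    limits: DeviceLimits,
    devices: Sequence[str] = ("cuda", "cpu"),
) -> Tuple[str, str]:
    """Skip devices already known to fail at this duration, then fall back in order."""
    for device in devices:
        if not limits.should_try(device, duration):
            continue  # a clip this long (or shorter) already failed here
        try:
            return device, transcribe(audio_path, device)
        except (MemoryError, RuntimeError) as error:
            # Heuristic OOM check (assumption): message mentions 'out of memory'.
            if isinstance(error, RuntimeError) and "out of memory" not in str(error).lower():
                raise
            limits.record_failure(device, duration)
    raise RuntimeError(f"no device could transcribe {audio_path}")
```

The limits live only in memory here; persisting them between runs, or keying them by model size as well as device, would be a natural extension.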
