Currently the amount of memory needed is proportional to both the length of the audio clip and the size of the model.
Short clips run much faster on the GPU, but long clips can't be processed at all on smaller GPUs.
It's already possible to set CUDA_VISIBLE_DEVICES='' to force CPU use, but that's not very practical: it requires a second pass after the GPU pass has failed on some audio.
One idea would be to build that second pass into the operator itself: for each clip, try the GPU first, then fall back to the CPU.
An even smarter system would keep track of the maximum audio size that works on each device, so it doesn't waste time trying the GPU when a new clip is known to exceed that limit.
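A minimal sketch of that idea, assuming a hypothetical `transcribe(audio, device)` callable that raises a memory error (standing in here for a CUDA out-of-memory error) when a clip is too large for the device; all names are illustrative, not part of any existing API:

```python
class FallbackTranscriber:
    """GPU-first wrapper that falls back to CPU and remembers GPU failures."""

    def __init__(self, transcribe):
        self.transcribe = transcribe
        # Smallest clip length known to fail on the GPU; clips at least
        # this long skip the GPU attempt entirely.
        self.gpu_fail_threshold = float("inf")

    def __call__(self, audio):
        if len(audio) < self.gpu_fail_threshold:
            try:
                return self.transcribe(audio, device="cuda")
            except MemoryError:
                # Record the failure so future clips of this size or
                # larger go straight to the CPU.
                self.gpu_fail_threshold = min(self.gpu_fail_threshold, len(audio))
        return self.transcribe(audio, device="cpu")


# Hypothetical backend simulating a GPU that OOMs on clips longer than 4.
def fake_transcribe(audio, device):
    if device == "cuda" and len(audio) > 4:
        raise MemoryError("simulated CUDA OOM")
    return (device, len(audio))


runner = FallbackTranscriber(fake_transcribe)
print(runner([0] * 3))   # small clip stays on the GPU
print(runner([0] * 10))  # OOMs on GPU, falls back to CPU
print(runner([0] * 12))  # exceeds the recorded limit, skips the GPU attempt
```

This only caches a single threshold; a real operator would likely key it per device and reset it when the model or GPU changes.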