Consider CLBlast for acceleration on older GPUs (including AMD ones)

### Feature request

Please consider supporting CLBlast as an acceleration backend. The whisper.cpp project makes use of it, if I am not mistaken; see e.g. https://github.com/ggerganov/whisper.cpp/issues/173

### Motivation

I think inference performance needs to improve on hardware that is not state of the art, such as older GPUs and AMD GPUs.

### Your contribution

N/A