For example, Alibaba's SenseVoice is very good at speech recognition in certain scenarios: https://github.com/FunAudioLLM/SenseVoice