Speaker Diarization of Known Speakers #1667
Hey @tobiasschmidt89, I've just started working on this too (for podcasts). Digging into pyannote, I see it uses the SpeechBrain "speechbrain/spkrec-ecapa-voxceleb" embedding model: https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. I can share the approach I have in mind:
I'll see how that goes. Let me know how you get on with it. Feel free to hmu if you wanna collaborate a bit on this.
Hi @desicochrane, @tobiasschmidt89. I'm exploring ways to improve speaker diarization in recordings with multiple speakers, but my primary goal is to accurately detect a specific target speaker. I have additional enrollment samples for that speaker (which I suppose I can use to generate an embedding), and I'm wondering if anyone here has achieved high precision for this particular use case.
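For what it's worth, here is a rough sketch of how the target-speaker idea might look once embeddings are available. It assumes each enrollment sample and each diarized segment has already been turned into a fixed-length vector by some embedding model (that step is not shown). The function names, the toy 2-dimensional vectors, and the 0.8 threshold are all made up for illustration; raising the threshold trades recall for precision, and a real value would need tuning on your own recordings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def enrollment_centroid(samples):
    """Average several enrollment embeddings into one unit-length reference."""
    normalized = [l2_normalize(s) for s in samples]
    dim = len(normalized[0])
    mean = [sum(s[i] for s in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def target_segments(segments, centroid, threshold=0.8):
    """Return (start, end) for segments whose cosine similarity to the
    centroid reaches the threshold. The threshold is an assumed value."""
    hits = []
    for start, end, embedding in segments:
        score = sum(x * y for x, y in zip(l2_normalize(embedding), centroid))
        if score >= threshold:
            hits.append((start, end))
    return hits


# Toy data standing in for real embeddings (hypothetical values).
enrollment = [[1.0, 0.1], [0.9, 0.0]]
segments = [
    (0.0, 2.5, [0.8, 0.1]),   # close to the target
    (2.5, 4.0, [0.1, 0.9]),   # a different voice
    (4.0, 6.0, [0.6, 0.6]),   # ambiguous, rejected at this threshold
]

centroid = enrollment_centroid(enrollment)
hits = target_segments(segments, centroid)
print(hits)
```

Averaging several enrollment samples is only one option; keeping every sample and taking the best match per segment is another, and I have not measured which works better.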
Hi,
I really enjoy using this library for speaker diarization to create labeled transcripts in combination with Whisper: Speaker 1: ..., Speaker 2: ..., Speaker 1: ...
Currently, I then search for the anonymous speaker labels and replace them with the real names.
I have some meetings that always have the same speakers (D&D game sessions). Therefore I am searching for a way to kind of create "voice embeddings" of each speaker by recording them in isolation for a minute or so. Then I want to do a speaker diarization using these embeddings to get labels like: Max: ..., Maria: ..., Tobi: ..., Max: ..., Unknown: ...
I would be interested if someone has some pointers on how I could achieve this with Pyannote. I expect I need to do the following:
Crossed items I know how to do.
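To make the idea concrete, the label-mapping step could be sketched roughly as below. This is only an illustration of the matching logic, not pyannote's API: it assumes one embedding per enrolled speaker and one embedding per anonymous diarization label have already been computed elsewhere. The names `cosine_similarity` and `name_speakers`, the toy 3-dimensional vectors, and the 0.7 threshold are placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def name_speakers(cluster_embeddings, enrolled_embeddings, threshold=0.7):
    """Map each anonymous label to the most similar enrolled name,
    or to "Unknown" if no enrolled speaker reaches the threshold."""
    names = {}
    for label, embedding in cluster_embeddings.items():
        best_name, best_score = "Unknown", threshold
        for name, reference in enrolled_embeddings.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        names[label] = best_name
    return names


# Toy vectors standing in for real voice embeddings (hypothetical values).
enrolled = {
    "Max": [1.0, 0.0, 0.0],
    "Maria": [0.0, 1.0, 0.0],
    "Tobi": [0.0, 0.0, 1.0],
}
clusters = {
    "SPEAKER_00": [0.9, 0.1, 0.0],
    "SPEAKER_01": [0.1, 0.8, 0.1],
    "SPEAKER_02": [0.5, 0.5, 0.5],  # not clearly anyone
}

mapping = name_speakers(clusters, enrolled)
print(mapping)
```

Note that this simple version lets two anonymous labels map to the same name; whether that is desirable (diarization split one person in two) or a mistake depends on the recording.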
I am very comfortable with text embeddings. Audio embedding is a new topic for me.
I would really appreciate any pointers or example scripts.
Thank you
T.