NeMo Feature Request: Integrated Speaker Verification and Diarization for Target Speaker Identification #14265

@Azhou0

Description

For a beginner, it is difficult and unclear how to combine separate models (speaker diarization and speaker verification) into one unified pipeline with this feature. Target speaker identification matters in many areas, such as phone calls and speech recordings, especially as large audio models become more popular and powerful.

We need a unified TargetAwareDiarizer class that:

- Accepts pre-registered speaker embeddings during initialization
- Performs joint diarization + verification in a single pass
- Outputs segments with two new attributes:
  - is_target: Boolean flag for verified speakers
  - speaker_id: Custom ID for registered speakers (e.g., "VIP_Customer")
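
To make the requested output concrete, here is a minimal sketch in plain Python of how the matching step might look. It is not an existing NeMo API: the class body, the `label` method, the `LabeledSegment` type, the `cosine` helper, and the 0.7 threshold are all assumptions for illustration. It also assumes that diarization has already produced segments with one embedding each, so it shows only the verification and labeling part, not a true single-pass joint model.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class LabeledSegment:
    start: float               # segment start time in seconds
    end: float                 # segment end time in seconds
    is_target: bool            # True if the segment matches a registered speaker
    speaker_id: Optional[str]  # custom ID of the matched speaker, else None


class TargetAwareDiarizer:
    """Hypothetical sketch of the requested class; not part of NeMo."""

    def __init__(self, registered: Dict[str, Sequence[float]], threshold: float = 0.7):
        # registered maps a custom speaker ID to its enrollment embedding.
        self.registered = dict(registered)
        # Assumed similarity cutoff; a real system would tune this on held-out data.
        self.threshold = threshold

    def label(self, segments: List[Tuple[float, float, Sequence[float]]]) -> List[LabeledSegment]:
        """Attach is_target and speaker_id to (start, end, embedding) segments."""
        labeled = []
        for start, end, embedding in segments:
            best_id, best_score = None, -1.0
            # Pick the registered speaker with the highest similarity.
            for speaker_id, reference in self.registered.items():
                score = cosine(embedding, reference)
                if score > best_score:
                    best_id, best_score = speaker_id, score
            matched = best_id is not None and best_score >= self.threshold
            labeled.append(LabeledSegment(start, end, matched, best_id if matched else None))
        return labeled


# Toy 3-dimensional embeddings; real speaker embeddings have far more dimensions.
diarizer = TargetAwareDiarizer({"VIP_Customer": [1.0, 0.0, 0.0]})
result = diarizer.label([
    (0.0, 2.5, [0.9, 0.1, 0.0]),  # close to the registered embedding
    (2.5, 4.0, [0.0, 1.0, 0.0]),  # orthogonal to it
])
for segment in result:
    print(segment)
```

In this toy run the first segment is flagged as the target with speaker_id "VIP_Customer", and the second is left unmatched. In an integrated solution, the segment embeddings would presumably come from the diarization model itself rather than being passed in by hand.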

I think this would not be too hard for the development team, and I would be excited if the team could provide a reliable solution.
