Add DiCoW: Diarization-Conditioned Whisper #39430

@Lakoc

Model description

We (BUT Speech@FIT) have recently developed DiCoW (Diarization-Conditioned Whisper), a target-speaker ASR model that enhances OpenAI’s Whisper by integrating speaker diarization for multi-talker, speaker-attributed ASR.

Unlike previous approaches, DiCoW directly conditions on diarization outputs and achieves state-of-the-art performance on multi-talker benchmarks such as AMI and Libri2Mix. The model recently secured second place in the Challenge and Workshop on Multilingual Conversational Speech Language Model (MLC-SLM) and received a jury award at the CHIME-8 challenge.

DiCoW employs Frame-Level Diarization-Dependent Transformations (FDDT): for each frame, the diarization output assigns one of four states (silence, target speaker, non-target speaker, or overlap with the target), and a state-specific projection is applied to that frame's embedding.
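To make the FDDT idea concrete, here is a minimal PyTorch sketch, not DiCoW's actual implementation: one linear projection per diarization state, blended per frame by the state probabilities. All names and shapes are assumptions for illustration; identity initialization makes the untrained module a no-op, so it behaves like plain Whisper before fine-tuning.

```python
import torch
import torch.nn as nn

class FDDT(nn.Module):
    """Illustrative sketch of Frame-Level Diarization-Dependent
    Transformations: a per-state linear projection, mixed per frame
    according to the diarization state probabilities."""

    STATES = 4  # silence, target, non-target, overlap-with-target

    def __init__(self, d_model: int):
        super().__init__()
        # One projection per diarization state, initialized to the
        # identity so that at initialization the output equals the input.
        self.proj = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(self.STATES)
        )
        for p in self.proj:
            nn.init.eye_(p.weight)
            nn.init.zeros_(p.bias)

    def forward(self, x: torch.Tensor, diar: torch.Tensor) -> torch.Tensor:
        # x:    (batch, frames, d_model) frame embeddings
        # diar: (batch, frames, STATES) per-frame state probabilities
        out = torch.stack([p(x) for p in self.proj], dim=-1)  # (B, T, D, S)
        return (out * diar.unsqueeze(2)).sum(dim=-1)

# Toy usage: random embeddings and soft diarization probabilities.
x = torch.randn(2, 100, 64)
diar = torch.softmax(torch.randn(2, 100, 4), dim=-1)
y = FDDT(64)(x, diar)
print(y.shape)  # torch.Size([2, 100, 64])
```

Since the per-state probabilities sum to one and each projection starts as the identity, `y` equals `x` at initialization; training then lets the four projections diverge to encode the speaker-activity states.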

Designed for long-form, multi-speaker transcription tasks, DiCoW excels in scenarios such as meetings, interviews, and spontaneous conversations. It also performs well for single-speaker ASR, achieving word error rates (WER) of 2.1% on LibriSpeech test-clean, 4.3% on test-other, 5.3% on TED-LIUM, and 11.2% on VoxPopuli.

The model is based on Whisper, and its v3.2 release is already integrated with the Hugging Face Transformers AutoClasses.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Source Repositories

Related Publications
