Commit 132b229 (1 parent: ed77c06)

update calibration/finetuning/index and add lots of data/uem files

598 files changed: +21351 −34 lines


.github/workflows/publish.yml (deleted, −25 lines)

README.md (new file, +14 lines)

```markdown
# On the calibration of powerset speaker diarization models

[Alexis Plaquet](https://frenchkrab.github.io/) and [Hervé Bredin](https://herve.niderb.fr)

Proc. InterSpeech 2024.

> End-to-end neural diarization models have usually relied on a multilabel-classification formulation of the speaker diarization problem. Recently, a powerset multiclass formulation has beaten state-of-the-art on multiple datasets. In this paper, we propose to study the calibration of a powerset speaker diarization model, and explore some of its uses. We study the calibration in-domain, as well as out-of-domain, and explore the data in low-confidence regions. The reliability of model confidence is then tested in practice: we use the confidence of the pretrained model to selectively create training and validation subsets out of unannotated data, and compare this to random selection. We find that top-label confidence can be used to reliably predict high-error regions. Moreover, training on low-confidence regions provides a better calibrated model, and validating on low-confidence regions can be more annotation-efficient than random regions.

[Read the paper (TODO)](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html)

[Browse the companion website](https://frenchkrab.github.io/IS2024-powerset-calibration/)

## Citations

To be added.
```
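The abstract describes ranking data by the pretrained model's top-label confidence and annotating the least confident regions first. As an illustrative toy only (the frame probabilities, region granularity, and function names below are invented, not the paper's pipeline), that selection step can be sketched as:

```python
def top_label_confidence(probs):
    """Confidence of each frame: probability of its most likely powerset class."""
    return [max(p) for p in probs]

def lowest_confidence_regions(probs, k):
    """Indices of the k frames the model is least sure about (annotation candidates)."""
    conf = top_label_confidence(probs)
    return sorted(range(len(conf)), key=lambda i: conf[i])[:k]

# Toy per-frame probabilities over 3 powerset classes (e.g. silence / spk1 / spk1+spk2).
frame_probs = [
    [0.05, 0.90, 0.05],  # confident
    [0.40, 0.35, 0.25],  # uncertain: a good annotation candidate
    [0.01, 0.01, 0.98],  # very confident
]
candidates = lowest_confidence_regions(frame_probs, 1)  # picks the uncertain frame
```

A real pipeline would presumably aggregate frame-level confidence over longer regions before selecting; this sketch ranks individual frames only.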

calibration.qmd (+45 −6 lines)

```diff
@@ -5,14 +5,13 @@ about:
   links:
     - icon: github
       text: Github
-      href: https://github.com/FrenchKrab
+      href: https://github.com/FrenchKrab/IS2024-powerset-calibration
     - icon: book
       text: Google Scholar
       href: https://scholar.google.com/citations?user=7gJ465gAAAAJ
 ---
-
 # Raw results table
 
 We could not include the raw result table in the paper. We show it here, and include some additional metrics (Expected Calibration Error using different binning schemes and bin counts). It is pretty clear that the bins used to compute the ECE do not have a huge impact on the metric.
```
```diff
@@ -56,18 +55,58 @@ The paper contains two scatter plots for DER / ECE. Here we grouped all datasets
 
 # Reliability diagrams
 
+Here are reliability diagrams for all 11 DIHARD 3 domains. The paper only shows uniform binning, but we also propose diagrams for adaptive binning.
+We put the figures under foldable sections since they take a lot of vertical space.
+
 ## Uniform binning with 10 bins
 
 <!-- 09c_view_calibration_eval.ipynb with BINNING_METHOD='uniform' -->
-::: {.callout-note appearance="detail" collapse=true}
-# Using uniform binning with 10 bins
+::: {.callout-note appearance="detail" collapse=true title="Using uniform binning with 10 bins"}
 ![](site_media/calibration/reliability_uniform10bins.png)
 :::
 
 <!-- 09c_view_calibration_eval.ipynb with BINNING_METHOD='adaptive' -->
 ## Adaptive binning with 10 bins
 
-::: {.callout-note appearance="detail" collapse=true}
-# Using adaptive binning with 10 bins
+Note that the X axis is not linear at all. Since most predictions are confident, the higher bins contain very similar confidence values.
+
+::: {.callout-note appearance="detail" collapse=true title="Using adaptive binning with 10 bins"}
 ![](site_media/calibration/reliability_adaptive10bins.png)
 :::
+
+# Analysis of low-confidence regions
+
+We sample low-confidence data (left column) and random regions of data (right column), and compare the composition of the data as well as the model performance. As usual we provide the figures for all DIHARD domains instead of a select few.
+
+## Data composition
+
+<!-- 21_selected_al_analysis.ipynb -->
+::: {.callout-note appearance="detail" collapse=true title="Data composition of low-confidence regions"}
+![](site_media/calibration/data_composition.png)
+:::
+
+## Model performance (DER)
+
+<!-- 21_selected_al_analysis_der.ipynb -->
+::: {.callout-note appearance="detail" collapse=true title="DER on low-confidence regions"}
+![](site_media/calibration/der_analysis.png)
+:::
+
+# Reproducibility
+
+Pretrained model checkpoint downloads:
+
+- [Github](https://github.com/FrenchKrab/IS2024-powerset-calibration/tree/master/data/calibration/[email protected])
+- [HuggingFace (mirror)](https://huggingface.co/aplaquet/IS2024-powerset-calibration/blob/main/pretrained%40epoch109.ckpt)
+
+Composition of the training dataset:
+
+- [pyannote.database protocol specifications](https://github.com/FrenchKrab/IS2024-powerset-calibration/tree/master/data/calibration/database.yml)
+
+Parquet inference files, containing model probabilities and targets for all of the datasets:
+
+- [.parquet inference files](https://huggingface.co/aplaquet/IS2024-powerset-calibration/tree/main/model_inference)
```
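The ECE variants compared in the raw results table differ only in how confidence values are grouped into bins. As a minimal sketch (not the notebook code): uniform binning splits [0, 1] into equal-width bins, adaptive binning puts an equal number of samples in each bin, and ECE is the sample-weighted average gap between per-bin accuracy and per-bin mean confidence:

```python
def ece(confidences, correct, n_bins=10, binning="uniform"):
    """Expected Calibration Error under uniform (equal-width) or
    adaptive (equal-frequency) binning."""
    pairs = sorted(zip(confidences, correct))
    n = len(pairs)
    if binning == "uniform":
        # Equal-width bins over [0, 1]; confidence 1.0 falls in the last bin.
        bins = [[] for _ in range(n_bins)]
        for c, y in pairs:
            bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    else:
        # Adaptive: consecutive equal-frequency slices of the sorted samples.
        bins = [pairs[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]
    total = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(y for _, y in b) / len(b)
            total += len(b) / n * abs(accuracy - avg_conf)
    return total
```

When most predictions are highly confident, as noted for the adaptive reliability diagrams, the adaptive bins bunch up near 1.0, which is consistent with the two schemes rarely disagreeing much on the final number.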

data/calibration/database.yml (new file, +54 lines)

```yaml
Requirements:
  - database_eie.yml
  - database_dihard_1file.yml
  - database_dihard_1file2min.yml
  - database_dihard_1file-1_2min.yml
  - database_dihard_1file-2_2min.yml
  - database_dihard_1file-3_2min.yml

Protocols:
  X:
    SpeakerDiarization:
      Pretraining_2023-12_no-DIHARD:
        train:
          AISHELL.SpeakerDiarization.Adaptation: [train, ]
          AliMeeting.SpeakerDiarization.Adaptation: [train, ]
          AMI.SpeakerDiarization.Adaptation: [train, ]
          AMI-SDM.SpeakerDiarization.Adaptation: [train, ]
          AVA-AVD.SpeakerDiarization.Adaptation: [train, ]
          CALLHOME.SpeakerDiarization.Adaptation: [train, ]
          DISPLACE.SpeakerDiarization.Adaptation: [train, ]
          # DIHARD.SpeakerDiarization.Adaptation: [train, ]
          Ego4D.SpeakerDiarization.Adaptation: [train, ]
          MSDWILD.SpeakerDiarization.Adaptation: [train, ]
          RAMC.SpeakerDiarization.Adaptation: [train, ]
          REPERE.SpeakerDiarization.Adaptation: [train, ]
          VoxConverse.SpeakerDiarization.Adaptation: [train, ]
        development:
          AISHELL.SpeakerDiarization.Adaptation: [development, ]
          AliMeeting.SpeakerDiarization.Adaptation: [development, ]
          AMI.SpeakerDiarization.Adaptation: [development, ]
          AMI-SDM.SpeakerDiarization.Adaptation: [development, ]
          AVA-AVD.SpeakerDiarization.Adaptation: [development, ]
          CALLHOME.SpeakerDiarization.Adaptation: [development, ]
          DISPLACE.SpeakerDiarization.Adaptation: [development, ]
          # DIHARD.SpeakerDiarization.Adaptation: [development, ]
          Ego4D.SpeakerDiarization.Adaptation: [development, ]
          MSDWILD.SpeakerDiarization.Adaptation: [development, ]
          RAMC.SpeakerDiarization.Adaptation: [development, ]
          REPERE.SpeakerDiarization.Adaptation: [development, ]
          VoxConverse.SpeakerDiarization.Adaptation: [development, ]
        test:
          AISHELL.SpeakerDiarization.Benchmark: [test, ]
          AliMeeting.SpeakerDiarization.Benchmark: [test, ]
          AMI.SpeakerDiarization.Benchmark: [test, ]
          AMI-SDM.SpeakerDiarization.Benchmark: [test, ]
          AVA-AVD.SpeakerDiarization.Benchmark: [test, ]
          CALLHOME.SpeakerDiarization.Benchmark: [test, ]
          # DISPLACE.SpeakerDiarization.Benchmark: [test, ]
          # DIHARD.SpeakerDiarization.Benchmark: [test, ]
          # Ego4D.SpeakerDiarization.Benchmark: [test, ]
          MSDWILD.SpeakerDiarization.Benchmark: [test, ]
          RAMC.SpeakerDiarization.Benchmark: [test, ]
          REPERE.SpeakerDiarization.Benchmark: [test, ]
          VoxConverse.SpeakerDiarization.Benchmark: [test, ]
```
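The `X.SpeakerDiarization.Pretraining_2023-12_no-DIHARD` meta-protocol above simply concatenates the requested subsets of each listed per-dataset protocol. A plain-Python sketch of that composition logic (the file URIs below are hypothetical; pyannote.database itself performs this merging when given the YAML):

```python
# Hypothetical per-dataset protocols mapping subset name -> file URIs.
protocols = {
    "AMI.SpeakerDiarization.Adaptation": {"train": ["ES2002a"], "development": ["ES2011a"]},
    "VoxConverse.SpeakerDiarization.Adaptation": {"train": ["abjxc"], "development": ["afjiv"]},
}

def compose(spec):
    """Concatenate the requested subsets of each source protocol,
    as an X meta-protocol does."""
    out = {}
    for subset, sources in spec.items():
        files = []
        for protocol_name, wanted_subsets in sources.items():
            for s in wanted_subsets:
                files.extend(protocols[protocol_name][s])
        out[subset] = files
    return out

# Mirrors the shape of the YAML above: each meta-subset lists
# source protocols and which of their subsets to include.
spec = {
    "train": {
        "AMI.SpeakerDiarization.Adaptation": ["train"],
        "VoxConverse.SpeakerDiarization.Adaptation": ["train"],
    },
}
merged = compose(spec)
```

The commented-out DIHARD entries in the YAML are what make this the "no-DIHARD" pretraining protocol: DIHARD is held out so it can serve as out-of-domain data.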

data/calibration/[email protected]

16.9 MB
Binary file not shown.
