DiceScore yields 1.0 scores when using average="none" for samples where class is not present #2850
🐛 Bug
The current implementation of DiceScore yields scores of 1.0 for samples that don't contain a particular class when calculating class-wise metrics via average="none". This leads to very high Dice scores, particularly for rare classes, because samples where the class is not present push the metric towards 1.0. It also makes Dice scores incomparable across classes unless the val/test dataset is balanced.
To Reproduce
The following code snippet shows an example where, out of 1000 samples, the first class is present only in the first sample. Even though the prediction on that first sample is wrong, it yields a close-to-perfect class Dice score of 99.9%, because a score of 1.0 is used for each of the 999 samples where the class is absent.
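The arithmetic behind the 99.9% figure can be illustrated without TorchMetrics. The following is a minimal pure-Python sketch, not the library's implementation: the helper `dice_per_sample`, its `empty_score` parameter, and the 4-pixel binary masks are assumptions made purely for illustration.

```python
# Illustrative sketch only: models one class with binary masks and mimics
# the behaviour described in this issue (1.0 when the class is absent).
def dice_per_sample(pred, target, empty_score=1.0):
    """Binary Dice for one class on one sample (hypothetical helper)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    if denominator == 0:
        # Class absent from both prediction and target.
        return empty_score
    return 2 * intersection / denominator

# Sample 0 contains the class and is predicted completely wrong;
# the remaining 999 samples do not contain the class at all.
preds = [[0, 0, 1, 1]] + [[0, 0, 0, 0]] * 999
targets = [[1, 1, 0, 0]] + [[0, 0, 0, 0]] * 999

scores = [dice_per_sample(p, t) for p, t in zip(preds, targets)]
class_score = sum(scores) / len(scores)
print(class_score)  # (0.0 + 999 * 1.0) / 1000 = 0.999
```

The single wrong prediction contributes 0.0, but it is outweighed by the 999 samples that are scored 1.0 only because the class is missing.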
Code sample
Expected behavior
The above code snippet should return [0.0, nan, nan] (nan for the second and third classes because they are not present in any of the provided samples, so no meaningful score can be calculated).
Environment
1.6.0
torch==2.3.0
Additional context
Related issue: GeneralizedDiceScore yields 0 scores when using per_class=True for samples where class is not present #2846. That issue reports the analogous problem for GeneralizedDiceScore, but there scores of 0.0 are used for samples where the class is missing.
The MONAI library addresses this through the ignore_empty keyword argument, which is set to True by default (see monai.metrics.DiceMetric).
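One way to picture an "ignore empty" aggregation is sketched below in plain Python. It is not MONAI's or TorchMetrics' code: the helpers `dice_ignore_empty` and `nanmean`, the 4-pixel masks, and the choice to mark a sample as NaN whenever the class is absent from the target are all assumptions made for illustration.

```python
import math

def dice_ignore_empty(pred, target):
    """Binary Dice for one class on one sample; NaN if the target lacks the class."""
    if sum(target) == 0:
        return math.nan  # assumed convention: skip samples without the class
    intersection = sum(p * t for p, t in zip(pred, target))
    return 2 * intersection / (sum(pred) + sum(target))

def nanmean(values):
    """Mean over non-NaN entries; NaN if every entry is NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan

empty = [0, 0, 0, 0]
# preds[sample][class] and targets[sample][class] are binary masks.
# Class 0 appears only in sample 0 and is predicted wrong;
# classes 1 and 2 never appear.
preds = [[[0, 0, 1, 1], empty, empty]] + [[empty, empty, empty]] * 999
targets = [[[1, 1, 0, 0], empty, empty]] + [[empty, empty, empty]] * 999

per_class = [
    nanmean([dice_ignore_empty(p[c], t[c]) for p, t in zip(preds, targets)])
    for c in range(3)
]
print(per_class)  # [0.0, nan, nan]
```

Under these assumptions, only samples that actually contain a class contribute to that class's score, so the result matches the expected [0.0, nan, nan].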