Dear reader,
does keraslm-rate take hyphenated words into account?
Using this demo file https://digi.ub.uni-heidelberg.de/diglitData/v/keraslm/test-fouche10,5-s1.pdf,
it seems that many of the low-rated words end in a hyphen:
With hyphenation:
# median: 0.962098 0.622701 ; mean: 0.948695 0.625144, correlation: 0.315179
# OCR-D-OCR OCR-D-KERAS
0.693236 0.410939 # region0002_line0021_word0003 daf3
0.927003 0.468318 # region0002_line0029_word0006 Rä-
0.932888 0.480686 # region0002_line0021_word0002 Lyon,
0.904642 0.484226 # region0002_line0032_word0001 Kerker.
0.909297 0.484817 # region0002_line0032_word0004 klaubt
0.931271 0.489822 # region0002_line0000_word0005 pas-
0.928169 0.491138 # region0000_line0004_word0007 sozia-
0.927566 0.492916 # region0002_line0014_word0003 Pythia;
0.958217 0.494058 # region0000_line0002_word0003 Lyon,
0.963757 0.494978 # region0003_line0001_word0005 Lyon,
0.926153 0.495819 # region0003_line0000_word0004 Kon-
0.960306 0.496031 # region0002_line0010_word0007 Lyon
0.911557 0.496326 # region0002_line0001_word0004 Rousseaus
0.967390 0.496934 # region0000_line0011_word0003 1792
0.929831 0.497394 # region0002_line0004_word0003 im
0.960453 0.498529 # region0002_line0017_word0006 Lyon
0.910209 0.499826 # region0002_line0018_word0002 Instinktiv
...
Without (manually removed) hyphenation:
# median: 0.962198 0.623943 ; mean: 0.949162 0.628181, correlation: 0.278264
# OCR-D-OCRNOHYP OCR-D-KERNOHYP
0.693236 0.411037 # region0002_line0021_word0003 daf3
0.932888 0.480686 # region0002_line0021_word0002 Lyon,
0.904642 0.484226 # region0002_line0032_word0001 Kerker.
0.909297 0.484817 # region0002_line0032_word0004 klaubt
0.927566 0.492916 # region0002_line0014_word0003 Pythia;
0.958217 0.494058 # region0000_line0002_word0003 Lyon,
0.963757 0.494945 # region0003_line0001_word0005 Lyon,
0.960306 0.496031 # region0002_line0010_word0007 Lyon
0.911557 0.496306 # region0002_line0001_word0004 Rousseaus
0.967390 0.496923 # region0000_line0011_word0003 1792
0.929831 0.497394 # region0002_line0004_word0003 im
0.960453 0.498542 # region0002_line0017_word0006 Lyon
0.910209 0.499822 # region0002_line0018_word0002 Instinktiv
...
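To make the comparison easier to repeat, here is a minimal sketch that picks the hyphen-terminated tokens out of a listing like the two above. It assumes the row layout shown there (two scores, `#`, word ID, token); the names `parse_rows`, `hyphenated` and `sample` are made up for illustration and are not part of keraslm-rate.

```python
import re

# One data row: "<score> <score> # <word id> <token>".
# Header lines start with "#" and the "..." line has no scores, so neither matches.
ROW = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+#\s+(\S+)\s+(\S+)\s*$")


def parse_rows(text):
    """Return {word_id: (first_score, second_score, token)} for every data row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(3)] = (float(m.group(1)), float(m.group(2)), m.group(4))
    return rows


def hyphenated(rows):
    """Keep only rows whose token ends in a hyphen (line-break hyphenation)."""
    return {wid: row for wid, row in rows.items() if row[2].endswith("-")}


# A few rows copied from the "With hyphenation" listing above.
sample = """\
# OCR-D-OCR OCR-D-KERAS
0.693236 0.410939 # region0002_line0021_word0003 daf3
0.927003 0.468318 # region0002_line0029_word0006 Rä-
0.932888 0.480686 # region0002_line0021_word0002 Lyon,
0.931271 0.489822 # region0002_line0000_word0005 pas-
...
"""

rows = parse_rows(sample)
# Sort by the second column (OCR-D-KERAS), lowest first.
for wid, (ocr, keras, token) in sorted(hyphenated(rows).items(), key=lambda kv: kv[1][1]):
    print(f"{keras:.6f} {wid} {token}")
```

Running the same two functions on the full output of both runs would show how many of the low-rated rows are hyphenated and whether the remaining rows change once the hyphens are removed.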