sequentially duplicate token outputs? #268
Unanswered
alex-nugent
asked this question in
Q&A
Replies: 0 comments
First of all, thank you for making this repository available! I am investigating the ONNX models within a Java environment. I was able to load and execute the ONNX models, and they appear to be working. I ported the Python Decoder to Java, and the timecodes all appear to be accurate. The only issue I am having is duplicated tokens within words in the output.
As an example, audio (16 kHz, mono, 16-bit signed PCM) with the following words:
"Einstein's postulate allows us to understand causal relationships in Minkowski space, a mathematical model of space and time."
...becomes this after STT:
eininstestein' postululate allowallows us to understandunderstand causle relrelationationshiips in mkski spspace a mamathemthematicalical modell of space in time
Perhaps the sequentially duplicated tokens are normal and just something that is filtered out? Or could this be related to my front-end audio processing? I am converting to a float array normalized between -1 and 1.
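For reference, the front-end conversion described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in use: it assumes the PCM bytes are little-endian (as in a typical WAV file) and scales by 32768, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class PcmToFloat {
    // Converts 16-bit signed mono PCM bytes to floats in the range [-1, 1).
    // Assumption: samples are little-endian; swap the byte order if they are not.
    static float[] toFloats(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[pcm.length / 2]; // two bytes per sample
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getShort() / 32768f;
        }
        return out;
    }

    public static void main(String[] args) {
        // Three samples: zero, the largest positive value, the most negative value.
        byte[] pcm = {0x00, 0x00, (byte) 0xFF, 0x7F, 0x00, (byte) 0x80};
        System.out.println(Arrays.toString(toFloats(pcm)));
    }
}
```

A wrong byte order or a missing sign extension would usually produce noise rather than cleanly repeated tokens, so a conversion along these lines is probably not the source of the duplicates.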
The full "AlignDict" output shows accurate start/stop times:
AlignDict{word='eininstestein'', startTs=0.0, endTs=0.55}
AlignDict{word='postululate', startTs=0.55, endTs=1.11}
AlignDict{word='allowallows', startTs=1.11, endTs=1.5}
AlignDict{word='us', startTs=1.5, endTs=1.66}
AlignDict{word='to', startTs=1.66, endTs=1.82}
AlignDict{word='understandunderstand', startTs=1.98, endTs=2.3}
AlignDict{word='causle', startTs=2.38, endTs=2.93}
AlignDict{word='relrelationationshiips', startTs=2.93, endTs=3.72}
AlignDict{word='in', startTs=3.72, endTs=3.88}
AlignDict{word='mkski', startTs=3.88, endTs=4.59}
AlignDict{word='spspace', startTs=4.59, endTs=4.99}
AlignDict{word='a', startTs=5.54, endTs=5.78}
AlignDict{word='mamathemthematicalical', startTs=5.78, endTs=6.42}
AlignDict{word='modell', startTs=6.42, endTs=6.81}
AlignDict{word='of', startTs=6.81, endTs=6.97}
AlignDict{word='space', startTs=6.97, endTs=7.21}
AlignDict{word='in', startTs=7.21, endTs=7.37}
AlignDict{word='time', startTs=7.37, endTs=7.6}
I have verified that the duplicate tokens are not a result of the Decode() operation. The duplicated tokens are already present in the raw token probability outputs:
{label index} : {label}
969 : e
3 : in
3 : in
473 : ste
473 : ste
3 : in
991 : '
998 :
998 :
85 : po
25 : st
114 : ul
114 : ul
147 : ate
998 :
998 :
928 : allow
928 : allow
975 : s
998 :
80 : us
998 :
12 : to
998 :
998 :
0 : _
0 : _
801 : understand
801 : understand
0 : _
998 :
998 :
115 : ca
0 : _
80 : us
0 : _
24 : le
998 :
998 :
713 : rel
713 : rel
102 : ation
102 : ation
143 : sh
973 : i
973 : i
317 : ps
998 :
3 : in
998 :
998 :
982 : m
0 : _
990 : k
0 : _
0 : _
682 : sk
973 : i
998 :
998 :
229 : sp
229 : sp
211 : ace
0 : _
0 : _
0 : _
0 : _
0 : _
0 : _
0 : _
998 :
998 :
971 : a
998 :
167 : ma
167 : ma
169 : them
169 : them
8 : at
334 : ical
334 : ical
998 :
71 : mo
78 : de
978 : l
978 : l
998 :
20 : of
998 :
229 : sp
211 : ace
998 :
3 : in
998 :
998 :
242 : time
0 : _
0 : _
0 : _
0 : _
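If the model emits CTC-style per-frame labels (an assumption; nothing in this post confirms it), then sequential repeats like the ones above are expected, and the usual greedy decode merges consecutive identical labels before dropping blanks. Below is a minimal Java sketch of that collapse. It assumes index 0 (`_`) is the blank and index 998 is the word separator, as the dump suggests; the class and method names are hypothetical and this is not the repository's decoder.

```java
import java.util.Map;

final class CtcCollapse {
    static final int BLANK = 0;   // assumed: "_" is the CTC blank
    static final int SPACE = 998; // assumed: 998 is the word separator

    // Greedy CTC-style collapse: skip a label if it repeats the previous frame,
    // then skip blanks. A blank between two identical labels keeps both.
    static String collapse(int[] ids, Map<Integer, String> labels) {
        StringBuilder sb = new StringBuilder();
        int prev = -1; // no previous frame yet
        for (int id : ids) {
            if (id != prev && id != BLANK) {
                sb.append(id == SPACE ? " " : labels.get(id));
            }
            prev = id;
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Labels and frame ids copied from the first two words of the dump above.
        Map<Integer, String> labels = Map.of(
                969, "e", 3, "in", 473, "ste", 991, "'",
                85, "po", 25, "st", 114, "ul", 147, "ate");
        int[] ids = {969, 3, 3, 473, 473, 3, 991, 998, 998, 85, 25, 114, 114, 147};
        System.out.println(collapse(ids, labels)); // prints: einstein' postulate
    }
}
```

Under the same assumptions, a genuinely doubled piece would have to appear with a blank between the two copies to survive the collapse, which is why the repeats are merged first and the blanks removed second.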