Punctuation and speed of reading #150

JRMeyer · 2021-03-07T08:23:58Z

JRMeyer
Mar 7, 2021
Maintainer

>>> nmstoker
[February 27, 2020, 1:26pm]

Has anyone looked at the topics of punctuation and/or reading speed in
TTS?

For punctuation, a couple of months back I had a go at adjusting the
output from espeak so that it kept commas, which are normally stripped
from the input text. The model trained with it was then responsive to
commas, but speech quality was degraded. If there's interest, I can
write up the process, and maybe I'll try again (as MelGAN has moved
quality forward dramatically).

I'm also interested in the speed of the output. It's no doubt largely
determined by my dataset, but it definitely seems to read a touch faster
than expected. I might try adding a postprocessing step so I can adjust
this outside the model. I'm wondering whether a GST (Global Style Tokens)
approach might help there instead? (Not something I've looked at closely.)
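A postprocessing step of this kind could be sketched with SoX, whose `tempo` effect changes speed without shifting pitch. This is only a minimal sketch: it assumes the `sox` binary is installed, and the function names and the 0.9 default are made up for illustration.

```python
import subprocess


def sox_tempo_command(src_wav, dst_wav, factor):
    """Build a SoX command that changes tempo while keeping pitch.

    factor < 1.0 slows the speech down, factor > 1.0 speeds it up.
    """
    if factor <= 0:
        raise ValueError("tempo factor must be positive")
    # "-s" selects SoX's speech-oriented settings for the tempo effect.
    return ["sox", src_wav, dst_wav, "tempo", "-s", str(factor)]


def change_speed(src_wav, dst_wav, factor=0.9):
    # Hypothetical helper: needs `sox` on PATH, raises CalledProcessError on failure.
    subprocess.run(sox_tempo_command(src_wav, dst_wav, factor), check=True)
```

Calling `change_speed("tts_out.wav", "tts_slow.wav", 0.9)` would write a copy that plays at 90% of the original speed; both file names are placeholders.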

Any suggestions on either of these points?

[LibriTTS, phonemizer and punctuation]

[This is an archived TTS discussion thread from discourse.mozilla.org/t/punctuation-and-speed-of-reading]

JRMeyer · 2021-03-07T08:24:01Z

>>> erogol
[February 28, 2020, 12:02pm]

I'd say the best way is to do postprocessing. Anything else is always
open to future inconveniences.

I'd also agree that enabling punctuation makes training harder, but it
is good for getting the prosody right. One option could be replacing all
punctuation marks with a single symbol.
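As a rough illustration of the single-symbol idea, here is a minimal text-normalization sketch. Which characters count as "pause" punctuation and which symbol stands in for them are assumptions made for the example, not something settled in this thread.

```python
import re

# Assumed for illustration: mid-sentence marks to collapse, and their stand-in.
PAUSE_CHARS = ",;:()[]"
PAUSE_SYMBOL = ","

_PAUSE_RUN = re.compile(r"\s*[" + re.escape(PAUSE_CHARS) + r"]+\s*")
_BEFORE_END = re.compile(re.escape(PAUSE_SYMBOL) + r"\s*([.!?])")


def collapse_punctuation(text):
    """Replace every run of pause punctuation with one symbol."""
    collapsed = _PAUSE_RUN.sub(PAUSE_SYMBOL + " ", text)
    # Drop a pause symbol that ends up directly before sentence-final punctuation.
    collapsed = _BEFORE_END.sub(r"\1", collapsed)
    return collapsed.strip(" " + PAUSE_SYMBOL)


print(collapse_punctuation("The model (trained last week) works; mostly."))
# The model, trained last week, works, mostly.
```

Sentence-final marks are left alone here, so the model would still see `.`, `!` and `?` as distinct symbols.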

JRMeyer · 2021-03-07T08:24:03Z

>>> nmstoker
[February 28, 2020, 1:30pm]

Yes, good idea on the single symbol point. I can see brackets and commas
for subclauses having a similar effect so this sounds promising.

JRMeyer · 2021-03-07T08:24:06Z

>>> georroussos
[March 4, 2020, 8:50am]

The model with Forward Attn+BN does quite well with punctuation when it
marks a phrase break or a sentence end. Case in point: I was
experimenting with sentence length, because long sentences are obviously
something the TTS has problems with, so cutting the text into smaller
sentences helps. The problem then is the intonation. However, when given
a comma instead of a full stop, the TTS does great, and I can't even
remember where I sliced the sentence.
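One reading of the slicing described here, sketched under assumptions: split a long sentence into word-limited chunks and end every chunk except the last with a comma rather than a full stop. The function name and the 12-word default are hypothetical, and the actual procedure used for the attached sample is not shown in the thread.

```python
def split_for_tts(sentence, max_words=12):
    """Split a sentence into chunks of at most max_words words.

    Every chunk except the last is terminated with a comma, so the
    synthesizer is not told that the sentence has ended.
    """
    words = sentence.split()
    chunks = []
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        is_last = start + max_words >= len(words)
        if not is_last:
            # Replace whatever mark the chunk ended on with a single comma.
            chunk = chunk.rstrip(",.;:") + ","
        chunks.append(chunk)
    return chunks


print(split_for_tts("one two three four five six seven.", max_words=3))
# ['one two three,', 'four five six,', 'seven.']
```

Splitting purely by word count ignores phrase boundaries; a real setup would probably prefer to cut at existing commas or conjunctions.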

PRE_Reading_news.wav.zip
(500.6 KB)

JRMeyer · 2021-03-07T08:24:08Z

>>> nmstoker
[March 4, 2020, 10:45am]

Have you done anything special to the code? And I assume you're still
using phonemizer? (I haven't been able to listen to your samples yet.)

The reason I ask is that, from my understanding, the model doesn't
actually see the punctuation, because it doesn't get passed on through
the phonemizer stage. So in those cases where it's doing well with a
comma, I believe it's inferring the sentence structure from the words.
Therefore, with the default setup you can't control for cases where a
comma makes a difference, such as this well-known pair of similar
sentences with distinct meanings:

1. Let's eat, grandma
2. Let's eat grandma
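To make the point concrete, here is a toy stand-in for a front end that discards punctuation. It is not espeak or phonemizer and produces no phonemes; it only shows that once punctuation is dropped, the two sentences reach the model as the same input.

```python
import string


def strip_punctuation(text):
    """Toy stand-in for a punctuation-dropping front end (not real espeak output)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.translate(table).lower().split())


with_comma = strip_punctuation("Let's eat, grandma")
without_comma = strip_punctuation("Let's eat grandma")

print(with_comma == without_comma)
# True
```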

JRMeyer · 2021-03-07T08:24:11Z

>>> georroussos
[March 4, 2020, 10:49am]

I am using the phonemizer, yes! I did not do away with it. I remember
that I tried the same sentence with a full stop instead of a comma and
the intonation was, indeed, different. I wonder how we can control this.
I would guess that a very good, professionally recorded dataset that is
100% accurately transcribed would definitely help.

JRMeyer · 2021-03-07T08:24:13Z

>>> erogol
[March 4, 2020, 10:59am]

Example sounds good. Is this a pre-trained model? Or training from
scratch?

JRMeyer · 2021-03-07T08:24:16Z

>>> nmstoker
[March 4, 2020, 12:38pm]

The full stop ends the sentence, so in that case the model is working on
a shorter sentence and you therefore get different intonation. The
version with the comma would, I believe, behave the same as if there
were no comma at all, because commas are stripped before the model gets
to see them. So before a precisely transcribed dataset can add value,
the code would need tweaking to pass the commas on in some manner.

The experiment I mentioned above to work with punctuation made use of a
'marker' symbol that I wrapped around the desired punctuation
character(s). The marker gets passed through espeak, and the punctuation
can then be added back to the output phoneme text. I'll look at writing
it up in more detail and sharing the code.
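The marker code itself isn't shown in this thread, so the sketch below is not that method. It is a simpler alternative with the same goal: split the text at the punctuation to keep, phonemize each piece separately, and re-attach the punctuation afterwards. The `phonemize` argument is any callable from text to phoneme string, and the `KEEP` set is an assumption for the example.

```python
import re

# Assumed for illustration: the punctuation marks to carry through.
KEEP = ",;:"

_SPLIT_RE = re.compile(r"([" + re.escape(KEEP) + r"])")


def phonemize_keeping_punctuation(text, phonemize):
    """Phonemize text piecewise so the marks in KEEP survive in the output."""
    out = []
    # The capturing group makes split() return the punctuation as its own items.
    for piece in _SPLIT_RE.split(text):
        if len(piece) == 1 and piece in KEEP:
            if out:
                out[-1] += piece  # attach the mark to the preceding phonemes
            else:
                out.append(piece)
        elif piece.strip():
            out.append(phonemize(piece.strip()))
    return " ".join(out)


# Toy phonemizer for demonstration only: upper-casing stands in for espeak.
print(phonemize_keeping_punctuation("Let's eat, grandma", str.upper))
# LET'S EAT, GRANDMA
```

One caveat with this design is that each piece is phonemized in isolation, so any pronunciation that depends on neighbouring words across a comma could change.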
