Help how to generate a custom scorer #1745

JRMeyer (Maintainer) · 2021-03-08T08:47:35Z

>>> solyarisoftware
[February 5, 2021, 4:40pm]

Hi all,

I'm a beginner with DeepSpeech. I installed the latest version as specified in the docs.

I'm now able to transcribe using the CLI command and the native
client. (BTW, I'm working on a small open-source project that shows how to
use a DS server from Node.js:
https://github.com/solyarisoftware/DeepSpeechJs)

Question 1:
I'd like to use DS as a short-sentence ASR for a closed-domain chatbot,
where users produce specific kinds of utterances, such as:

Giuditta Del Buono)

If I understood correctly, I can improve the transcription accuracy of the
pre-trained model just by building a custom scorer file
(customApp.scorer) and using it at run time, without retraining the
pre-trained model on custom audio files:

deepspeech \
  --model deepspeech-0.9.3-models.pbmm \
  --scorer customApp.scorer \
  --audio sample.wav

Is that correct?
Also, is there any data or report showing quantitatively how much accuracy
improves when using a custom scorer for specific closed-domain inputs?
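One way to measure this on your own data is to transcribe the same test clips twice, once with the default scorer and once with the custom one, and compare word error rate (WER). A minimal sketch, assuming you already have reference transcripts and both sets of hypotheses as plain strings; `edit_distance` and `corpus_wer` are hypothetical helper names, not part of DeepSpeech, and the transcripts in the demo are made up for illustration:

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    # prev[j] holds the distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute (cost 0 if the words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("need one hypothesis per reference")
    edits = sum(edit_distance(r.split(), h.split())
                for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only
    refs = ["one zero zero two", "nine nine nine nine"]
    with_default_scorer = ["one zero to", "nine nine nine nine"]
    with_custom_scorer = ["one zero zero two", "nine nine nine nine"]
    print(f"default scorer WER: {corpus_wer(refs, with_default_scorer):.2f}")
    print(f"custom scorer WER:  {corpus_wer(refs, with_custom_scorer):.2f}")
```

Corpus-level WER (total edits over total reference words) weights long utterances more than averaging per-sentence WER would; for short chatbot commands the two are usually close.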

Question 2:
I read the documentation about how to create my own scorer file:
#external-scorer-scripts

But I'm confused. Is there a step-by-step tutorial that shows how to
proceed?

A step-by-step example would help a lot! Does one exist?

Where are data/lm/generate_lm.py and generate_scorer_package
located?
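For DeepSpeech 0.9.x, the understanding here is that `data/lm/generate_lm.py` lives in the DeepSpeech source repository, while the `generate_scorer_package` binary ships inside the `native_client.*.tar.xz` archive attached to each release; please double-check against the scorer documentation for your version. The Python sketch below only assembles the two command lines (it does not run them). The flag names are assumed to match the LibriSpeech example in the 0.9.x scorer docs; the function names, all paths, and the alpha/beta values are placeholders, and the order/pruning defaults were chosen for a large corpus, so they may need adjusting for a small closed-domain text.

```python
import shlex
from pathlib import Path
from typing import List


def generate_lm_cmd(corpus_txt: str, output_dir: str, kenlm_bins: str,
                    top_k: int = 500000, arpa_order: int = 5,
                    arpa_prune: str = "0|0|1") -> List[str]:
    """Argument list for data/lm/generate_lm.py (flags assumed from the 0.9.x docs)."""
    return [
        "python3", "data/lm/generate_lm.py",
        "--input_txt", corpus_txt,
        "--output_dir", output_dir,
        "--top_k", str(top_k),
        "--kenlm_bins", kenlm_bins,
        "--arpa_order", str(arpa_order),
        "--max_arpa_memory", "85%",
        "--arpa_prune", arpa_prune,
        "--binary_a_bits", "255",
        "--binary_q_bits", "8",
        "--binary_type", "trie",
    ]


def generate_scorer_package_cmd(alphabet: str, output_dir: str, top_k: int,
                                package: str, alpha: float,
                                beta: float) -> List[str]:
    """Argument list for the generate_scorer_package binary (flags assumed)."""
    out = Path(output_dir)
    return [
        "./generate_scorer_package",
        "--alphabet", alphabet,
        # generate_lm.py is assumed to write lm.binary and vocab-<top_k>.txt here
        "--lm", str(out / "lm.binary"),
        "--vocab", str(out / f"vocab-{top_k}.txt"),
        "--package", package,
        "--default_alpha", str(alpha),
        "--default_beta", str(beta),
    ]


if __name__ == "__main__":
    # Placeholder paths and alpha/beta values; tune alpha/beta on your own data
    lm = generate_lm_cmd("custom_sentences.txt", "lm_out", "kenlm/build/bin")
    pkg = generate_scorer_package_cmd("alphabet.txt", "lm_out", 500000,
                                      "customApp.scorer", 0.93, 1.18)
    print(shlex.join(lm))   # shlex.join needs Python 3.8+
    print(shlex.join(pkg))
```

Printing the commands instead of executing them makes it easy to inspect them first; once they look right, each list can be passed to `subprocess.run(cmd, check=True)`.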

What format should the source text file containing the custom sentences have?
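Assuming the input to `generate_lm.py` is plain UTF-8 text with one sentence per line (the shape of the LibriSpeech LM corpus used in the docs; verify for your version), a small normalization pass keeps the vocabulary consistent with the acoustic model. This sketch further assumes the English 0.9.3 model, whose alphabet is lowercase a to z, apostrophe, and space; for another language, adapt the allowed characters to its `alphabet.txt`. `normalize_sentence` and `write_corpus` are hypothetical helpers.

```python
import re
from typing import Iterable

# Characters outside the assumed English alphabet (a-z, apostrophe, space)
_OUTSIDE_ALPHABET = re.compile(r"[^a-z' ]+")


def normalize_sentence(text: str) -> str:
    """Lowercase, turn out-of-alphabet characters into spaces, collapse whitespace."""
    text = _OUTSIDE_ALPHABET.sub(" ", text.lower())
    return " ".join(text.split())


def write_corpus(sentences: Iterable[str], path: str) -> int:
    """Write one normalized, non-empty sentence per line; return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for s in sentences:
            line = normalize_sentence(s)
            if line:
                f.write(line + "\n")
                count += 1
    return count
```

Note that digits are dropped rather than spelled out (e.g. "It's 10:30" becomes "it's"), so numbers should be written as words before normalization.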

If, for example, I want the ASR to better recognize 4-digit numeric
codes:

one zero zero zero
one zero zero one
one zero zero two
one zero zero three
...
...
nine nine nine nine

is the text a collection of all possible sentences, so in this
case all numbers between 0000 and 9999 spelled out in words?
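Producing that full list is cheap either way. A minimal Python sketch, assuming each digit is spelled as one English word and words are separated by single spaces; `four_digit_sentences` is a hypothetical helper, and the output runs from 0000 through 9999 in numeric order:

```python
from itertools import product
from typing import Iterator, List

DIGIT_WORDS: List[str] = ["zero", "one", "two", "three", "four",
                          "five", "six", "seven", "eight", "nine"]


def four_digit_sentences() -> Iterator[str]:
    """Yield every code 0000..9999 spelled out, e.g. 1002 -> 'one zero zero two'."""
    # product() varies the last position fastest, so the output is in numeric order
    for digits in product(DIGIT_WORDS, repeat=4):
        yield " ".join(digits)
```

Writing them to a one-sentence-per-line file is then `f.writelines(s + "\n" for s in four_digit_sentences())` on a file opened for writing.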

Question 3:
One last point is not clear to me. For the best results in the general
case, I would extend the pre-trained model's scorer with a custom scorer.
In that case, do I need to append the custom sentences to the original
pre-trained model scorer? Or is building a separate custom scorer the way to go?
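If appending to the original corpus is the route taken, that step could look like the sketch below. It assumes you have the plain-text (optionally gzipped) corpus the original scorer was built from, such as the LibriSpeech LM text referenced in the scorer docs, and that the merged file is then fed to `generate_lm.py`. `merge_corpora` is a hypothetical helper, and its `repeat` parameter is a hypothetical knob, not a DeepSpeech option: repeating custom sentences raises their n-gram counts relative to the general text, which is one possible way to bias the language model toward the chatbot's domain.

```python
import gzip
from typing import Iterable, TextIO


def _open_text(path: str) -> TextIO:
    """Open a plain or .gz text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


def merge_corpora(base_path: str, custom_sentences: Iterable[str],
                  out_path: str, repeat: int = 1) -> int:
    """Copy the base corpus, then append the custom sentences `repeat` times.

    Empty lines are skipped. Returns the number of lines written.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    custom = [s.strip() for s in custom_sentences if s.strip()]
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        with _open_text(base_path) as base:
            for line in base:
                line = line.strip()
                if line:
                    out.write(line + "\n")
                    written += 1
        for _ in range(repeat):
            for sentence in custom:
                out.write(sentence + "\n")
                written += 1
    return written
```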

BTW, my configuration:

(deepspeech-venv) $ uname -a
linux itd-giorgio-laptop 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

(deepspeech-venv) $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal

(deepspeech-venv) $ python --version
Python 3.8.5

(deepspeech-venv) $ deepspeech --version
DeepSpeech 0.9.3

(deepspeech-venv) $ sudo lshw -C display
*-display
description: VGA compatible controller
product: WhiskeyLake-U GT2 [UHD Graphics 620]
vendor: Intel Corporation
physical id: 2
bus info: pci0000:00:02.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:129 memory:a1000000-a1ffffff memory:b0000000-bfffffff ioport:6000(size=64) memory:c0000-dffff

Thanks!
giorgio

[This is an archived TTS discussion thread from discourse.mozilla.org/t/help-how-to-generate-a-custom-scorer]
