GitHub - synesthesiam/voice2json: Command-line tools for speech and intent recognition on Linux

voice2json is a collection of command-line tools for offline speech/intent recognition on Linux. It is free, open source (MIT), and supports 18 human languages.

From the command line:

$ voice2json -p en transcribe-wav \
      < turn-on-the-light.wav | \
      voice2json -p en recognize-intent | \
      jq .

produces a JSON event like:

{
    "text": "turn on the light",
    "intent": {
        "name": "LightState"
    },
    "slots": {
        "state": "on"
    }
}

when trained with this template:

[LightState]
states = (on | off)
turn (<states>){state} [the] light
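
To make the template concrete, the following sketch enumerates the sentences it describes. It is a hand-written expansion in Python, not voice2json's own template parser; the function name `expand_light_state` is hypothetical, and it assumes that `(on | off)` is an alternative, `[the]` is optional, and `{state}` tags the matched word as a slot.

```python
from itertools import product

# Hand-written expansion of the [LightState] template above.
# This is NOT voice2json's parser, only an enumeration of the same grammar.
states = ["on", "off"]       # states = (on | off)
optional_the = ["the", ""]   # [the] may be present or absent


def expand_light_state():
    """Yield (sentence, slots) pairs that the template would accept."""
    for state, the in product(states, optional_the):
        words = ["turn", state, the, "light"]
        sentence = " ".join(w for w in words if w)
        yield sentence, {"state": state}


sentences = list(expand_light_state())
for sentence, slots in sentences:
    print(sentence, slots)
```

The template therefore covers four sentences, each carrying the `state` slot value seen in the JSON event.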

voice2json is optimized for:

Sets of voice commands that are described well by a grammar
Commands with uncommon words or pronunciations
Commands or intents that can vary at runtime

It can be used to:

Add voice commands to existing applications or Unix-style workflows
Provide basic voice assistant functionality completely offline on modest hardware
Bootstrap more sophisticated speech/intent recognition systems
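
As an illustration of adding voice commands to an existing application, here is a minimal sketch that parses a JSON event of the shape shown earlier and dispatches on the intent name. The handler table, `set_light`, and `handle_event` are hypothetical application code, not part of voice2json.

```python
import json

# Sample event with the same shape as the example output earlier in this README.
EVENT = (
    '{"text": "turn on the light", '
    '"intent": {"name": "LightState"}, '
    '"slots": {"state": "on"}}'
)


def set_light(state):
    """Hypothetical application hook; a real one would drive hardware."""
    return f"light is now {state}"


# Map intent names to handlers (hypothetical; extend per application).
HANDLERS = {"LightState": lambda slots: set_light(slots["state"])}


def handle_event(raw):
    """Parse one JSON event and call the matching handler, if any."""
    event = json.loads(raw)
    name = event.get("intent", {}).get("name", "")
    handler = HANDLERS.get(name)
    if handler is None:
        return None  # unknown or missing intent
    return handler(event.get("slots", {}))


result = handle_event(EVENT)
print(result)
```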

Supported speech-to-text systems include:

CMU's pocketsphinx
Dan Povey's Kaldi
Mozilla's DeepSpeech 0.9
Kyoto University's Julius

Supported Languages

Catalan (ca)
- ca-es_pocketsphinx-cmu
Czech (cs)
- cs-cz_kaldi-rhasspy
German (de)
- de_deepspeech-aashishag
- de_deepspeech-jaco
- de_kaldi-zamia (default)
- de_pocketsphinx-cmu
Greek (el)
- el-gr_pocketsphinx-cmu
English (en)
Spanish (es)
- es_deepspeech-jaco
- es_kaldi-rhasspy (default)
- es-mexican_pocketsphinx-cmu
- es_pocketsphinx-cmu
French (fr)
- fr_deepspeech-jaco
- fr_kaldi-guyot (default)
- fr_kaldi-rhasspy
- fr_pocketsphinx-cmu
Hindi (hi)
- hi_pocketsphinx-cmu
Italian (it)
- it_deepspeech-jaco
- it_deepspeech-mozillaitalia (default)
- it_kaldi-rhasspy
- it_pocketsphinx-cmu
Korean (ko)
- ko-kr_kaldi-montreal
Kazakh (kz)
- kz_pocketsphinx-cmu
Dutch (nl)
- nl_kaldi-cgn (default)
- nl_kaldi-rhasspy
- nl_pocketsphinx-cmu
Polish (pl)
- pl_deepspeech-jaco (default)
- pl_julius-github
Portuguese (pt)
- pt-br_pocketsphinx-cmu
Russian (ru)
- ru_kaldi-rhasspy (default)
- ru_pocketsphinx-cmu
Swedish (sv)
- sv_kaldi-montreal
- sv_kaldi-rhasspy (default)
Vietnamese (vi)
- vi_kaldi-montreal
Mandarin (zh)
- zh-cn_pocketsphinx-cmu

Unique Features

voice2json is more than just a wrapper around open-source speech-to-text systems!

Training produces both a speech recognizer and an intent recognizer. By describing your voice commands with voice2json's templating language, you get more than just transcriptions for free.
Re-training is fast enough to be done at runtime (usually < 5s), even with millions of possible voice commands. This means you can change referenced slot values or add/remove intents on the fly.
All of the available commands are designed to work well in Unix pipelines, typically consuming/emitting plaintext or newline-delimited JSON. Audio input/output is file-based, so you can receive audio from any source.
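
Because the commands emit newline-delimited JSON, a downstream consumer can process one event per line. The sketch below shows one way to do that in Python; `read_events` is a hypothetical helper, the `io.StringIO` object stands in for `sys.stdin`, and the second sample event is made up for illustration.

```python
import io
import json


def read_events(stream):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for sys.stdin when fed by a voice2json pipeline.
# The second event is invented for illustration.
sample = io.StringIO(
    '{"text": "turn on the light", "intent": {"name": "LightState"}, "slots": {"state": "on"}}\n'
    "\n"
    '{"text": "turn off the light", "intent": {"name": "LightState"}, "slots": {"state": "off"}}\n'
)

events = list(read_events(sample))
texts = [event["text"] for event in events]
print(texts)
```

In a real pipeline, passing `sys.stdin` to `read_events` would let the script sit at the end of a shell pipe in place of `jq`.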

Commands

download-profile - Download missing files for a profile
train-profile - Generate speech/intent artifacts
transcribe-wav - Transcribe WAV file to text
- Add --open for unrestricted speech to text
transcribe-stream - Transcribe live audio stream to text
- Add --open for unrestricted speech to text
recognize-intent - Recognize intent from JSON or text
wait-wake - Listen to live audio stream for wake word
record-command - Record voice command from live audio stream
pronounce-word - Look up or guess how a word is pronounced
generate-examples - Generate random intents
record-examples - Generate and record speech examples
test-examples - Test recorded speech examples
show-documentation - Run HTTP server locally with documentation
print-profile - Print profile settings
print-downloads - Print profile file download information
print-files - Print user profile files for backup
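
The commands can also be chained from a program rather than a shell. The following sketch mirrors the `transcribe-wav | recognize-intent` pipeline from the top of this README using Python's `subprocess` module. It uses only the `-p` option and command names shown above; `build_commands` and `recognize_wav` are hypothetical helpers, and `recognize_wav` assumes `voice2json` is installed, on `PATH`, and already trained.

```python
import subprocess


def build_commands(profile):
    """Return argv lists for the transcribe -> recognize pipeline."""
    base = ["voice2json", "-p", profile]
    return [base + ["transcribe-wav"], base + ["recognize-intent"]]


def recognize_wav(wav_path, profile="en"):
    """Run the pipeline on one WAV file and return the raw JSON output.

    Requires a working voice2json installation; not executed here.
    """
    transcribe_cmd, recognize_cmd = build_commands(profile)
    with open(wav_path, "rb") as wav:
        transcribed = subprocess.run(
            transcribe_cmd, stdin=wav, stdout=subprocess.PIPE, check=True
        )
    recognized = subprocess.run(
        recognize_cmd, input=transcribed.stdout, stdout=subprocess.PIPE, check=True
    )
    return recognized.stdout.decode("utf-8")


commands = build_commands("en")
print(commands)
```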
