Allow singleUtterance: false #63

rhclayto · 2017-09-24T20:53:37Z

Setting singleUtterance: true in recognizer.createRecognizeStream leaves it to Google to detect when an utterance has stopped & end the recognition stream. A problem I encountered is that often Google is too eager to stop the detection, i.e., it detects slight pauses in speech as an end of utterance. This is good for detecting short command-type utterances, such as 'turn off light', etc., but for longer free-form dictation it's problematic.

Adding an option to Sonus to allow configurable singleUtterance would give flexibility to developers to implement their own end-of-utterance detection & stream teardown. So I guess this is a feature request.

But I also wanted to post this here because I made some changes to my fork of Sonus to allow this, & thought I'd share them, even though they have some things specific to my use hard-coded into them & are therefore not pull-request-worthy. But they might contain the seed of something that could be put into Sonus, if desired.

CloudSpeechRecognizer.startStreaming = (options, audioStream, cloudSpeechRecognizer) => {
  // . . .
  const recognitionStream = recognizer.createRecognizeStream({
    // . . .
    singleUtterance: !options.noSingleUtterance,
    // . . .
  })
  // . . .
  // Attach recognitionStream to the cloudSpeechRecognizer object so Sonus can shut it
  // down itself when the noSingleUtterance option has been passed in.
  if (options.noSingleUtterance) cloudSpeechRecognizer.recognitionStream = recognitionStream;
  // . . .
}

Sonus.init = (options, recognizer) => {
  // . . .
  sonus.trigger = (index, hotword) => {
    // . . .
    let triggerHotword = (index == 0) ? hotword : models.lookup(index)
    // If the trigger hotword is 'FreeDescription', tell CloudSpeechRecognizer to start a
    // stream with singleUtterance: false, & we'll handle our own silence detection &
    // recognitionStream teardown. Assigning the comparison result (instead of only ever
    // setting true) resets the flag when a different hotword triggers the next stream.
    opts.noSingleUtterance = (triggerHotword === 'FreeDescription');
    // . . .
  }
  // . . .
  // Listen for 'recognitionStreamShutdown' events emitted on the sonus instance in order
  // to shut down the active recognitionStream.
  sonus.on('recognitionStreamShutdown', function() {
    // recognitionStream is exposed on the csr object, so we can shut it down as follows:
    if (csr.listening && csr.recognitionStream) {
      csr.listening = false;
      sonus.mic.unpipe(csr.recognitionStream);
      csr.recognitionStream.end();
      delete csr.recognitionStream;
    }
  });
}

These changes make it possible to set singleUtterance to true or false depending on the hotword used to trigger Sonus. They also make it possible to stop the recognition stream by emitting a recognitionStreamShutdown event from anywhere in an app. So deciding when an utterance is over, & when to close the recognition stream to Google, is now in the developer's hands. Developers still need to handle the fact that Google will mark results with isFinal whenever it thinks an utterance is over, so if you want your transcription to span the whole of your configured timeout period, you'll need to concatenate the various isFinal results that Google issues.
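A minimal sketch of that concatenation step, assuming response objects shaped like Google Cloud Speech streaming responses (`results[].alternatives[0].transcript` plus an `isFinal` flag). The helper name `joinFinalTranscripts` is hypothetical and not part of Sonus; interim results are skipped because Google later supersedes them with a final result.

```javascript
// Hypothetical helper: join the top transcript of every final result.
// Assumes each response looks like { results: [{ isFinal, alternatives: [{ transcript }] }] }.
function joinFinalTranscripts(responses) {
  const parts = [];
  for (const response of responses) {
    for (const result of response.results || []) {
      // Interim (non-final) results are replaced by a later final one, so skip them.
      if (!result.isFinal) continue;
      const best = result.alternatives && result.alternatives[0];
      if (best && best.transcript) parts.push(best.transcript.trim());
    }
  }
  // Separate consecutive final results with a single space.
  return parts.join(' ');
}

// Usage with made-up responses:
const responses = [
  { results: [{ isFinal: false, alternatives: [{ transcript: 'buy' }] }] },
  { results: [{ isFinal: true, alternatives: [{ transcript: 'buy milk' }] }] },
  { results: [{ isFinal: true, alternatives: [{ transcript: ' and eggs' }] }] },
];
console.log(joinFinalTranscripts(responses)); // buy milk and eggs
```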

As an example, I used the silence & sound events emitted by Sonus (as determined by the Snowboy detector), along with a setTimeout, to determine when an utterance had ended (i.e., after a certain amount of sustained silence, the utterance is determined to be over). Here's the code I used:

// Set Snowboy hotwords.
var hotwords = [
  {file: '/home/benja/app/ListenRobot.pmdl', hotword: 'ListenRobot', sensitivity: '0.5'},
  {file: '/home/benja/app/FreeDescription.pmdl', hotword: 'FreeDescription', sensitivity: '0.5'}
];
// Create an instance of Sonus. Sonus reads the language from the `language` option,
// so the variable has to be passed under that key.
var sonusLanguage = 'en-US';
var sonus = Sonus.init({hotwords, language: sonusLanguage, recordProgram: 'arecord', device: 'bluealsa:HCI=hci0,DEV=00:6A:8E:16:C5:F2,PROFILE=sco'}, speech);
// Start the Sonus instance.
Sonus.start(sonus);
var silenceTimeout = null, bufferSonus = false, sonusBuffer = '';
// Event: When Snowboy informs us a hotword has been detected.
sonus.on('hotword', function(index, keyword) {
  console.log('!');
  if (keyword === 'FreeDescription') bufferSonus = true;
});
// Event: When Snowboy detects silence.
sonus.on('silence', function() {
  // After an appropriate period of uninterrupted silence, emit an event on sonus to
  //  tear down the recognitionStream. The teardown only has an effect while streaming
  //  from a 'FreeDescription' trigger hotword, because only then is recognitionStream
  //  attached to csr.
  if (!silenceTimeout) silenceTimeout = setTimeout(function() {
    silenceTimeout = null; // allow a new countdown on the next silence event
    if (bufferSonus && sonusBuffer !== '') {
      console.log('Complete sonusBuffer flushed: ', sonusBuffer);
      Sonus.annyang.trigger('sonusBuffer ' + sonusBuffer);
      sonusBuffer = '';
      bufferSonus = false;
    }
    sonus.emit('recognitionStreamShutdown');
  }, 4300);
});
// Event: When Snowboy detects sound.
sonus.on('sound', function() {
  clearTimeout(silenceTimeout);
  silenceTimeout = null;
});
// Event: When a final transcript has been received from Google Cloud Speech.
sonus.on('final-result', function(result) {
  console.log(result);
  // Separate consecutive final results with a space so words don't run together.
  if (bufferSonus) sonusBuffer += (sonusBuffer ? ' ' : '') + result;
});
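The silence/sound timer above can also be factored into a small reusable helper. This is a minimal sketch, not part of Sonus: `createSilenceWatcher` is a hypothetical name, and it assumes only an EventEmitter that emits 'silence' and 'sound' events the way a Sonus instance does. The timer functions are injectable so the logic can be checked without waiting in real time.

```javascript
// Hypothetical helper (not part of Sonus): call onTimeout after `ms` of
// uninterrupted silence. 'silence' starts a countdown if none is running;
// 'sound' cancels it. Timer functions are injectable for testing.
function createSilenceWatcher(emitter, ms, onTimeout, timers = {
  set: (fn, delay) => setTimeout(fn, delay),
  clear: (handle) => clearTimeout(handle),
}) {
  let handle = null;
  const onSilence = () => {
    if (handle !== null) return; // a countdown is already running
    handle = timers.set(() => {
      handle = null; // allow a fresh countdown on the next 'silence'
      onTimeout();
    }, ms);
  };
  const onSound = () => {
    if (handle !== null) timers.clear(handle);
    handle = null;
  };
  emitter.on('silence', onSilence);
  emitter.on('sound', onSound);
  // Detach the listeners and cancel any pending countdown.
  return () => {
    onSound();
    emitter.removeListener('silence', onSilence);
    emitter.removeListener('sound', onSound);
  };
}

// Usage in the app above (flushBuffer is a hypothetical function holding the
// sonusBuffer flush logic):
//   createSilenceWatcher(sonus, 4300, () => {
//     flushBuffer();
//     sonus.emit('recognitionStreamShutdown');
//   });
```

Resetting the handle inside the timer callback is what lets a second stretch of silence start a new countdown without an intervening 'sound' event.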

Perhaps this can help someone.


evancohen · 2017-09-28T05:07:19Z

Thanks for the super detailed suggestion! This is a good idea :) I'm working on a refactor of Sonus that allows for more cloud recognizers and a bit more control of their configuration, which should address this.

Another thought I've had is to add the ability to change the configuration after a hotword is detected but before speech begins streaming (essentially a synchronous config update via a callback function passed to the 'hotword' event).
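One way that callback idea could look is sketched below. This is a hypothetical design, not the Sonus API: `triggerWithConfig` and the `configure` callback are illustrative names. It relies on the fact that Node's `EventEmitter.emit` calls listeners synchronously, so any updates a listener makes are applied before streaming starts.

```javascript
const { EventEmitter } = require('events');

// Hypothetical trigger: emit 'hotword' with a configure() callback, then start
// streaming with whatever configuration the listeners produced.
function triggerWithConfig(emitter, hotword, baseOptions, startStreaming) {
  // Copy so per-utterance changes don't leak into the next trigger.
  const options = { ...baseOptions };
  const configure = (updates) => Object.assign(options, updates);
  // emit() runs every listener synchronously before returning.
  emitter.emit('hotword', hotword, configure);
  return startStreaming(options);
}

// Usage: request long-form recognition only for the 'FreeDescription' hotword.
const sonus = new EventEmitter();
sonus.on('hotword', (hotword, configure) => {
  if (hotword === 'FreeDescription') configure({ noSingleUtterance: true });
});
const started = triggerWithConfig(sonus, 'FreeDescription', { language: 'en-US' }, (opts) => opts);
console.log(started); // { language: 'en-US', noSingleUtterance: true }
```

Copying the base options per trigger also avoids a stale flag carrying over from one hotword to the next.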

mrdago · 2017-10-05T03:55:30Z

@rhclayto - That's exactly what I'm looking for!
I've successfully integrated Sonus into my home automation setup, and it works perfectly (reliably and fast) at detecting and transcribing short commands to control my devices.
Now I'm planning to use Sonus to create "yellow sticky note" messages and display them on my family dashboard (MagicMirror without the mirror). But this requires a bit more control over the Google API, to optionally switch off "interimResults" or to influence the singleUtterance parameter.
As I'm a programming novice, I'll wait for @evancohen's changes to the Sonus source code.
@evancohen - Great work, thank you for this piece of software!
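For this kind of control, the streaming request could be built from options, as in the minimal sketch below. It mirrors the request shape used in the Sonus index.js excerpt later in this thread (`config`, `singleUtterance`, `interimResults`); the `noSingleUtterance` and `interimResults` keys on the Sonus options object and the `buildStreamingRequest` helper are hypothetical, for illustration only.

```javascript
// Hypothetical helper: build the request passed to recognizer.streamingRecognize().
// Assumed option keys: language, speechContexts, noSingleUtterance, interimResults.
function buildStreamingRequest(options = {}) {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: options.language || 'en-US',
      speechContexts: options.speechContexts || null,
    },
    // Long-form dictation: let the app, not Google, decide when the utterance ends.
    singleUtterance: !options.noSingleUtterance,
    // Interim results stay on unless explicitly disabled.
    interimResults: options.interimResults !== false,
  };
}

// Usage: a long "sticky note" dictation without interim results.
const request = buildStreamingRequest({ noSingleUtterance: true, interimResults: false });
console.log(request.singleUtterance, request.interimResults); // false false
```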

mrdago · 2017-10-07T21:13:27Z

@evancohen
I've installed [email protected] and additionally extended index.js with the code proposed by @rhclayto, so that longer text phrases can be processed with the Google Speech API.
It works reliably and stably, and I would be pleased if @rhclayto's proposal became part of the next Sonus release. To be more flexible, I propose using the second hotword to switch to opts.noSingleUtterance = true (see my code snippet below).

sonus.trigger = (index, hotword) => {
    if (sonus.started) {
      try {
        let triggerHotword = (index == 0) ? hotword : models.lookup(index)
        // *** added
        // If there are 2 hotwords configured, set an option to send to CloudSpeechRecognizer
        // so that it will start a stream with singleUtterance: false, & we'll handle our own silence 
        // detection & recognitionStream teardown.
        opts.noSingleUtterance = false
        if(models.lookupTable.length > 1){
          if (triggerHotword === models.lookup(2)) opts.noSingleUtterance = true
        }
        // ***
        sonus.emit('hotword', index, triggerHotword)
        CloudSpeechRecognizer.startStreaming(opts, sonus.mic, csr)
      } catch (e) {
        console.log(e) // log before rethrowing; a statement after `throw` never runs
        throw ERROR.INVALID_INDEX
      }
    }
    // . . .
}

evancohen · 2017-10-07T21:49:41Z

Awesome :) I'll definitely integrate this into the next release. One change though: noSingleUtterance should be a parameter on the hotword; it shouldn't be hard-coded to a specific index. I'm traveling this week, but next week I'll release the update (unless you want to send a PR for it).
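Making noSingleUtterance a property of the hotword, rather than tying it to a fixed index, might look like the sketch below. The `noSingleUtterance` field on each hotword entry and the `usesLongForm` helper are hypothetical; the entries otherwise follow the `{file, hotword, sensitivity}` shape used earlier in this thread.

```javascript
// Hypothetical hotword config: each entry may carry its own noSingleUtterance flag.
const hotwords = [
  { file: 'ListenRobot.pmdl', hotword: 'ListenRobot', sensitivity: '0.5' },
  { file: 'FreeDescription.pmdl', hotword: 'FreeDescription', sensitivity: '0.5', noSingleUtterance: true },
];

// Return true if the detected hotword asks for singleUtterance: false.
function usesLongForm(hotwordConfigs, triggerHotword) {
  const entry = hotwordConfigs.find((h) => h.hotword === triggerHotword);
  return Boolean(entry && entry.noSingleUtterance);
}

// Inside sonus.trigger this could replace the index check, assuming the hotword
// list is still reachable there (e.g. as opts.hotwords):
//   opts.noSingleUtterance = usesLongForm(opts.hotwords, triggerHotword)
console.log(usesLongForm(hotwords, 'FreeDescription')); // true
console.log(usesLongForm(hotwords, 'ListenRobot'));     // false
```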

MrSponti · 2018-04-29T06:02:13Z

Hi Evan, yesterday I checked out your latest release and tried to implement the discussed option to make singleUtterance configurable in Sonus. I have it implemented and running stably in Sonus v0.1.9-1, but my knowledge isn't sufficient to port it to your latest release, and I would be grateful for your help. The proposed change is an add-on and does not affect existing functionality.

The approach I took in v0.1.9-1 was to define an option, longText, holding the index of the hotword that switches Sonus to singleUtterance = false.

const longText = 2;      // index pointing to hotword for use of 'long text form' in Google Speech API
const sonus = Sonus.init({ hotwords, language, audioGain, longText, recordProgram: "arecord" }, speech);
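The index check that the modified sonus.trigger performs can be expressed as a small pure function. This sketch assumes, consistent with the `models.lookup(index)` calls in this thread, that hotword indices are 1-based and that `longText = 0` disables the feature; `isLongTextHotword` and the plain `hotwordNames` array are hypothetical stand-ins for Snowboy's model lookup.

```javascript
// Hypothetical stand-in for the longText check in sonus.trigger.
// hotwordNames[i] corresponds to Snowboy index i + 1 (indices are 1-based).
function isLongTextHotword(longText, hotwordNames, triggerHotword) {
  // 0 (the default) disables long-text mode; out-of-range indices are ignored.
  if (!(longText > 0 && longText <= hotwordNames.length)) return false;
  return hotwordNames[longText - 1] === triggerHotword;
}

const names = ['ListenRobot', 'FreeDescription'];
console.log(isLongTextHotword(2, names, 'FreeDescription')); // true
console.log(isLongTextHotword(0, names, 'FreeDescription')); // false
```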

And here are the modifications in index.js (marked with ***):

  const recognitionStream = recognizer.streamingRecognize({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: options.language,
      speechContexts: options.speechContexts || null
    },
    singleUtterance: !options.noSingleUtterance,        // *** changed
    interimResults: true,
  })
  // *** added
  //console.log('options.noSingleUtterance='+options.noSingleUtterance)
  if (options.noSingleUtterance){                               
    cloudSpeechRecognizer.recognitionStream = recognitionStream
  }
  // ***

...  
opts.language = opts.language || 'en-US' //https://cloud.google.com/speech/docs/languages
  // *** added
  opts.longText = opts.longText || 0
  // ***  

...

sonus.trigger = (index, hotword) => {
    if (sonus.started) {
      try {
        let triggerHotword = (index == 0) ? hotword : models.lookup(index)
        // *** added
        // If opts.longText is the index of the detected hotword, set an option to start
        // a stream with singleUtterance: false; we'll then handle our own silence detection & recognitionStream teardown.
        opts.noSingleUtterance = false
        if(opts.longText > 0 && models.lookupTable.length >= opts.longText){
          if (triggerHotword === models.lookup(opts.longText)) opts.noSingleUtterance = true
        }
        // ***        
        sonus.emit('hotword', index, triggerHotword)
        CloudSpeechRecognizer.startStreaming(opts, sonus.mic, csr)

...
  // ***
  // Add a sonus.on listener to receive events from an instantiated sonus in order to shut down 
  //  a recognitionStream.
  sonus.on('recognitionStreamShutdown', function() {
    // recognitionStream will be made available on the csr object, so we can shut it down as follows:
    if (csr.listening && csr.recognitionStream) {
      csr.listening = false
      sonus.mic.unpipe(csr.recognitionStream)
      csr.recognitionStream.end()
      delete csr.recognitionStream
    }
  })
  // ***
...
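The teardown handler can be exercised outside Sonus with plain Node streams. In this sketch a `PassThrough` stands in for both the microphone and the Google recognition stream, and `shutdownRecognitionStream` is a hypothetical extraction of the handler body; the `csr.listening` and `csr.recognitionStream` fields follow the snippets above. The guard makes repeated shutdown events harmless.

```javascript
const { PassThrough } = require('stream');

// Hypothetical extraction of the 'recognitionStreamShutdown' handler body.
// Returns true if a stream was actually torn down.
function shutdownRecognitionStream(csr, mic) {
  if (!(csr.listening && csr.recognitionStream)) return false; // nothing active
  csr.listening = false;
  mic.unpipe(csr.recognitionStream); // stop feeding audio
  csr.recognitionStream.end();       // signal end of audio to the recognizer
  delete csr.recognitionStream;
  return true;
}

// Usage with stand-in streams:
const mic = new PassThrough();
const recognitionStream = new PassThrough();
mic.pipe(recognitionStream);
const csr = { listening: true, recognitionStream };
console.log(shutdownRecognitionStream(csr, mic)); // true
console.log(shutdownRecognitionStream(csr, mic)); // false (already shut down)
```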
