[BUG] DialogServiceConnector doesn't seem to support MULAW as output format #48212
Labels
Client, Cognitive - Speech, customer-reported, needs-team-attention, question, Service Attention
Library name and version
Microsoft.CognitiveServices.Speech 1.42.0
Describe the bug
We are working on an integration where we receive audio in MULAW format (8K samples/sec, 8 bits/sample, 1 channel) from a socket connection. We also have to write the audio back in the same format.
We are using SpeechRecognizer for transcribing the input audio. The audio is getting transcribed correctly.
// VoiceAudioStream is extending PullAudioInputStreamCallback
var inputAudioStream = new VoiceAudioStream();
var inputAudioConfig = AudioConfig.FromStreamInput(inputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var speechRecognizer = new SpeechRecognizer(speechConfig, inputAudioConfig);
// Register speechRecognizer events
We then feed the transcribed text to the Speech Service using the DialogServiceConnector. We initialize the DialogServiceConnector as follows:
// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var connector = new DialogServiceConnector(speechConfig, outputAudioConfig);
await connector.ConnectAsync();
The problem is that when I write the audio output from the Speech Service back to the socket connection in MULAW format, the audio at the other end of the socket connection is pure noise.
One workaround I have found is to get the output from the Speech Service in PCM format (16K samples/sec, 16 bits/sample, 1 channel) and convert it to MULAW format before writing it back to the socket connection. The following code works:
// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1));
var connector = new DialogServiceConnector(SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion), outputAudioConfig);
await connector.ConnectAsync();
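The PCM-to-MULAW conversion step of the workaround is not shown above. A minimal sketch of what it could look like follows, assuming the service output arrives as little-endian 16-bit mono PCM at 16 kHz. The names `MuLawEncoder`, `LinearToMuLaw`, and `Pcm16kToMuLaw8k` are hypothetical and not part of the Speech SDK. Averaging sample pairs is only a crude stand-in for a proper low-pass resampler.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not part of Microsoft.CognitiveServices.Speech.
public static class MuLawEncoder
{
    private const int Bias = 0x84;   // standard G.711 mu-law bias (132)
    private const int Clip = 32635;  // largest magnitude that fits after adding the bias

    // Encodes one 16-bit linear PCM sample as one G.711 mu-law byte.
    public static byte LinearToMuLaw(short sample)
    {
        int s = sample;
        int sign = (s >> 8) & 0x80;      // 0x80 for negative samples, 0 otherwise
        if (sign != 0) s = -s;           // work on the magnitude
        if (s > Clip) s = Clip;
        s += Bias;

        // Locate the segment (exponent): position of the highest set bit above bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (s & mask) == 0 && exponent > 0; mask >>= 1)
        {
            exponent--;
        }

        int mantissa = (s >> (exponent + 3)) & 0x0F;

        // Mu-law bytes are transmitted bit-inverted.
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
    }

    // Assumption: input is little-endian 16-bit mono PCM at 16 kHz.
    // Each pair of input samples is averaged into one 8 kHz sample (crude 2:1 downsampling).
    public static byte[] Pcm16kToMuLaw8k(ReadOnlySpan<byte> pcm)
    {
        int frames = pcm.Length / 4;     // 2 samples x 2 bytes in, 1 mu-law byte out
        var output = new byte[frames];
        for (int i = 0; i < frames; i++)
        {
            short a = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(4 * i, 2));
            short b = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(4 * i + 2, 2));
            output[i] = LinearToMuLaw((short)((a + b) / 2));
        }
        return output;
    }
}
```

With this sketch, each buffer of PCM bytes received from the connector would be passed through `MuLawEncoder.Pcm16kToMuLaw8k` before being written to the socket. Digital silence (sample value 0) encodes to `0xFF`, which gives a quick sanity check on the output.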
Expected behavior
// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var connector = new DialogServiceConnector(speechConfig, outputAudioConfig);
await connector.ConnectAsync();
Initializing DialogServiceConnector as shown above should produce audio in MULAW format, and that audio should not turn into noise when streamed over the socket.
Actual behavior
// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var connector = new DialogServiceConnector(speechConfig, outputAudioConfig);
await connector.ConnectAsync();
Initializing DialogServiceConnector as shown above produces audio that is heard as pure noise at the other end of the socket connection.
Reproduction Steps
Try to get audio in MULAW format from the speech service using the DialogServiceConnector.
Environment
Windows 11 Pro, .NET 8.0
Microsoft Visual Studio Professional 2022 (64-bit) - Current
Version 17.9.7