[BUG] DialogServiceConnector doesn't seem to support MULAW as output format #48212

Open
AbhishkeKatoch opened this issue Feb 11, 2025 · 1 comment
Labels
Client This issue points to a problem in the data-plane of the library. Cognitive - Speech customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

AbhishkeKatoch commented Feb 11, 2025

Library name and version

Microsoft.CognitiveServices.Speech 1.42.0

Describe the bug

We are working on an integration where we receive audio in MULAW format (8K samples/sec, 8 bits/sample, 1 channel) from a socket connection. We also have to write the audio back in the same format.

We are using SpeechRecognizer for transcribing the input audio. The audio is getting transcribed correctly.

// VoiceAudioStream is extending PullAudioInputStreamCallback
var inputAudioStream = new VoiceAudioStream();
var inputAudioConfig = AudioConfig.FromStreamInput(inputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var speechRecognizer = new SpeechRecognizer(speechConfig, inputAudioConfig);
// Register speechRecognizer events

We then feed the transcribed text to the Speech Service using the DialogServiceConnector. We initialize the DialogServiceConnector as follows:

// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var connector = new DialogServiceConnector(speechConfig, outputAudioConfig);
await connector.ConnectAsync();

The problem is that when I write the audio output from the Speech Service back to the socket connection in MULAW format, the audio at the other end of the socket connection is complete noise.

One workaround I have found is to get the output from the Speech Service in PCM format (16K samples/sec, 16 bits/sample, 1 channel) and convert it to MULAW format before writing it back to the socket connection. The following code works:

// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1));
var connector = new DialogServiceConnector(SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion), outputAudioConfig);
await connector.ConnectAsync();
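
The conversion step of this workaround is not shown above. As a rough sketch of what it could look like, the following encodes 16-bit little-endian PCM at 16 kHz into 8-bit MULAW at 8 kHz using the standard G.711 mu-law companding steps. The class and method names (`MuLawSketch`, `EncodeSample`, `Pcm16kToMuLaw8k`) are hypothetical and not part of the Speech SDK. Averaging each pair of samples is only a crude low-pass filter before halving the sample rate, so a proper resampler may sound better.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not part of Microsoft.CognitiveServices.Speech.
public static class MuLawSketch
{
    private const int Bias = 0x84;   // standard G.711 mu-law bias (132)
    private const int Clip = 32635;  // largest magnitude that still fits after adding the bias

    // Encodes one 16-bit linear PCM sample as one 8-bit mu-law byte.
    public static byte EncodeSample(short pcm)
    {
        int sample = pcm;            // widen first so that -32768 can be negated safely
        int sign = 0;
        if (sample < 0)
        {
            sign = 0x80;
            sample = -sample;
        }
        if (sample > Clip) sample = Clip;
        sample += Bias;

        // The exponent is the position of the highest set bit, relative to bit 7.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        {
            exponent--;
        }
        int mantissa = (sample >> (exponent + 3)) & 0x0F;

        // Mu-law bytes are stored bit-inverted.
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
    }

    // Converts 16 kHz, 16-bit, mono, little-endian PCM to 8 kHz mu-law.
    // Assumption: the buffer length is a multiple of 4 bytes (two samples);
    // any trailing bytes are ignored here and would need to be carried over
    // to the next chunk in a streaming scenario.
    public static byte[] Pcm16kToMuLaw8k(byte[] pcm16k)
    {
        int pairs = pcm16k.Length / 4;
        var output = new byte[pairs];
        for (int i = 0; i < pairs; i++)
        {
            short a = BinaryPrimitives.ReadInt16LittleEndian(pcm16k.AsSpan(4 * i, 2));
            short b = BinaryPrimitives.ReadInt16LittleEndian(pcm16k.AsSpan(4 * i + 2, 2));
            output[i] = EncodeSample((short)((a + b) / 2));
        }
        return output;
    }
}
```

With this sketch, each chunk of PCM read from the connector's audio output would be passed through `Pcm16kToMuLaw8k` before being written to the socket. Digital silence (all-zero PCM) encodes to `0xFF` bytes.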

Expected behavior

// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var connector = new DialogServiceConnector(speechConfig, outputAudioConfig);
await connector.ConnectAsync();

Initializing DialogServiceConnector in the above way should output audio in MULAW format, and the output audio should not produce noise when streamed over the socket.

Actual behavior

// VoiceAudioStream is extending PullAudioInputStreamCallback
var outputAudioConfig = AudioConfig.FromStreamInput(outputAudioStream, AudioStreamFormat.GetWaveFormat(8000, 8, 1, AudioStreamWaveFormat.MULAW));
var speechConfig = SpeechConfig.FromSubscription(AppSettings.CognitiveSubscriptionKey, AppSettings.CognitiveSubscriptionRegion);
var connector = new DialogServiceConnector(speechConfig, outputAudioConfig);
await connector.ConnectAsync();

Initializing DialogServiceConnector in the above way outputs audio that is heard as total noise at the other end of the socket connection.

Reproduction Steps

Try to get audio in MULAW format from the speech service using the DialogServiceConnector.
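
To inspect what the connector actually returns while reproducing this, it may help to dump the received bytes into a WAV file that declares MULAW, then open it in an audio tool. The sketch below is only a diagnostic aid under stated assumptions: `MuLawWavDump` and `Write` are hypothetical names, and the bytes are assumed to be raw 8 kHz, mono mu-law samples without any header. If the file plays as noise while the same bytes sound correct when imported as 16-bit PCM, that would suggest the requested output format is not being applied.

```csharp
using System.IO;
using System.Text;

// Hypothetical diagnostic helper, not part of the Speech SDK.
public static class MuLawWavDump
{
    // Writes raw mu-law bytes as a WAV stream with format tag 7 (WAVE_FORMAT_MULAW).
    public static void Write(Stream target, byte[] muLaw, int sampleRate = 8000)
    {
        using var writer = new BinaryWriter(target, Encoding.ASCII, leaveOpen: true);
        int dataLength = muLaw.Length;
        int pad = dataLength & 1;                 // RIFF chunks are padded to an even size

        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        // "WAVE" (4) + fmt chunk (8 + 18) + fact chunk (8 + 4) + data chunk (8 + data + pad)
        writer.Write(4 + 26 + 12 + 8 + dataLength + pad);
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));

        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(18);                         // fmt chunk size for a non-PCM format
        writer.Write((short)7);                   // format tag: mu-law
        writer.Write((short)1);                   // channels
        writer.Write(sampleRate);                 // samples per second
        writer.Write(sampleRate);                 // bytes per second (1 byte per sample, mono)
        writer.Write((short)1);                   // block align
        writer.Write((short)8);                   // bits per sample
        writer.Write((short)0);                   // extension size

        writer.Write(Encoding.ASCII.GetBytes("fact"));
        writer.Write(4);
        writer.Write(dataLength);                 // sample count per channel

        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataLength);
        writer.Write(muLaw);
        if (pad == 1) writer.Write((byte)0);
    }
}
```

For example, the bytes collected from the socket could be written with `using var file = File.Create("output.wav"); MuLawWavDump.Write(file, bytes);`, where `bytes` and the file name are placeholders.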

Environment

Windows 11 Pro .NET 8.0
Microsoft Visual Studio Professional 2022 (64-bit) - Current
Version 17.9.7

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 11, 2025
@jsquire jsquire added Service Attention Workflow: This issue is responsible by Azure service team. Client This issue points to a problem in the data-plane of the library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Cognitive - Speech and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Feb 11, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.
