Commit d5f0d93

v-bingyang and rhurey authored
Add Speech + LLM Sample. (#2704)
* Add Speech + LLM Sample.
* Update README.md
* Update premium_speech_demo.py
* Update sample name.
* Update readme.
* Update.
* Update README.md
* Add *.ps1 text to .gitattributes
* Fix: Set executable permission for app_manager.sh

---------

Co-authored-by: Ryan Hurey <[email protected]>
1 parent 06d4fb9 commit d5f0d93

7 files changed: +328 -0 lines changed

.gitattributes

+1
@@ -69,6 +69,7 @@ proguard-rules.pro text
 *.xml text
 *.yaml text
 *.yml text
+*.ps1 text


 # Bash only likes Unix line endings
@@ -0,0 +1,52 @@
+{
+    "version": "2.0.0",
+    "tasks": [
+        {
+            "label": "Configuration and Setup",
+            "type": "shell",
+            "command": "/bin/bash",
+            "args": [
+                "-c",
+                "chmod u+x ${workspaceFolder}/app_manager.sh && ${workspaceFolder}/app_manager.sh configure"
+            ],
+            "group": {
+                "kind": "build",
+                "isDefault": false
+            },
+            "problemMatcher": [],
+            "windows": {
+                "command": "powershell",
+                "args": [
+                    "-ExecutionPolicy",
+                    "Bypass",
+                    "-File",
+                    "${workspaceFolder}/app_manager.ps1",
+                    "configure"
+                ]
+            }
+        },
+        {
+            "label": "Run the App",
+            "type": "shell",
+            "command": "${workspaceFolder}/app_manager.sh",
+            "args": [
+                "run"
+            ],
+            "group": {
+                "kind": "none",
+                "isDefault": false
+            },
+            "problemMatcher": [],
+            "windows": {
+                "command": "powershell",
+                "args": [
+                    "-ExecutionPolicy",
+                    "Bypass",
+                    "-File",
+                    "${workspaceFolder}/app_manager.ps1",
+                    "run"
+                ]
+            }
+        }
+    ]
+}
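Review note: both tasks above dispatch to a platform-specific manager script, with the `windows` override switching from Bash to PowerShell. The sketch below mirrors that dispatch in Python so the resulting command lines are easy to inspect. `build_command` is a hypothetical helper for illustration only and is not part of this commit.

```python
import sys


def build_command(action, workspace, platform=None):
    """Return the argv that the task definition above would run.

    `action` is "configure" or "run"; `workspace` stands in for
    ${workspaceFolder}. Hypothetical helper, not part of the commit.
    """
    if action not in ("configure", "run"):
        raise ValueError(f"unsupported action: {action}")
    platform = sys.platform if platform is None else platform

    if platform.startswith("win"):
        # Mirrors the "windows" override in both tasks.
        return [
            "powershell", "-ExecutionPolicy", "Bypass",
            "-File", f"{workspace}/app_manager.ps1", action,
        ]

    script = f"{workspace}/app_manager.sh"
    if action == "configure":
        # The configure task first makes the script executable.
        return ["/bin/bash", "-c", f"chmod u+x {script} && {script} configure"]
    return [script, "run"]
```

For example, `build_command("run", "/ws", platform="linux")` yields the script path followed by `run`, matching the non-Windows "Run the App" task.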
@@ -0,0 +1,39 @@
+# Scenario: Continuous Speech Recognition and Rewriting via Azure OpenAI
+This project integrates Azure Cognitive Services Speech SDK with Azure OpenAI Service to perform real-time speech recognition and refine the recognized text for improved grammar and readability.
+
+# Features
+1. Real-time speech-to-text transcription using Azure Cognitive Services Speech SDK.
+2. Automatic refinement of recognized text using Azure OpenAI Service.
+3. Grammar correction, minor rewrites for improved readability, and spelling fixes for predefined phrases.
+
+## Run the Sample within VS Code
+1. Install the "Azure AI Speech Toolkit" extension in VS Code.
+2. Download this sample from the sample gallery to your local machine.
+3. Trigger the "Azure AI Speech Toolkit: Configure Azure Speech Resources" command from the command palette to select an **Azure AI Service** resource.
+4. Trigger the "Azure AI Speech Toolkit: Configure and Setup the Sample App" command from the command palette to configure and set up the sample. This command only needs to be run once.
+5. Trigger the "Azure AI Speech Toolkit: Run the Sample App" command from the command palette to run the sample.
+
+## Prerequisites
+- Install [Python 3.7 or later](https://www.python.org/downloads/).
+
+## Environment Setup
+- Azure AI Speech Toolkit sets these environment variables for you automatically. To run the sample outside of VS Code, set the following environment variables manually.
+
+- `SPEECH_REGION`: Azure region for the Speech Service (e.g., `eastus`).
+- `SPEECH_KEY`: Azure Cognitive Services Speech API key.
+- `AZURE_OPENAI_ENDPOINT`: Endpoint for Azure OpenAI Service (e.g., `https://<your-resource-name>.openai.azure.com`).
+- `AZURE_OPENAI_API_KEY`: API key for Azure OpenAI Service.
+
+When running the sample app, you can set the `--relevant_phrases` parameter.
+- `--relevant_phrases`: (Optional) Default: Azure Cognitive Services, non-profit organization, speech recognition, OpenAI API
+
+----
+
+## Example Output
+Speak into the microphone. The sample application will print both the recognition result and the rewritten version.
+For instance, if you speak "how ar you" into the microphone, the output will be:
+
+```
+RAW RECO: how ar you
+REWRITE: How are you?
+```
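Review note: the README lists four environment variables, but nothing in this commit reports which ones are missing before the clients are constructed. A minimal pre-flight check could look like the sketch below. The function name `missing_env_vars` and the constant `REQUIRED_VARS` are assumptions introduced here for illustration; only the variable names come from the README.

```python
import os

# Variable names taken from the README's "Environment Setup" section.
REQUIRED_VARS = (
    "SPEECH_REGION",
    "SPEECH_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise without touching the real environment.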
@@ -0,0 +1,99 @@
+import os
+import argparse
+import azure.cognitiveservices.speech as speechsdk
+from openai import AzureOpenAI
+
+# Initialize speech recognition engine
+service_region = os.environ.get('SPEECH_REGION')
+speech_key = os.environ.get('SPEECH_KEY')
+speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
+speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, language="en-us")
+
+# Initialize Azure OpenAI client
+client = AzureOpenAI(
+    azure_endpoint=os.environ.get('AZURE_OPENAI_ENDPOINT'),
+    api_key=os.environ.get('AZURE_OPENAI_API_KEY'),
+    api_version="2024-10-21"
+)
+
+# Parse command-line arguments
+parser = argparse.ArgumentParser(description="Run app.py with custom parameters.")
+parser.add_argument(
+    "--relevant_phrases",
+    type=str,
+    default="Azure Cognitive Services, non-profit organization, speech recognition, OpenAI API",
+    help="Comma-separated relevant phrases for text rewriting."
+)
+args = parser.parse_args()
+
+# Use user-provided or default relevant_phrases
+relevant_phrases = args.relevant_phrases
+
+def rewrite_content(input_reco):
+    """
+    Refines the user's input sentence by fixing grammar issues, making it more readable,
+    and ensuring spelling correctness for specific phrases.
+
+    Args:
+        input_reco (str): The raw input sentence to rewrite.
+
+    Returns:
+        str: The refined sentence.
+    """
+
+    # A list of phrases relevant to the context, used to ensure their correct spelling and formatting.
+    # Users can customize these phrases based on their specific use case or domain.
+    relevant_phrases = args.relevant_phrases
+
+    my_messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are a helpful assistant to help the user rewrite sentences. "
+                "Please fix the grammar errors in the user-provided sentence and make it more readable. "
+                "You can do minor rewriting but MUST NOT change the sentence's meaning. "
+                "DO NOT make up new content. DO NOT answer questions. "
+                "Here are phrases relevant to the sentences: '{}'. "
+                "If they appear in the sentence and are misspelled, please fix them. "
+                "Example corrections:\n"
+                "User: how ar you\nYour response: How are you?\n\n"
+                "User: what yur name?\nYour response: What's your name?\n\n"
+            ).format(relevant_phrases)
+        },
+        {"role": "user", "content": input_reco}
+    ]
+
+    response = client.chat.completions.create(
+        model="gpt-4o-mini",
+        messages=my_messages
+    )
+
+    return response.choices[0].message.content
+
+def recognized_cb(evt: speechsdk.SpeechRecognitionEventArgs):
+    """
+    Callback function triggered when speech is recognized.
+
+    Args:
+        evt (SpeechRecognitionEventArgs): The event argument containing recognized text.
+    """
+    current_sentence = evt.result.text
+    if not current_sentence:
+        return
+
+    print("RAW RECO:", current_sentence)
+    print("REWRITE:", rewrite_content(current_sentence))
+
+# Connect the speech recognizer to the callback
+speech_recognizer.recognized.connect(recognized_cb)
+result_future = speech_recognizer.start_continuous_recognition_async()
+result_future.get()  # Ensure engine initialization is complete
+
+print('Continuous Recognition is now running. Say something.')
+while True:
+    print('Type "stop" then press Enter to stop recognition.')
+    stop = input()
+    if stop.lower() == "stop":
+        print('Stopping async recognition...')
+        speech_recognizer.stop_continuous_recognition_async()
+        break
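Review note: `rewrite_content` above builds its prompt inline and reads the phrases from the module-level `args`, so the prompt cannot be checked without a live Azure OpenAI client. One possible refactor, sketched here under that assumption, pulls prompt construction into a pure function. The names `split_phrases` and `build_messages` are hypothetical and do not appear in the commit; the prompt wording is copied from the script, and the phrase list is normalized before being interpolated.

```python
def split_phrases(raw):
    """Split a comma-separated phrase string, dropping blanks and padding."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_messages(input_reco, relevant_phrases):
    """Build the chat messages for one recognized sentence.

    Pure function: no network access, so it can be unit-tested.
    """
    phrases = ", ".join(split_phrases(relevant_phrases))
    system_prompt = (
        "You are a helpful assistant to help the user rewrite sentences. "
        "Please fix the grammar errors in the user-provided sentence and make it more readable. "
        "You can do minor rewriting but MUST NOT change the sentence's meaning. "
        "DO NOT make up new content. DO NOT answer questions. "
        f"Here are phrases relevant to the sentences: '{phrases}'. "
        "If they appear in the sentence and are misspelled, please fix them. "
        "Example corrections:\n"
        "User: how ar you\nYour response: How are you?\n\n"
        "User: what yur name?\nYour response: What's your name?\n\n"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": input_reco},
    ]
```

With this split, the script's call to `client.chat.completions.create` would receive `build_messages(input_reco, args.relevant_phrases)` as its `messages` argument.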
@@ -0,0 +1,75 @@
+param(
+    [string]$action
+)
+
+function Test-PythonInstalled {
+    return Get-Command python -ErrorAction SilentlyContinue
+}
+
+function Test-PipInstalled {
+    return Get-Command pip -ErrorAction SilentlyContinue
+}
+
+if ($action -eq "configure") {
+    if (-not (Test-PythonInstalled)) {
+        Write-Host "Python is not installed. Please install Python to proceed." -ForegroundColor Red
+        exit 1
+    }
+
+    if (-not (Test-PipInstalled)) {
+        Write-Host "pip is not installed. Please install pip to proceed." -ForegroundColor Red
+        exit 1
+    }
+
+    Write-Host "Installing requirements packages..."
+    try {
+        pip install -r requirements.txt
+        Write-Host "Requirements packages installation succeeded." -ForegroundColor Green
+    }
+    catch {
+        Write-Host "Requirements packages installation failed. Please check your pip installation." -ForegroundColor Red
+        exit 1
+    }
+}
+elseif ($action -eq "run") {
+    # Define the path to your .env file
+    $envFilePath = ".env/.env.dev"
+
+    if (Test-Path $envFilePath) {
+        # Read each line of the file and process it
+        Get-Content -Path $envFilePath | ForEach-Object {
+            # Ignore empty lines and lines that start with `#` (comments)
+            if ($_ -and $_ -notmatch '^\s*#') {
+                # Split each line into key and value
+                $parts = $_ -split '=', 2
+                $key = $parts[0].Trim()
+                $value = $parts[1].Trim()
+
+                # Set the environment variable
+                [System.Environment]::SetEnvironmentVariable($key, $value)
+            }
+
+            [System.Environment]::SetEnvironmentVariable("SPEECH_KEY", $env:SPEECH_RESOURCE_KEY)
+            [System.Environment]::SetEnvironmentVariable("AZURE_OPENAI_API_KEY", $env:SPEECH_RESOURCE_KEY)
+            [System.Environment]::SetEnvironmentVariable("SPEECH_REGION", $env:SERVICE_REGION)
+            [System.Environment]::SetEnvironmentVariable("AZURE_OPENAI_ENDPOINT", "https://$env:CUSTOM_SUBDOMAIN_NAME.openai.azure.com/")
+        }
+
+        Write-Host "Environment variables loaded from $envFilePath"
+    }
+    else {
+        Write-Host "File not found: $envFilePath. You can create one to set environment variables or manually set secrets in environment variables."
+    }
+
+    $relevantPhrases = Read-Host "Enter relevant phrases (or press Enter to use defaults)"
+    if ([string]::IsNullOrEmpty($relevantPhrases)) {
+        $relevantPhrases = "Azure Cognitive Services, non-profit organization, speech recognition, OpenAI API"
+    }
+    Write-Host "Running app.py with relevant phrases: $relevantPhrases"
+    python app.py --relevant_phrases "$relevantPhrases"
+}
+else {
+    Write-Host "Invalid action: $action" -ForegroundColor Red
+    Write-Host "Usage: -action configure or -action run"
+    exit 1
+}
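Review note: the `run` branch above parses the `.env` file line by line, skipping blanks and `#` comments and splitting on the first `=` only. The same rule is sketched in Python below for reference. `parse_env_lines` is a hypothetical name, and unlike the PowerShell loop this sketch skips lines that contain no `=` at all, which is an assumption about the desired behavior.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict.

    Blank lines and lines starting with '#' are ignored. Only the first
    '=' separates key from value, so values may themselves contain '='.
    Lines without '=' are skipped (an assumption of this sketch).
    """
    values = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = stripped.split("=", 1)
        values[key.strip()] = value.strip()
    return values
```

A file would be fed in with `parse_env_lines(open(path, encoding="utf-8"))`, since iterating a text file yields its lines.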
@@ -0,0 +1,60 @@
+#!/bin/bash
+
+action=$1
+
+function check_python_installed() {
+    command -v python >/dev/null 2>&1
+}
+
+function check_pip_installed() {
+    command -v pip >/dev/null 2>&1
+}
+
+if [ "$action" == "configure" ]; then
+    echo "Installing Linux platform required dependencies..."
+    sudo apt-get update
+    sudo apt-get install -y build-essential libssl-dev libasound2 wget
+
+    if ! check_python_installed; then
+        echo -e "\e[31mPython is not installed. Please install Python to proceed.\e[0m"
+        exit 1
+    fi
+
+    if ! check_pip_installed; then
+        echo -e "\e[31mpip is not installed. Please install pip to proceed.\e[0m"
+        exit 1
+    fi
+
+    echo "Installing requirements packages..."
+    if ! pip install -r requirements.txt; then
+        exit 1
+    fi
+elif [ "$action" == "run" ]; then
+
+    # Load environment variables from .env file
+    ENV_FILE=".env/.env.dev"
+    if [ -f "$ENV_FILE" ]; then
+        source "$ENV_FILE"
+
+        # Ensure environment variables are available to the Python app
+        export SPEECH_KEY=$SPEECH_RESOURCE_KEY
+        export AZURE_OPENAI_API_KEY=$SPEECH_RESOURCE_KEY
+        export SPEECH_REGION=$SERVICE_REGION
+        export AZURE_OPENAI_ENDPOINT="https://${CUSTOM_SUBDOMAIN_NAME}.openai.azure.com/"
+        echo "Environment variables loaded from $ENV_FILE"
+
+    else
+        echo "Environment file $ENV_FILE not found. You can create one to set environment variables or manually set secrets in environment variables."
+    fi
+
+    read -p "Enter relevant phrases (or press Enter to use defaults): " relevant_phrases
+    if [ -z "$relevant_phrases" ]; then
+        relevant_phrases="Azure Cognitive Services, non-profit organization, speech recognition, OpenAI API"
+    fi
+    echo "Running app.py with relevant phrases: $relevant_phrases"
+    python app.py --relevant_phrases "$relevant_phrases"
+else
+    echo -e "\e[31mInvalid action: $action\e[0m"
+    echo "Usage: $0 configure or $0 run"
+    exit 1
+fi
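Review note: both manager scripts derive the four variables the Python app reads from three values in `.env/.env.dev`, and they reuse one key for both Speech and Azure OpenAI. That reuse presumably relies on the selected resource serving both APIs; the commit does not state this, so treat it as an assumption. The mapping is sketched below; `derive_app_env` is a hypothetical helper, not part of the commit.

```python
def derive_app_env(loaded):
    """Map values loaded from the .env file to the names the app reads.

    `loaded` is a dict holding SPEECH_RESOURCE_KEY, SERVICE_REGION and
    CUSTOM_SUBDOMAIN_NAME. Missing entries become empty strings, which
    is a choice made for this sketch.
    """
    key = loaded.get("SPEECH_RESOURCE_KEY", "")
    subdomain = loaded.get("CUSTOM_SUBDOMAIN_NAME", "")
    return {
        "SPEECH_KEY": key,
        # The same key is reused for Azure OpenAI, as in both scripts.
        "AZURE_OPENAI_API_KEY": key,
        "SPEECH_REGION": loaded.get("SERVICE_REGION", ""),
        "AZURE_OPENAI_ENDPOINT": f"https://{subdomain}.openai.azure.com/",
    }
```

The result could be merged into the process environment with `os.environ.update(derive_app_env(loaded))` before the clients are created.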
@@ -0,0 +1,2 @@
+azure-cognitiveservices-speech
+openai
