A real-time voice chatbot based on faster-whisper and Ollama
This script provides a local audio recording and transcription service without relying on cloud services, making it suitable for offline use.
Before running the script, you need to install the following dependencies:
pip3 install pyaudio webrtcvad faster-whisper
To run the script, use the following command:
python3 faster_whisper_demo.py
This script records audio from the microphone, transcribes it with a local model in real time, and sends the transcribed text to an Ollama chatbot. Below is an overview of the main components:
- Manages two queues: audio, for storing audio data, and text, for storing transcribed text.
- Utilizes the FasterWhisper model for audio transcription.
- Initializes the model with specified parameters such as model size, device, and compute type.
- Processes audio data and yields transcribed text segments.
- Records audio using the PyAudio library and processes it with WebRTC VAD (Voice Activity Detection).
- Initializes the audio stream and VAD, and manages the recording and buffering of audio frames.
- Detects speech activity and manages the transition between recording and silence.
- Handles communication with the Ollama chatbot.
- Sends transcribed text to the chatbot and receives responses.
- Initializes and starts the audio recorder, transcriber, and chat components.
- Manages the lifecycle of these components and handles exceptions.
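The way these components hand work to each other through the two queues can be sketched as follows. This is a minimal illustration using only the standard library, not the script's actual code: the names `run_pipeline`, `transcribe_worker`, `chat_worker`, and `STOP` are hypothetical, and the transcriber and chatbot are passed in as plain callables so the sketch runs without a microphone or a model.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that tells a worker to shut down


def transcribe_worker(audio_q, text_q, transcribe):
    """Take audio chunks from audio_q, put transcribed text on text_q."""
    while True:
        chunk = audio_q.get()
        if chunk is STOP:
            text_q.put(STOP)  # forward the shutdown signal downstream
            break
        text_q.put(transcribe(chunk))


def chat_worker(text_q, reply, replies):
    """Take text from text_q and collect one chatbot reply per utterance."""
    while True:
        text = text_q.get()
        if text is STOP:
            break
        replies.append(reply(text))


def run_pipeline(chunks, transcribe, reply):
    """Wire the two queues together and run until all chunks are processed."""
    audio_q, text_q = queue.Queue(), queue.Queue()
    replies = []
    workers = [
        threading.Thread(target=transcribe_worker, args=(audio_q, text_q, transcribe)),
        threading.Thread(target=chat_worker, args=(text_q, reply, replies)),
    ]
    for worker in workers:
        worker.start()
    for chunk in chunks:  # in a real script the audio recorder would do this
        audio_q.put(chunk)
    audio_q.put(STOP)
    for worker in workers:
        worker.join()
    return replies


if __name__ == "__main__":
    # Stand-ins for the speech model and the chatbot.
    print(run_pipeline([b"hello", b"world"], bytes.decode, lambda t: f"echo: {t}"))
```

Because each stage has a single worker and `queue.Queue` is first-in, first-out, replies come back in the order the audio was recorded.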
- Audio Recording:
  - The script continuously reads audio from the microphone.
  - When speech is detected, it starts buffering audio frames; when silence is detected, it stops and hands the buffered audio off for processing.
- Audio Transcription:
  - The recorded audio is sent to the Transcriber class, which uses the FasterWhisper model to transcribe the audio into text.
- Chat Interaction:
  - The transcribed text is sent to the Chat class, which communicates with the Ollama chatbot and retrieves responses.
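The three workflow steps above can be sketched as follows. This is an illustration under stated assumptions rather than the script's own code: all function names are hypothetical, the silence threshold of 10 frames, the 16 kHz 16-bit mono audio format, the model name `llama3`, and the default Ollama address `http://localhost:11434` are assumed values, and the third-party imports are done lazily so the segmentation logic can be tried without any audio hardware.

```python
import json
import urllib.request


def segment_utterances(frames, is_speech, max_silence_frames=10):
    """Yield one bytes object per utterance found in a stream of PCM frames.

    Buffering starts at the first speech frame and stops after
    max_silence_frames consecutive silent frames (which are trimmed off).
    """
    buffer, silence, recording = [], 0, False
    for frame in frames:
        speech = is_speech(frame)
        if not recording:
            if speech:  # silence -> speech transition
                buffer, silence, recording = [frame], 0, True
            continue
        buffer.append(frame)
        silence = 0 if speech else silence + 1
        if silence >= max_silence_frames:  # speech -> silence transition
            yield b"".join(buffer[: len(buffer) - silence])
            buffer, silence, recording = [], 0, False
    if recording:  # flush an utterance that was cut off by end of stream
        yield b"".join(buffer[: len(buffer) - silence])


def make_webrtc_detector(sample_rate=16000, aggressiveness=2):
    """Return an is_speech(frame) callable backed by WebRTC VAD.

    Frames must be 10, 20, or 30 ms of 16-bit mono PCM.
    """
    import webrtcvad  # imported lazily; only needed with a real microphone

    vad = webrtcvad.Vad(aggressiveness)
    return lambda frame: vad.is_speech(frame, sample_rate)


def transcribe(model, pcm_bytes):
    """Transcribe 16 kHz 16-bit mono PCM with a faster-whisper WhisperModel."""
    import numpy as np  # installed as a dependency of faster-whisper

    audio = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    segments, _info = model.transcribe(audio)
    return " ".join(segment.text.strip() for segment in segments)


def build_ollama_request(text, model="llama3", host="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": text, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask_ollama(text, **kwargs):
    """Send text to a running Ollama server and return its reply."""
    with urllib.request.urlopen(build_ollama_request(text, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

Passing `is_speech` in as a callable keeps the speech/silence state machine independent of the detector, so it can be exercised with a fake detector before `make_webrtc_detector()` is plugged in.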
- The script uses the logging module to log the stages of the recording, transcription, and chat processes.
- Log messages provide information on the status and any errors encountered during execution.
- Transcribed text is logged in real-time, and responses from the chatbot are printed to the console.
- The script includes error handling to manage exceptions during initialization, audio recording, transcription, and chat communication.
- Sets the environment variable KMP_DUPLICATE_LIB_OK to "TRUE" to work around library conflicts (typically a duplicate OpenMP runtime being loaded).
- Ensure that your microphone is properly configured and accessible by the script.
- The script is designed to work with specific models and configurations. Adjust the parameters as needed for your environment.
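The logging, error handling, and environment setup described above might look roughly like this. It is a sketch under assumptions, not the script's actual code: the logger name `voice_chat`, the helper `start_component`, and the default parameter values (`small`, `cpu`, `int8`) are illustrative choices to adjust for your environment.

```python
import logging
import os

# Set early, before importing the libraries that load an OpenMP runtime.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("voice_chat")  # hypothetical logger name


def start_component(name, factory):
    """Create a component, logging success or the full traceback on failure."""
    try:
        component = factory()
    except Exception:
        log.exception("Failed to initialize %s", name)
        raise
    log.info("%s initialized", name)
    return component


def load_model(model_size="small", device="cpu", compute_type="int8"):
    """Load a faster-whisper model; the defaults here are assumed values."""
    from faster_whisper import WhisperModel  # imported lazily

    return WhisperModel(model_size, device=device, compute_type=compute_type)


if __name__ == "__main__":
    model = start_component("transcriber", load_model)
```

On a machine with a supported GPU, `load_model(device="cuda", compute_type="float16")` is a common alternative to the CPU defaults.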
This project is licensed under the MIT License - see the LICENSE file for details.