AudioDictate is a desktop application that transcribes spoken audio into text entirely offline. It transcribes WAV files directly and converts other audio formats to WAV before transcription. A simple graphical user interface handles file selection and displays the resulting text.
- Features
- Prerequisites
- Setup and Installation
- Running the Application
- How to Use
- Tools and Technologies
- Contributing
- License
- Contact
- Audio File Transcription: Converts spoken words from audio files into written text with high accuracy.
- WAV File Conversion: Automatically converts non-WAV files to WAV format for processing.
- Offline Functionality: Processes audio files locally, so your recordings never leave your machine.
- Interactive GUI: Provides a user-friendly interface for file selection and displaying transcription results.
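The WAV conversion feature could be sketched with PyDub roughly as follows. This is a minimal illustration, not the project's actual code: the function names `wav_output_path` and `convert_to_wav` are hypothetical, and the mono, 16 kHz, 16-bit settings are an assumption about what the chosen Vosk model expects. PyDub also needs ffmpeg available to read most non-WAV formats.

```python
from pathlib import Path


def wav_output_path(source: str, output_dir: str) -> Path:
    """Build the target .wav path for a source file inside output_dir."""
    return Path(output_dir) / (Path(source).stem + ".wav")


def convert_to_wav(source: str, output_dir: str) -> Path:
    """Convert a file that PyDub can read into a mono 16-bit WAV file."""
    # Imported here so the path helper above works without PyDub installed.
    from pydub import AudioSegment

    target = wav_output_path(source, output_dir)
    audio = AudioSegment.from_file(source)
    # Assumption: mono, 16 kHz, 16-bit PCM suits the Vosk model in use.
    audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
    audio.export(str(target), format="wav")
    return target
```

For example, `convert_to_wav("talk.mp3", "converted")` would write `converted/talk.wav`, provided the `converted` directory already exists.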
- Python 3.x
- Tkinter
- PyDub
- Vosk Speech Recognition Toolkit
git clone https://github.com/ascender1729/AudioDictate.git
cd AudioDictate
Create and activate a virtual environment:
python -m venv myenv
.\myenv\Scripts\Activate.ps1 # On Windows
source myenv/bin/activate # On Unix or macOS
pip install -r requirements.txt
Download the Vosk model appropriate for your language and note the directory path where it is saved.
python transcribe.py
When you are finished using AudioDictate, you can deactivate the virtual environment:
deactivate
- Start AudioDictate.
- When prompted, input the directory path to the Vosk model.
- Use the application's interface to select your audio file. The application supports browsing and selecting the file directly within the app.
- If the selected file is not in WAV format, you'll be asked to select an output directory for the conversion process.
- After processing, the transcribed text will be displayed within the application window.
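The file selection steps above can be sketched with Tkinter's standard dialogs. This is only an illustration under assumptions: `needs_conversion` and `pick_files` are hypothetical names, and deciding by file extension is a simplification that the real application may not use.

```python
from pathlib import Path


def needs_conversion(audio_path: str) -> bool:
    """Return True when the selected file does not have a .wav extension."""
    return Path(audio_path).suffix.lower() != ".wav"


def pick_files():
    """Ask for an audio file and, if it is not WAV, an output directory."""
    # Imported lazily so needs_conversion works on systems without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty root window behind the dialogs
    audio_path = filedialog.askopenfilename(title="Select an audio file")
    output_dir = None
    if audio_path and needs_conversion(audio_path):
        output_dir = filedialog.askdirectory(title="Select an output directory")
    root.destroy()
    return audio_path, output_dir
```

Both dialogs return an empty string when the user cancels, so a caller should check `audio_path` before continuing.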
| Area | Tool/Technology | Description |
|---|---|---|
| Audio Processing | PyDub | Handles the audio file format conversion. |
| Speech Recognition | Vosk | Performs the speech-to-text transcription. |
| GUI | Tkinter | Provides the graphical user interface for the application. |
| Programming Language | Python | The core language used for developing the application. |
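As a rough sketch of how the Vosk row of this table fits together with Python's standard `wave` module, the transcription loop might look like the following. The function names `collect_text` and `transcribe` are hypothetical and this is not taken from `transcribe.py`; it assumes the input is already a mono 16-bit PCM WAV file.

```python
import json
import wave


def collect_text(recognizer, wav_file, chunk_frames: int = 4000) -> str:
    """Feed a WAV file to a Vosk-style recognizer and join the recognized text."""
    pieces = []
    while True:
        data = wav_file.readframes(chunk_frames)
        if not data:
            break
        # AcceptWaveform returns True when an utterance is complete.
        if recognizer.AcceptWaveform(data):
            pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # FinalResult flushes whatever audio is still buffered.
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def transcribe(model_dir: str, wav_path: str) -> str:
    """Transcribe a mono 16-bit PCM WAV file with a local Vosk model."""
    # Imported lazily so collect_text can be exercised without Vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wav_file:
        recognizer = KaldiRecognizer(Model(model_dir), wav_file.getframerate())
        return collect_text(recognizer, wav_file)
```

Here `model_dir` is the directory of the downloaded Vosk model. Keeping the loop in `collect_text` separate from model loading makes it possible to test the loop with a stand-in recognizer.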
To contribute:
- Fork the repository.
- Create a new branch (git checkout -b feature/YourFeature).
- Commit your changes (git commit -m 'Add YourFeature').
- Push to the branch (git push origin feature/YourFeature).
- Create a new Pull Request.
Distributed under the MIT License. See LICENSE for more information.
Pavan Kumar - [email protected]
LinkedIn: linkedin.com/in/im-pavankumar
Project Link: https://github.com/ascender1729/AudioDictate