Efficient Speech Recognition and Language Model Integration for Enhanced Textual Interaction

Gary0232/AudioTranscribeAI

AudioTranscribeAI

Vue3 Vuetify3 Python Node.js Yarn Flask sqlite3

Installation

This project requires Python 3.8.8 or later. Make sure it is installed on your machine before continuing.
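To confirm that the interpreter on your PATH satisfies this requirement, a small check along the following lines can help. This is only a sketch: the helper name `meets_minimum` and the `MINIMUM` constant are illustrative and are not part of this repository.

```python
import sys

# Minimum version stated in this README (major, minor, micro).
MINIMUM = (3, 8, 8)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    current = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return current >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {found}: {status}")
```

Tuples compare element by element, so `(3, 8, 7)` is rejected while `(3, 8, 8)` and `(3, 11, 0)` are accepted.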

Set up the backend

Install the requirements

pip install -r requirements.txt

Set up the frontend

The frontend is inside the frontend/AudioTranscribeAI folder.

The frontend is built with Node.js and Vue.js, so you need Node.js and the Yarn package manager to run it. With Node.js installed, install Yarn globally:

npm install --global yarn

Install the requirements

# from the repository root
yarn --cwd frontend/AudioTranscribeAI/
# or, from inside the frontend directory
cd frontend/AudioTranscribeAI
yarn

Run the application

The backend and frontend are separate applications, so both need to be running.

Run the backend

python app.py

Run the frontend

# from the repository root
yarn --cwd frontend/AudioTranscribeAI/ dev
# or, from inside the frontend directory
cd frontend/AudioTranscribeAI
yarn dev

Other commands for the frontend

Build the release version of frontend application

# from the repository root
yarn --cwd frontend/AudioTranscribeAI/ build
# or, from inside the frontend directory
cd frontend/AudioTranscribeAI
yarn build

Architecture

Machine Learning Model

framework.png

Speech Recognition

✨Model: openai/whisper-small
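For orientation, the checkpoint named above can be loaded through the Hugging Face `transformers` pipeline API roughly as follows. This is a minimal sketch and not necessarily how this project's backend calls the model: the function name `transcribe` and the file name `sample.wav` are hypothetical, and it assumes `transformers`, a PyTorch backend, and `ffmpeg` (for decoding audio files) are installed.

```python
import sys

# Checkpoint named in this README.
MODEL_ID = "openai/whisper-small"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with Whisper (illustrative helper, not from the repo)."""
    # Imported lazily so that importing this module does not load the model.
    from transformers import pipeline

    # chunk_length_s splits long recordings into 30-second windows,
    # which matches the input length Whisper was trained on.
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,
    )
    result = asr(audio_path)
    return result["text"].strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python transcribe_sketch.py sample.wav
    print(transcribe(sys.argv[1]))
```

The first call downloads the model weights, so it needs network access and may take a while.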

Large Language Model (Text Summarization and Question Answering)

✨Model:

NLP Model

✨Model spacy: en_core_web_sm

Wikipedia Retrieval

Backend

Frontend

Evaluation

To evaluate the audio model, we use the LibriSpeech dataset:

cd asr && python eval_asr.py
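This README does not state which metric `eval_asr.py` reports. Word error rate (WER) is the usual metric for LibriSpeech, so the sketch below assumes that is what is being measured. The function `word_error_rate` is illustrative and is not taken from the repository. It lowercases both strings and splits on whitespace, which is simpler than the text normalization a full evaluation would normally apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution out of three reference words.
    print(word_error_rate("the cat sat", "the cat sit"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.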

To evaluate the LLM on summarization, run llm.py directly:

python llm/llm.py

Results

TinyLlama.png

whisper-small.png
