RAG-over-Audio-Data - Audio Transcript Processing and QA System

Introduction

This repository implements a retrieval-augmented generation (RAG) pipeline over audio data. It transcribes remote audio files, splits the transcripts into manageable chunks, and embeds the chunks into a vector database for efficient retrieval. Transcription is handled by the AssemblyAI service, which makes the spoken content of the audio files available as text.

The HuggingFaceEmbeddings module generates text embeddings with HuggingFace models, converting each text chunk into a numerical vector. These embeddings are stored in a Chroma vector database, so that chunks can be retrieved by their similarity to a query.
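To make the idea of similarity-based retrieval concrete, here is a minimal sketch using only the Python standard library. It is not how Chroma is implemented, and the three-dimensional vectors are made-up stand-ins for real embeddings; cosine similarity is one common choice of metric, and the metric a given vector store uses depends on its configuration. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Indices of the k chunk vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" (assumption: real models produce far more dimensions).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]

best = top_k(query_vec, chunk_vecs, k=2)
print(best)
```

The query vector points mostly along the first axis, so the first chunk ranks highest and the orthogonal third chunk is never returned.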

The ChatOpenAI model, LangChain's wrapper around OpenAI's chat models, is used in a question-answering setup (RetrievalQA). When a user enters a question, the most relevant chunks are retrieved from the vector database built earlier and passed to the model, which generates an answer from them. Each answer is presented along with the source documents it draws on, aiding transparency and context.
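The retrieve-then-answer step can be pictured as assembling one prompt from the retrieved chunks and keeping track of where each chunk came from. The sketch below is an illustration only: the Chunk class, build_prompt, unique_sources, the prompt wording, and the example URLs are all hypothetical and are not taken from LangChain or from main.py.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical: URL of the audio file the chunk came from


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Join the retrieved chunks into a context block, then append the question."""
    context = "\n\n".join(chunk.text for chunk in chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def unique_sources(chunks: list[Chunk]) -> list[str]:
    """Sources of the retrieved chunks, in retrieval order, without duplicates."""
    return list(dict.fromkeys(chunk.source for chunk in chunks))


# Placeholder data standing in for chunks returned by the vector database.
retrieved = [
    Chunk("The meeting starts at nine.", "https://example.com/a.mp3"),
    Chunk("Lunch is served at noon.", "https://example.com/a.mp3"),
    Chunk("The venue is the main hall.", "https://example.com/b.mp3"),
]

prompt = build_prompt("When does the meeting start?", retrieved)
sources = unique_sources(retrieved)
print(prompt)
print(sources)
```

Returning the sources alongside the answer is what lets a reader check which transcript an answer was based on.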

Overall, the code combines audio transcription, text segmentation, text embedding, vector database creation, and retrieval-based question answering into an automated framework for querying the content of audio files.

Components

Audio Transcription: The code uses AssemblyAIAudioTranscriptLoader to transcribe audio files from remote URLs into text.
Text Splitting: The transcripts are split into smaller chunks with RecursiveCharacterTextSplitter to facilitate processing and analysis.
Text Embedding: HuggingFace embeddings (HuggingFaceEmbeddings) convert the text chunks into vectors, which capture semantic meaning and allow similarity calculations.
Vector Database Creation: The embedded text chunks are stored in a vector database (Chroma) for efficient retrieval and comparison.
Question-Answering (QA) System: The code implements a QA system with an OpenAI chat model (ChatOpenAI). It retrieves relevant chunks from the vector database based on the user's query and uses them to generate an answer.

Installation

Clone the repo

git clone https://github.com/wittyicon29/RAG-over-Audio-Data.git

Switch to the directory

cd RAG-over-Audio-Data

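Create a virtual environment and activate it (on Windows, run venv\Scripts\activate instead of the source command)

python -m venv venv
source venv/bin/activate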

Install the dependencies

pip install -r requirements.txt

Usage

Set up your environment variables in a .env file. The repository includes a .env example file to start from.
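Before running the script, it can help to confirm that the .env file actually defines the keys the services need. The following is a rough standard-library sketch, not the project's own loading code. The key names ASSEMBLYAI_API_KEY and OPENAI_API_KEY are assumptions based on the services this README mentions; check the .env example file for the names the project really expects. The helper names are hypothetical.

```python
# Assumed key names; the project's .env example file is the authority.
REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]


# Placeholder content standing in for a real .env file.
sample = """
# API credentials
ASSEMBLYAI_API_KEY="your-assemblyai-key"
OPENAI_API_KEY=
"""

env = parse_env_text(sample)
missing = missing_keys(env)
print(missing)
```

In this sample the OpenAI key is present but empty, so it is reported as missing. To check a real file, pass the result of reading .env to parse_env_text.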

Transcribing Files: Provide remote audio file URLs in the URLs list. The URLs must be publicly accessible so that the AssemblyAI API can reach the audio files.
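A quick sanity check can catch local paths or malformed entries before they are sent for transcription. The sketch below only checks the shape of each entry (an http or https scheme and a host); it cannot tell whether a URL is actually publicly reachable. The function name and the example entries are hypothetical.

```python
from urllib.parse import urlparse


def looks_like_remote_url(url: str) -> bool:
    """True if the string has an http(s) scheme and a host part."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Placeholder entries; replace with the audio files you want to transcribe.
URLs = [
    "https://example.com/audio/episode1.mp3",
    "C:/recordings/episode2.mp3",
    "file:///home/user/episode3.wav",
]

rejected = [url for url in URLs if not looks_like_remote_url(url)]
print(rejected)
```

Here the Windows path and the file:// URL are rejected because neither can be fetched by a remote service.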

Run the script

python main.py

Configuration

Adjust the chunk size and overlap in RecursiveCharacterTextSplitter for text splitting customization.
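To see how chunk size and overlap interact, consider the simplified fixed-window splitter below. This is not how RecursiveCharacterTextSplitter works internally (that class splits on a list of separators and merges the pieces); the sketch only illustrates the effect of the two parameters, and the function name is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last three characters of the previous one. A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more duplicated text.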

Modify model names and parameters in the make_embedder() and make_qa_chain() functions to experiment with different language models and settings.

References

Retrieval Augmented Generation for Audio using LangChain

LICENSE

This project is licensed under the MIT License. See the LICENSE file for details.
