Multi-modal RAG pipeline

This is a working example of a multi-modal RAG QA application that lets users upload a PDF file and asks about the content of the file. The system return a textual answer with relevant images if available.

Author: Jan Čuhel

Date:May 2024

Architecture

Pre-processing

Inference

Installation

# Activate a python environment of your choice (e.g. venv, Conda)
# ...
# Install the dependencies
pip intall -r requirements.txt

Execution

python multimodal_rag_pipeline.py --source_file manual.pdf

Hardware requirements

We recommend to run this application on a device with a strong GPU as it utilizes several Deep Learning Models.

Intended use

The intended use of the system is for vehicle manuals, but it can be used for different manuals as well.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
images		images
README.md		README.md
extractor.py		extractor.py
multimodal_rag_pipeline.py		multimodal_rag_pipeline.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-modal RAG pipeline

Architecture

Pre-processing

Inference

Installation

Execution

Hardware requirements

Intended use

About

Releases

Packages

Languages

HonzaCuhel/multimodal-rag-pipeline

Folders and files

Latest commit

History

Repository files navigation

Multi-modal RAG pipeline

Architecture

Pre-processing

Inference

Installation

Execution

Hardware requirements

Intended use

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages