This is a working example of a multi-modal RAG QA application that lets users upload a PDF file and asks about the content of the file. The system return a textual answer with relevant images if available.
Author: Jan Čuhel
Date:May 2024
# Activate a python environment of your choice (e.g. venv, Conda)
# ...
# Install the dependencies
pip intall -r requirements.txt
python multimodal_rag_pipeline.py --source_file manual.pdf
We recommend to run this application on a device with a strong GPU as it utilizes several Deep Learning Models.
The intended use of the system is for vehicle manuals, but it can be used for different manuals as well.