This repository contains the source code for my Final Year Project (FYP) focused on an end-to-end framework for knowledge retrieval from visually rich documents, with applications in the venture capital domain.
An End-to-end Framework for Knowledge Retrieval from Visual Documents with Applications in Venture Capital
By
Koh Quan Wei Ivan
Department of Information Systems and Analytics
School of Computing
National University of Singapore
2023/2024
If you use this work in your research or project, please cite:
Koh, Q. W. I. (2024). An End-to-end Framework for Knowledge Retrieval from Visual Documents with Applications in Venture Capital. B.Comp. Dissertation, Department of Information Systems and Analytics, School of Computing, National University of Singapore.
For any inquiries, please contact: [email protected]
This dissertation presents an end-to-end framework for knowledge retrieval from visually rich documents, focusing on applications in the venture capital domain. The framework addresses the challenges faced by venture capital professionals in manually analysing pitch decks during the deal-sourcing process. It comprises three phases: Detection, Extraction, and Synthesis.
The Detection Phase introduces the IIIT-OSV-Charts dataset, a novel combination of the IIIT-AR-13K dataset and a proprietary dataset from Openspace Ventures. State-of-the-art YOLO object detection models are employed to accurately identify and localize chart instances within pitch decks.
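As an illustration of how detections from this phase might be handed to the Extraction Phase, the sketch below filters chart bounding boxes by confidence and pads them into crop rectangles. It is not the project's implementation: the `Detection` class, `select_chart_crops`, the 0.5 threshold, and the 8-pixel padding are all hypothetical, and the detector itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected chart: pixel box corners plus the detector's confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def select_chart_crops(detections, page_width, page_height,
                       min_confidence=0.5, padding=8):
    """Return padded (left, top, right, bottom) crop boxes for confident detections.

    The threshold and padding are illustrative defaults, not tuned values.
    """
    crops = []
    for det in detections:
        if det.confidence < min_confidence:
            continue  # drop low-confidence boxes
        # Pad the box, then clamp it to the page so the crop stays in bounds.
        left = max(0, int(det.x1) - padding)
        top = max(0, int(det.y1) - padding)
        right = min(page_width, int(det.x2) + padding)
        bottom = min(page_height, int(det.y2) + padding)
        if right > left and bottom > top:
            crops.append((left, top, right, bottom))
    return crops


# Toy detections on a 640x480 slide image (made-up values).
example = [
    Detection(10, 20, 110, 220, 0.9),
    Detection(0, 0, 50, 50, 0.2),
    Detection(590, 390, 640, 480, 0.8),
]
print(select_chart_crops(example, 640, 480))
```

Padding the box slightly is one way to avoid clipping axis labels or legends that sit just outside the detected region.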
In the Extraction Phase, the Set-of-Marks prompting strategy is adapted for grounded zero-shot understanding of charts using large multimodal models. Relevant insights are extracted and stored in a vector database for efficient retrieval.
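Set-of-Marks prompting overlays numbered marks on image regions so that the model can refer to each region by its mark. The sketch below shows only the bookkeeping side of that idea: numbering regions in reading order and building a prompt that asks the model to cite marks. The function names and prompt wording are assumptions for illustration, and drawing the marks onto the image is omitted.

```python
def assign_marks(boxes):
    """Number (left, top, right, bottom) boxes in reading order.

    Boxes are sorted top to bottom, then left to right, and numbered from 1.
    """
    ordered = sorted(boxes, key=lambda box: (box[1], box[0]))
    return {mark: box for mark, box in enumerate(ordered, start=1)}


def build_marked_prompt(marks, question):
    """Build a text prompt that lists each mark and asks for mark citations."""
    legend = "\n".join(f"[{mark}] region at {box}" for mark, box in marks.items())
    return (
        "The image has numbered marks drawn on its regions.\n"
        f"{legend}\n"
        f"Question: {question}\n"
        "Cite the mark numbers that support each statement."
    )


# Toy regions of a chart (made-up coordinates).
regions = [(200, 10, 300, 100), (10, 10, 100, 100), (10, 150, 100, 250)]
marks = assign_marks(regions)
print(build_marked_prompt(marks, "What trend does the chart show?"))
```

Asking the model to cite mark numbers is what keeps the answer grounded: each claim can be traced back to a specific region of the chart.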
The Synthesis Phase develops a Retrieval-Augmented Generation pipeline tailored to generate comprehensive responses to frequently asked questions crucial for decision-making. This pipeline is integrated with a user-friendly web application.
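The retrieval step of a Retrieval-Augmented Generation pipeline can be sketched in a few lines: rank stored passages by cosine similarity to the query vector, then place the top matches in the prompt. This is a minimal in-memory sketch, not the project's pipeline; the in-memory list stands in for the vector database, and the vectors, passages, and function names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, store, top_k=2):
    """Return the texts of the top_k stored items most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]


def build_rag_prompt(question, passages):
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Toy store with 2-dimensional vectors (real embeddings are much larger).
store = [
    {"text": "Revenue grew year over year.", "vector": [1.0, 0.0]},
    {"text": "The team has twelve engineers.", "vector": [0.0, 1.0]},
    {"text": "Recurring revenue is rising.", "vector": [0.9, 0.1]},
]
passages = retrieve([1.0, 0.0], store, top_k=2)
print(build_rag_prompt("How is revenue trending?", passages))
```

In a real deployment the query vector would come from an embedding model and the ranking from the vector database; the prompt assembly step stays essentially the same.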
- Development of a three-phase approach: Detection, Extraction, and Synthesis
- Creation of the IIIT-OSV-Charts dataset for chart detection in venture capital documents
- Adaptation of the Set-of-Marks prompting strategy for chart understanding
- Development of a tailored Retrieval-Augmented Generation (RAG) pipeline
- Integration with a user-friendly web application for multi-turn conversations
The current implementation depends heavily on closed-source models from providers such as OpenAI and Anthropic, which poses reliability and scalability risks. Future work should reduce this dependence and explore alternative solutions.
gcloud auth login
gcloud config set project $GCP_PROJECT_ID
cd frontend
npm run dev
cd backend
python main.py
cd backend
source vector_db_start.sh
docker run -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
  qdrant/qdrant
cd backend
docker build -t fyp-backend .
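# Set host and port in the shell first so APP_HTTP_URL expands correctly;
# values passed with -e are not visible to the shell that builds this command.
APP_HTTP_HOST="127.0.0.1"
APP_HTTP_PORT="8000"
docker run -p 80:80 \
  -e OPENAI_API_TOKEN="" \
  -e APP_HTTP_HOST="${APP_HTTP_HOST}" \
  -e APP_HTTP_PORT="${APP_HTTP_PORT}" \
  -e APP_HTTP_URL="http://${APP_HTTP_HOST}:${APP_HTTP_PORT}" \
  -e PGURL="" \
  -e QDRANT_URL="" \
  -e QDRANT_API_KEY="" \
  fyp-backend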
docker tag fyp-backend gcr.io/$GCP_PROJECT_ID/fyp-backend:v1
docker push gcr.io/$GCP_PROJECT_ID/fyp-backend:v1
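The environment variables passed to the backend container could be read and validated at startup along the following lines. This is a sketch under stated assumptions, not a description of what `main.py` actually does: `load_settings`, the choice of which variables are required, and the fallback defaults are all hypothetical.

```python
import os

# Assumed to be mandatory for this sketch; the real backend may differ.
REQUIRED = ("OPENAI_API_TOKEN", "PGURL", "QDRANT_URL", "QDRANT_API_KEY")


def load_settings(env=None):
    """Read backend settings from a mapping (defaults to os.environ).

    Raises RuntimeError listing every required variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    host = env.get("APP_HTTP_HOST", "127.0.0.1")
    port = int(env.get("APP_HTTP_PORT", "8000"))
    settings = {name: env[name] for name in REQUIRED}
    settings["APP_HTTP_HOST"] = host
    settings["APP_HTTP_PORT"] = port
    # Derive the URL from host and port when it is not given explicitly.
    settings["APP_HTTP_URL"] = env.get("APP_HTTP_URL") or f"http://{host}:{port}"
    return settings


if __name__ == "__main__":
    try:
        print(load_settings()["APP_HTTP_URL"])
    except RuntimeError as error:
        print(error)
```

Failing fast with the full list of missing variables makes an incomplete `docker run` command easier to diagnose than an error raised later on the first API call.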