GAIA Benchmark Validation Application

Abstract

This project builds a validation tool for the GAIA dataset using a multi-page Streamlit interface. It allows users to randomly select and evaluate test cases, query different LLM models via the OpenAI API, and compare model responses with expected answers. Users can annotate incorrect responses and re-evaluate the model’s performance. A metrics dashboard displays the overall performance, and all interactions are tracked in a database for reporting and analytics.

Checklist of Deliverables

Diagrams illustrating system architecture
Fully documented CodeLab
5-minute video submission
Link to a working Streamlit application
GitHub repository with project files and documentation

Architecture

The architecture consists of the following components:

Data Sources:
- Hugging Face: Provides the GAIA dataset for validation and testing.
- GCP (Google Cloud Platform): Handles storage via GCS (Google Cloud Storage) buckets and stores metadata in a PostgreSQL database.
Infrastructure:
- Terraform: Automates the setup of GCP resources such as databases and buckets.
Streamlit Application:
- Multi-page UI with the following pages:
  - Landing Page: Documentation and introduction to the application.
  - LLM Prompting Page: Allows users to select a random task, submit a query to OpenAI models, and interact with responses.
  - Metrics Dashboard: Provides visualizations and reports on model performance.
LLM Interaction:
- OpenAI API: Queries are sent to OpenAI models for LLM responses, which are stored for comparison and further processing.
Database:
- PostgreSQL: Stores task metadata, LLM responses, annotations, and interaction logs for analysis.
Storage:
- GCS (Google Cloud Storage): Manages file uploads, including task-related files fetched during LLM processing.

Tech Stack

Streamlit: For the multi-page UI and interaction.
Hugging Face: Source for GAIA dataset.
OpenAI API: Used for querying different language models.
Google Cloud Platform (GCS, PostgreSQL): Storage and database.
Terraform: Infrastructure management.
Python: For backend logic and integration with various APIs.

Dataset

GAIA Dataset: Available via Hugging Face, the GAIA dataset contains structured metadata and test cases. It forms the core of the benchmarking process.

Data Storage

Files and task-related data are stored in Google Cloud Storage (GCS) buckets.
PostgreSQL on GCP is used for storing metadata, LLM responses, and user annotations.

Features

Landing Page:

Basic documentation and overview of the application.

LLM Prompting Page:

Pick a random task from the GAIA dataset.
Select from different LLM models (default to one OpenAI model).
Fetch associated files from GCP and provide context for the query.
Submit the question along with the context to OpenAI and view the LLM response.
Annotate responses and re-submit the task for evaluation.
Save responses to the database, either as-is or after annotation.

Metrics Dashboard:

Visualize performance metrics, including model accuracy, response quality, and interaction history.

Here's an updated README section with instructions for installing Poetry before continuing with the rest of the setup steps:

Project Installation Guide

Prerequisites

Ensure that you have the following installed on your system:

Python
Terraform
GIT
GCP Account

1. Clone the Repository

Now, clone the repository to your local machine:

git clone https://github.com/DAMG7245-Big-Data-Sys-SEC-02-Fall24/Assignment-1-GAIA.git
cd Assignment-1-GAIA

2. Configure Secrets

To configure sensitive information like GCP credentials, OpenAI API key, and database settings:

Replace the placeholders (xxxxxxx) with your actual credentials in secrets.toml

3. Terraform Infrastructure Setup

If your project involves infrastructure management via Terraform, you can deploy the infrastructure as follows:

Initialize Terraform:
```
terraform init
```
Plan the infrastructure changes:
```
terraform plan
```
Apply the infrastructure changes:
```
terraform apply
```
Confirm the changes by typing yes when prompted.

4. Install Poetry

Poetry is used for dependency management. You can install it by running the following command:

curl -sSL https://install.python-poetry.org | python3 -

After installation, ensure that Poetry is added to your system's PATH by following the instructions provided after installation. To verify that Poetry is installed correctly, run:

poetry --version

5. Install Python Dependencies

With Poetry installed, you can now install all project dependencies:

poetry install

This command will set up a virtual environment and install all required packages as specified in pyproject.toml.

5. Activate the Poetry Shell

Activate the Poetry-managed virtual environment:

poetry shell

6. Activate the Poetry Shell

Activate the Poetry-managed virtual environment:

poetry shell

6. Go the the Application Directory

Change Directory to the folder containing Entrypoint

cd src

8. Run the Streamlit App

To launch the application, execute the following command:

streamlit run GAIA.py

This will start the Streamlit app and open it in your browser.

Conclusion

Your application should now be running, and the required infrastructure should be deployed. For additional configuration or updates, refer to the project-specific documentation.

ATTESTATION

WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK Contribution:

a. Sai Surya Madhav Rebbapragada: 35% - Worked on Streamlit UI, OpenAI prototying, Terraform for infra setup - 35hrs

b. Uday Kiran Dasari: 30% - Worked on Metrics UI, database setup and populating data, OpenAI & Streamlit prototying - 30hrs

c. Akash Varun: 35% - OpenAI prototying and all the required tools, object store setup and popuating with data - 35hrs

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
.devcontainer		.devcontainer
.idea		.idea
assets		assets
prototyping		prototyping
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
sample_secrets.toml		sample_secrets.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GAIA Benchmark Validation Application

Demo URL Link

Deployed Streamlit app link

Code Labs Link

Abstract

Checklist of Deliverables

Architecture

Tech Stack

Dataset

Data Storage

Features

Landing Page:

LLM Prompting Page:

Metrics Dashboard:

Project Installation Guide

Prerequisites

1. Clone the Repository

2. Configure Secrets

3. Terraform Infrastructure Setup

4. Install Poetry

5. Install Python Dependencies

5. Activate the Poetry Shell

6. Activate the Poetry Shell

6. Go the the Application Directory

8. Run the Streamlit App

Conclusion

ATTESTATION

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

DAMG7245-Big-Data-Sys-SEC-02-Fall24/Assignment-1-GAIA

Folders and files

Latest commit

History

Repository files navigation

GAIA Benchmark Validation Application

Demo URL Link

Deployed Streamlit app link

Code Labs Link

Abstract

Checklist of Deliverables

Architecture

Tech Stack

Dataset

Data Storage

Features

Landing Page:

LLM Prompting Page:

Metrics Dashboard:

Project Installation Guide

Prerequisites

1. Clone the Repository

2. Configure Secrets

3. Terraform Infrastructure Setup

4. Install Poetry

5. Install Python Dependencies

5. Activate the Poetry Shell

6. Activate the Poetry Shell

6. Go the the Application Directory

8. Run the Streamlit App

Conclusion

ATTESTATION

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages