Google Drive folder: https://drive.google.com/drive/folders/1wgYeUY-HsDuWcqGq1hSNVRQ3gvQBMLZC?usp=sharing
Deployed Streamlit application: https://a1b1gaia.streamlit.app
CodeLab documentation: https://codelabs-preview.appspot.com/?file_id=1rkwEQLnM5Z62LN___AOHtexZJ9DzQ0kDlELlBA2dzlY#1
This project builds a validation tool for the GAIA dataset using a multi-page Streamlit interface. It allows users to randomly select and evaluate test cases, query different LLM models via the OpenAI API, and compare model responses with expected answers. Users can annotate incorrect responses and re-evaluate the model’s performance. A metrics dashboard displays the overall performance, and all interactions are tracked in a database for reporting and analytics.
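As a rough illustration of the comparison step, here is a minimal sketch; the helper names and the exact-match scoring rule are assumptions, and the application's actual evaluation logic may be more lenient.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())

def is_correct(llm_response: str, expected_answer: str) -> bool:
    """Mark a response correct only if it matches the expected answer after normalization."""
    return normalize(llm_response) == normalize(expected_answer)

print(is_correct("  Paris ", "paris"))  # True
```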
- Diagrams illustrating system architecture
- Fully documented CodeLab
- 5-minute video submission
- Link to a working Streamlit application
- GitHub repository with project files and documentation
The architecture consists of the following components:
Data Sources:
- Hugging Face: Provides the GAIA dataset for validation and testing.
- GCP (Google Cloud Platform): Handles storage via GCS (Google Cloud Storage) buckets and stores metadata in a PostgreSQL database.
Infrastructure:
- Terraform: Automates the setup of GCP resources such as databases and buckets.
Streamlit Application:
- Multi-page UI with the following pages:
  - Landing Page: Documentation and introduction to the application.
  - LLM Prompting Page: Allows users to select a random task, submit a query to OpenAI models, and interact with responses.
  - Metrics Dashboard: Provides visualizations and reports on model performance.
LLM Interaction:
- OpenAI API: Queries are sent to OpenAI models, and the responses are stored for comparison and further processing.
Database:
- PostgreSQL: Stores task metadata, LLM responses, annotations, and interaction logs for analysis.
Storage:
- GCS (Google Cloud Storage): Manages file uploads, including task-related files fetched during LLM processing (see the sketch below).
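To illustrate the file-fetching step above, here is a minimal sketch using the google-cloud-storage client; the bucket and object names are placeholders, not the project's actual values.

```python
from google.cloud import storage

def download_task_file(bucket_name: str, blob_name: str, local_path: str) -> str:
    """Download a task-related file from a GCS bucket to a local path."""
    client = storage.Client()  # authenticates via GOOGLE_APPLICATION_CREDENTIALS
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.download_to_filename(local_path)
    return local_path

# Hypothetical usage:
# download_task_file("gaia-task-files", "validation/example.xlsx", "/tmp/example.xlsx")
```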
The project uses the following technologies:
- Streamlit: For the multi-page UI and interaction.
- Hugging Face: Source for GAIA dataset.
- OpenAI API: Used for querying different language models.
- Google Cloud Platform (GCS, PostgreSQL): Storage and database.
- Terraform: Infrastructure management.
- Python: For backend logic and integration with various APIs.
- GAIA Dataset: Available via Hugging Face, the GAIA dataset contains structured metadata and test cases and forms the core of the benchmarking process (a loading example follows this list).
- Files and task-related data are stored in Google Cloud Storage (GCS) buckets.
- PostgreSQL on GCP is used for storing metadata, LLM responses, and user annotations.
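For reference, the GAIA metadata can be loaded with the Hugging Face datasets library; the repository id, config name, and split below are assumptions and may differ from what this project actually loads (the dataset is gated, so authenticate with Hugging Face first).

```python
from datasets import load_dataset

# Assumed repository id, config, and split; run `huggingface-cli login` first,
# since the GAIA dataset is gated on Hugging Face. Depending on your datasets
# version, trust_remote_code=True may also be required.
gaia = load_dataset("gaia-benchmark/GAIA", "2023_all", split="validation")

print(gaia.column_names)  # inspect the metadata fields
print(gaia[0])            # look at a single task record
```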
The application provides the following pages:
- Landing Page:
  - Basic documentation and overview of the application.
- LLM Prompting Page:
  - Pick a random task from the GAIA dataset.
  - Select from different LLM models (defaults to a single OpenAI model).
  - Fetch associated files from GCP and provide context for the query.
  - Submit the question along with the context to OpenAI and view the LLM response (see the sketch after this list).
  - Annotate responses and re-submit the task for evaluation.
  - Save responses to the database, either as-is or after annotation.
- Metrics Dashboard:
  - Visualize performance metrics, including model accuracy, response quality, and interaction history.
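The sketch below shows one way the prompting step referenced above could call the OpenAI API; the model name, prompt layout, and function name are illustrative rather than the application's actual implementation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_model(question: str, context: str, model: str = "gpt-4o-mini") -> str:
    """Send a GAIA question plus any file-derived context to an OpenAI chat model."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer the question using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```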
Ensure that you have the following available on your system:
- Python
- Terraform
- Git
- A GCP account
Now, clone the repository to your local machine:
git clone https://github.com/DAMG7245-Big-Data-Sys-SEC-02-Fall24/Assignment-1-GAIA.git
cd Assignment-1-GAIA
To configure sensitive information like GCP credentials, OpenAI API key, and database settings:
Replace the placeholders (xxxxxxx) with your actual credentials in secrets.toml.
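Inside the app, these values can then be read through Streamlit's secrets API; the section and key names below are hypothetical and should mirror whatever your secrets.toml defines.

```python
import streamlit as st

# Hypothetical keys -- align them with the entries in your secrets.toml.
openai_api_key = st.secrets["OPENAI_API_KEY"]
db_host = st.secrets["postgres"]["host"]
gcs_bucket = st.secrets["gcp"]["bucket_name"]
```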
To provision the required GCP infrastructure with Terraform:
- Initialize Terraform:
terraform init
- Plan the infrastructure changes:
terraform plan
- Apply the infrastructure changes:
terraform apply
Confirm the changes by typing yes when prompted.
Poetry is used for dependency management. You can install it by running the following command:
curl -sSL https://install.python-poetry.org | python3 -
After installation, ensure that Poetry is added to your system's PATH (the installer prints the required instructions). To verify that Poetry is installed correctly, run:
poetry --version
With Poetry installed, you can now install all project dependencies:
poetry install
This command will set up a virtual environment and install all required packages as specified in pyproject.toml.
Activate the Poetry-managed virtual environment:
poetry shell
Change directory to the folder containing the application entrypoint:
cd src
To launch the application, execute the following command:
streamlit run GAIA.py
This will start the Streamlit app and open it in your browser.
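For orientation, a stripped-down entrypoint might look like the sketch below; the real GAIA.py in this repository is more involved, and the titles here are placeholders.

```python
# Simplified sketch of a Streamlit entrypoint (not the repository's actual GAIA.py)
import streamlit as st

st.set_page_config(page_title="GAIA Validation Tool", layout="wide")
st.title("GAIA Benchmark Validation Tool")
st.markdown(
    "Use the sidebar to open the LLM Prompting page or the Metrics Dashboard. "
    "Scripts placed in the pages/ directory show up in the sidebar automatically."
)
```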
Your application should now be running, and the required infrastructure should be deployed. For additional configuration or updates, refer to the project-specific documentation.
WE ATTEST THAT WE HAVEN'T USED ANY OTHER STUDENTS' WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK.
Contribution:
a. Sai Surya Madhav Rebbapragada: 35% - Worked on the Streamlit UI, OpenAI prototyping, and Terraform for infrastructure setup - 35 hrs
b. Uday Kiran Dasari: 30% - Worked on the Metrics UI, database setup and data population, and OpenAI & Streamlit prototyping - 30 hrs
c. Akash Varun: 35% - Worked on OpenAI prototyping and the required tooling, plus object store setup and data population - 35 hrs