In-Situ Evaluator 🚀

License: MIT | Python 3.8+

Accelerating domain-specific LLM evaluation through strategic subsampling and real-time analysis


📖 Overview

Traditional LLM leaderboards often fail to predict performance in specialized domains, while conventional adaptation methods such as fine-tuning demand excessive computational resources. To address this, we present In-Situ Evaluator, a proof of concept for running real-time evaluations on your own dataset through a configurable interface for model, hyperparameter, and RAG-technique selection. Specifically, we employ:

  • Dataset subsampling for rapid domain-specific benchmarking (see the sketch after this list).
  • An API interface for choosing between LLM providers and models.
  • Custom RAG pipelines covering three popular RAG architectures.
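
As a rough illustration of the subsampling idea, here is a minimal sketch that draws a random subsample from a QA dataset in the repository's JSON format; the sample size, seed, and function name are arbitrary choices for illustration, not values taken from the codebase.

    import json
    import random

    def subsample_dataset(path, k=50, seed=42):
        """Load a QA dataset (a list of Question/Context/Response dicts)
        and return a random subsample of up to k items for quick benchmarking."""
        with open(path, "r", encoding="utf-8") as f:
            records = json.load(f)
        random.seed(seed)
        return random.sample(records, min(k, len(records)))

    # Example: benchmark on 50 random SQuAD items instead of the full set.
    subset = subsample_dataset("Backend/data/squad.json", k=50)
    print(f"Evaluating on {len(subset)} of the available records")

Evaluating on such a subset keeps each configuration run fast enough to iterate on models, hyperparameters, and RAG settings interactively.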

By following the examples in this repository, you can:

  • Load your custom dataset
  • Choose an LLM provider and model for evals (e.g. Groq (Llama-2, Mixtral), OpenAI (GPT-3.5, GPT-4))
  • Tune model hyperparameters to your liking (e.g. Temperature, Top P); a call sketch follows this list
  • Choose between RAG techniques (e.g. Vanilla RAG, Graph RAG, RAPTOR)
  • Configure RAG hyperparameters (e.g. Chunk Size, Chunk Overlap)
  • Run real-time evaluations of LLMs, hyperparameters, and RAG configurations
  • Compare results with metrics (BLEU, ROUGE, RoBERTa-NLI, etc.)
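
For a sense of how the model hyperparameters map onto an API call, here is a minimal sketch using the OpenAI Python SDK; the repository's own caller lives in Backend/RAG/model_config.py and may differ, so treat the client setup and values below as illustrative only.

    from openai import OpenAI  # Groq exposes a near-identical client via `from groq import Groq`

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # or a Groq-hosted model such as Mixtral
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": "Context: ...\n\nQuestion: ..."},
        ],
        temperature=0.2,   # Temperature
        top_p=0.9,         # Top P
        stop=["\n\n"],     # Stop sequence
        stream=False,      # Stream toggle
    )
    print(response.choices[0].message.content)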

Paper preprint coming soon. This repository contains the proof-of-concept implementation.


✨ Features

Component             | Supported Options
LLM Providers         | Groq, OpenAI
RAG Techniques        | Vanilla RAG, Graph RAG, RAPTOR
Model Hyperparameters | Temperature, Top_P, Stop Sequence, Stream
RAG Hyperparameters   | Chunk Size, Chunk Overlap, Top K
Proxy Datasets        | SQuAD (easy), TriviaQA (medium), WikiQA (hard)
Metrics               | BLEU, ROUGE-L, METEOR, RoBERTa-NLI, Cosine Similarity
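
As a rough sketch of how the reference-based metrics could be computed for a single prediction/ground-truth pair, the example below uses nltk for BLEU, rouge-score for ROUGE-L, and sentence-transformers for cosine similarity; these are common choices, not necessarily the exact libraries used in Evaluations/evaluations.py.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from rouge_score import rouge_scorer
    from sentence_transformers import SentenceTransformer, util

    reference = "Venus is the hottest planet in our solar system."
    prediction = "The hottest planet is Venus."

    # BLEU (smoothed, since short answers often lack higher-order n-gram overlap)
    bleu = sentence_bleu([reference.split()], prediction.split(),
                         smoothing_function=SmoothingFunction().method1)

    # ROUGE-L F1
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = scorer.score(reference, prediction)["rougeL"].fmeasure

    # Cosine similarity between sentence embeddings
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    cosine = util.cos_sim(embedder.encode(reference), embedder.encode(prediction)).item()

    print({"bleu": bleu, "rouge_l": rouge_l, "cosine": cosine})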


⚡ Running the Codebase

Prerequisites

  • Python 3.8+ (for backend)
  • ReactJS (for frontend)
  • Groq/OpenAI API keys (for LLM calls). You can obtain these from the respective providers' consoles.
  • (Optional) Before uploading a custom dataset, ensure it is a JSON file in the following format (a small loading/validation sketch follows this list) -
       [
          {
             "Question": "This is a sample",
             "Context": "This is the context related to the question.",
             "Response": "This is the ground truth answer"
          },
          {
             "Question": "What is the hottest planet in our solar system?",
             "Context": "The planets in our solar system vary in temperature due to their distance from the Sun, atmospheric composition, and other factors.",
             "Response": "Venus is the hottest planet in our solar system, with surface temperatures reaching up to 462°C (864°F), due to its thick atmosphere and runaway greenhouse effect."
          }
       ]
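
A minimal sketch for checking a custom dataset before uploading it, assuming only the three keys shown above; the function name and error handling are illustrative, not part of the repository.

    import json

    REQUIRED_KEYS = {"Question", "Context", "Response"}

    def validate_dataset(path):
        """Ensure the file is a JSON array of objects with the expected keys."""
        with open(path, "r", encoding="utf-8") as f:
            records = json.load(f)
        if not isinstance(records, list):
            raise ValueError("Dataset must be a JSON array of objects")
        for i, record in enumerate(records):
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(f"Record {i} is missing keys: {sorted(missing)}")
        return records

    records = validate_dataset("my_dataset.json")
    print(f"{len(records)} records look well-formed")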

Initial Installation

Clone the git repository -

git clone https://github.com/Ritvik-G/in-situ_eval.git
cd in-situ_eval

Frontend Setup

To set up the frontend, follow these steps:

  1. Navigate to the Frontend Directory
    First, change the directory to the frontend folder:
    cd frontend
  2. Install Dependencies
    Use npm to install all the required dependencies:
    npm install
  3. Start the Frontend
    Finally, start the frontend server:
    npm start
    

Backend Setup

The structure of the backend is as follows -

   Backend/
    ├── data/ # proxy datasets
    │   ├── squad.json
    │   ├── trivia_qa.json
    │   └── wiki_qa.json
    ├── RAG/
    │   ├── rag.py
    │   ├── raptor.py
    │   ├── graphrag.py
    │   └── model_config.py  # LLM caller function
    ├── Benchmarks/ # Benchmarker that calls data
    │   └── benchmarks.py
    ├── Evaluations/
    │   ├── evaluations.py
    │   └── consolidate_metrics.py
    ├── app.py
    └── requirements.txt
  1. Navigate to the Backend Directory
    First, change the directory to the backend folder:
    cd backend
  2. Install Dependencies
    Use pip to install all the required dependencies:
    pip install -r requirements.txt
  3. Run the Backend
    Run the backend server. By default, it runs on http://localhost:5000/:
    python app.py

Combined Setup

Once both the frontend and backend servers are running, you can access and interact with the application at the frontend URL http://localhost:3000/api.
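
If you prefer to script against the backend directly rather than using the UI, a request along the following lines should work once the backend server is up; the endpoint path and payload fields below are hypothetical placeholders, so check Backend/app.py for the actual routes and expected parameters.

    import requests

    # Hypothetical endpoint and payload; the real route names live in Backend/app.py.
    payload = {
        "provider": "openai",
        "model": "gpt-3.5-turbo",
        "rag_technique": "vanilla",
        "hyperparameters": {"temperature": 0.2, "top_p": 0.9},
    }
    response = requests.post("http://localhost:5000/api/evaluate", json=payload)
    print(response.status_code, response.json())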
