DanielPuentee/Automatic-RAG-Dataset-Creation-And-Evaluation

Automatic RAG Dataset Creation and Evaluation with Giskard & RAGAS

A lightweight RAG pipeline that automatically creates an evaluation dataset and evaluates the system against it, using LangChain, RAGAS, Giskard, Gemini, and LangSmith.

Author: Daniel Puente Viejo

🎯 Objective

This repository demonstrates how to quickly evaluate RAG systems without the need to manually create a large dataset.
We use a sample use case: answering questions about popular TV series like Breaking Bad and La Casa de Papel.
The pipeline is built using:

  • LangChain – a framework for building language model applications
  • Gemini – Google's family of large language models
  • RAGAS – for evaluating RAG responses
  • Giskard – to detect hallucinations, bias, and robustness issues
  • LangSmith – to monitor, debug, and evaluate LLM usage at scale

🧠 Use Case

We simulate a real-world scenario:

A user asks detailed questions about a TV show, such as character arcs, plot developments, or ethical decisions.
The system retrieves summaries of episodes and returns a relevant, accurate response.
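
To make the retrieve-then-answer flow concrete, here is a minimal sketch using only the Python standard library. It is not the repository's implementation: the episode summaries, the bag-of-words similarity, and the names `retrieve` and `build_prompt` are all hypothetical stand-ins for the real embedding-based retriever and LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical episode summaries; the repository's real data is not shown here.
SUMMARIES = {
    "Breaking Bad S01E01": "Walter White, a chemistry teacher, learns he has cancer "
                           "and starts cooking meth with Jesse.",
    "La Casa de Papel S01E01": "The Professor recruits a crew to rob the Royal Mint of Spain.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, summaries, k=1):
    """Return the k (title, summary) pairs most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        summaries.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble the prompt that would be sent to the LLM (the LLM call is omitted)."""
    context = "\n".join(f"[{title}] {text}" for title, text in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "Who recruits the crew to rob the Royal Mint?"
print(build_prompt(question, retrieve(question, SUMMARIES)))
```

A real pipeline would replace the token-count similarity with vector embeddings and pass the prompt to the language model. The shape of the flow stays the same.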


🛠️ Tools Used

| Tool | Role |
|------|------|
| LangChain | Build the RAG pipeline (retriever + LLM) |
| Gemini | Large language model (Google) |
| RAGAS | Automatically evaluate generated answers |
| Giskard | Test model outputs for hallucinations, bias, robustness |
| LangSmith | Monitor and log RAG chains and metrics at runtime |

🧪 Evaluation Strategy

We eliminate the need to create a labeled dataset from scratch by:

  1. Generating realistic questions and answers using Giskard
  2. Using RAGAS to compute evaluation metrics:
    • Context Precision
    • Context Recall
    • Faithfulness
    • Answer Similarity
    • Answer Relevancy
  3. Tracking all generations and context chunks using LangSmith
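
As a rough intuition for the two context metrics, the sketch below scores one evaluation record with simple token overlap. This is a toy approximation and an assumption on our part: RAGAS itself relies on LLM judgments, and its context precision is rank-aware, which this version is not. The record's field names (`question`, `answer`, `contexts`, `ground_truth`) follow a common convention for such datasets but are also assumed here, as are the function names and the `min_overlap` threshold.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_context_recall(ground_truth, contexts):
    """Fraction of ground-truth tokens found in at least one retrieved context."""
    truth = tokens(ground_truth)
    if not truth:
        return 0.0
    retrieved = set().union(*(tokens(c) for c in contexts))
    return len(truth & retrieved) / len(truth)


def toy_context_precision(ground_truth, contexts, min_overlap=0.5):
    """Fraction of retrieved contexts that cover at least `min_overlap` of the
    ground-truth tokens (a plain, rank-unaware precision)."""
    truth = tokens(ground_truth)
    if not truth or not contexts:
        return 0.0
    relevant = [c for c in contexts if len(truth & tokens(c)) / len(truth) >= min_overlap]
    return len(relevant) / len(contexts)


# Hypothetical evaluation record, for illustration only.
sample = {
    "question": "What does the Professor plan to rob?",
    "answer": "He plans to rob the Royal Mint of Spain.",
    "contexts": [
        "The Professor recruits a crew to rob the Royal Mint of Spain.",
        "Walter White is a chemistry teacher in Albuquerque.",
    ],
    "ground_truth": "The Royal Mint of Spain.",
}

recall = toy_context_recall(sample["ground_truth"], sample["contexts"])
precision = toy_context_precision(sample["ground_truth"], sample["contexts"])
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Here one of the two retrieved chunks is useful and it covers the whole reference answer, so recall is high while precision is pulled down by the irrelevant chunk. The real metrics are designed to surface the same kind of trade-off.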

📂 Notebook Structure

1. 🔧 Setup
   - Install and import dependencies
   - Set environment variables
   - Load clients

2. 📦 Chunking & vector database creation
   - Split the documents into chunks and build the vector database

3. ⚙️ Create dataset
   - Use Giskard to create the dataset

4. 🔄 Retrieve examples & Evaluate
   - Use RAGAS to compute metrics

5. 🎯 Answer questions & Evaluate
   - Use RAGAS to compute metrics
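
The chunking step (section 2 of the notebook) can be illustrated with a minimal fixed-size splitter with overlap. This is a sketch under assumptions, not the notebook's code: the function name `chunk_text` and the default sizes are hypothetical, and the notebook may instead use one of LangChain's text splitters such as `RecursiveCharacterTextSplitter`.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of `chunk_size`, where each window
    starts `chunk_size - overlap` characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical episode summary used only to exercise the function.
summary = "Walter White, a chemistry teacher, learns he has cancer. " * 5
for i, chunk in enumerate(chunk_text(summary, chunk_size=80, overlap=20)):
    print(i, repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the vector database.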
