models qna rag metrics eval

qna-rag-metrics-eval

Overview

The Q&A RAG (Retrieval Augmented Generation) evaluation flow evaluates Q&A RAG systems by using state-of-the-art Large Language Models (LLMs) to measure the quality and safety of your responses. Using a GPT model to assist with the measurements aims to achieve higher agreement with human evaluations than traditional mathematical measurements do.

Inference samples

Inference type	CLI	VS Code Extension
Real time	deploy-promptflow-model-cli-example	deploy-promptflow-model-vscode-extension-example
Batch	N/A	N/A

Sample inputs and outputs (for real-time inference)

Sample input

{
    "inputs": {
        "question": "What is the purpose of the LLM Grounding Score, and what does a higher score mean in this context?",
        "answer": "The LLM Grounding Score gauges an LLM's grasp of provided context in in-context learning. A higher score implies better understanding and more accurate responses.",
        "metrics": "gpt_groundedness,gpt_retrieval_score,gpt_relevance",
        "documents": "{'documents': [{'[doc1]': {'title': 'In-Context Learning with Large-Scale Pretrained Language Models',\r'content': 'In-Context Learning uses large pretrained models to acquire new skills. GPT-3 introduced this, achieving accuracy similar to fine-tuned models. Prompt order and similar training examples affect performance. Retrievers locate exemplary few-shot examples, with semantic similarity fine-tuning. Advanced retriever use includes code generation, but 'fantastic' examples assumption has task-specific limitations.'}}]}"
    }
}
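As a minimal sketch, a request body with the same shape as the sample input could be assembled in Python as shown here. The helper name `build_request` and the shortened placeholder values are hypothetical. The only things taken from the sample are the four input keys, the comma-separated `metrics` string, and the fact that `documents` is passed as a single string. How the body is sent depends on your deployment and is not shown.

```python
import json


def build_request(question, answer, documents, metrics):
    """Assemble a request body shaped like the sample input (hypothetical helper)."""
    return {
        "inputs": {
            "question": question,
            "answer": answer,
            # The sample passes metric names as one comma-separated string.
            "metrics": ",".join(metrics),
            # The sample passes the retrieved documents as a single string.
            "documents": documents,
        }
    }


# Placeholder values for illustration only.
body = build_request(
    question="What is the purpose of the LLM Grounding Score?",
    answer="It gauges an LLM's grasp of provided context.",
    documents="{'documents': [{'[doc1]': {'title': 'Example', 'content': 'Example content.'}}]}",
    metrics=["gpt_groundedness", "gpt_retrieval_score", "gpt_relevance"],
)

payload = json.dumps(body, indent=4)
print(payload)
```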

Sample output

{
    "outputs": {
        "gpt_groundedness": 5,
        "gpt_relevance": 5,
        "gpt_retrieval_score": 1
    }
}
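A response shaped like the sample output could be read back as in the sketch below. The helper name `read_scores` is hypothetical, and the code assumes only what the sample shows: a top-level `outputs` object keyed by metric name. It pairs each requested metric with its score and lists any requested metric that the response did not return.

```python
import json


def read_scores(response_text, requested_metrics):
    """Return (scores, missing) for a response shaped like the sample output.

    Hypothetical helper: assumes a top-level "outputs" object keyed by metric name.
    """
    outputs = json.loads(response_text)["outputs"]
    scores = {name: outputs[name] for name in requested_metrics if name in outputs}
    missing = [name for name in requested_metrics if name not in outputs]
    return scores, missing


response_text = (
    '{"outputs": {"gpt_groundedness": 5, "gpt_relevance": 5, "gpt_retrieval_score": 1}}'
)
# Same comma-separated metric string as in the sample input.
requested = "gpt_groundedness,gpt_retrieval_score,gpt_relevance".split(",")

scores, missing = read_scores(response_text, requested)
print(scores)
print(missing)
```

Checking for missing metrics separately keeps a partial response from being mistaken for a complete one.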

Version: 7

View in Studio: https://ml.azure.com/registries/azureml/models/qna-rag-metrics-eval/version/7

Properties

is-promptflow: True

azureml.promptflow.section: gallery

azureml.promptflow.type: evaluate

azureml.promptflow.name: QnA RAG Evaluation

azureml.promptflow.description: Compute the quality of the answer for the given question based on the retrieved documents

inference-min-sku-spec: 2|0|14|28

inference-recommended-sku: Standard_DS3_v2

