Commit e0b901f
Create README.md
1 parent 2788892 commit e0b901f

README.md
Lines changed: 90 additions & 0 deletions
# LlmAsJudgeEvals

This library provides a service for evaluating responses from Large Language Models (LLMs) using the LLM itself as a judge. It leverages Semantic Kernel to define and execute evaluation functions based on prompt templates.

**For a more precise evaluation score, the library utilizes `logprobs` and calculates a weighted total of probabilities for each evaluation criterion.**
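
As a rough illustration of the weighting idea (a sketch only, not the library's internal API), a probability-weighted score can be derived from the `logprobs` of the candidate score tokens:

```csharp
// Illustrative sketch only: hypothetical log probabilities for the score tokens "1" through "5".
var logProbs = new Dictionary<int, double>
{
    [1] = -4.2, [2] = -2.9, [3] = -1.6, [4] = -0.7, [5] = -1.3
};

// Convert to probabilities, normalize, and take the expectation over the candidate scores.
var probs = logProbs.ToDictionary(kv => kv.Key, kv => Math.Exp(kv.Value));
var total = probs.Values.Sum();
var weightedScore = probs.Sum(kv => kv.Key * kv.Value) / total;

Console.WriteLine($"Weighted score: {weightedScore:F2}"); // about 3.92 here, rather than a flat 4
```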
## Installation
Install the package via NuGet:

```
Install-Package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals
```
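
Or, using the .NET CLI with the same package ID:

```
dotnet add package HillPhelmuth.SemanticKernel.LlmAsJudgeEvals
```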
## Usage
### Built-in Evaluation Functions

The package includes a set of built-in evaluation functions, each focusing on a specific aspect of LLM output quality:

* **Coherence:** Evaluates the logical flow and consistency of the response.
* **Empathy:** Assesses the level of empathy and understanding conveyed in the response.
* **Fluency:** Measures the smoothness and naturalness of the language used.
* **GptGroundedness:** Determines how well the response is grounded in factual information.
* **GptGroundedness2:** An alternative approach to evaluating groundedness.
* **GptSimilarity:** Compares the response to a reference text or objectively correct answer for similarity.
* **Helpfulness:** Assesses the degree to which the response is helpful and informative.
* **PerceivedIntelligence:** Evaluates the perceived intelligence and knowledge reflected in the response.
* **PerceivedIntelligenceNonRag:** A variant of PerceivedIntelligence tailored for non-Retrieval Augmented Generation (RAG) models.
* **Relevance:** Measures the relevance of the response to the given prompt or question and a reference text for RAG.
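
For example, to run the built-in Coherence evaluation against an answer and the question or prompt that produced it: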
```csharp
// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder().AddOpenAIChatCompletion("openai-model-name", "openai-apiKey").Build();

// Create an instance of the EvalService
var evalService = new EvalService(kernel);

// Create an input model for the built-in evaluation function
var coherenceInput = InputModel.CoherenceModel("This is the answer to evaluate.", "This is the question or prompt that generated the answer");

// Execute the evaluation
var result = await evalService.ExecuteEval(coherenceInput);

Console.WriteLine($"Evaluation score: {result.Score}");
```
### Custom Evaluation Functions
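
You can also register your own evaluation function from a prompt template and invoke it by name: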
```csharp
// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder().AddOpenAIChatCompletion("openai-model-name", "openai-apiKey").Build();

// Create an instance of the EvalService
var evalService = new EvalService(kernel);

// Add a custom evaluation function (optional). The prompt string is treated as a prompt template,
// so it can reference input variables (for example {{$input}}, supplied via RequiredInputs below).
evalService.AddEvalFunction("MyEvalFunction", "This is the prompt for my evaluation function.", new PromptExecutionSettings());

// Create an input model for the evaluation function
var inputModel = new InputModel
{
    FunctionName = "MyEvalFunction", // Replace with the name of your evaluation function
    RequiredInputs = new Dictionary<string, string>
    {
        { "input", "This is the text to evaluate." }
    }
};

// Execute the evaluation
var result = await evalService.ExecuteEval(inputModel);

Console.WriteLine($"Evaluation score: {result.Score}");
```
## Features

* **Define evaluation functions using prompt templates:** Evaluation functions can be defined with prompt templates written in YAML.
* **Execute evaluations:** The `EvalService` provides methods for executing evaluations on input data.
* **Aggregate results:** The `EvalService` can aggregate evaluation scores across multiple inputs (see the sketch after this list).
* **Built-in evaluation functions:** The package includes a set of pre-defined evaluation functions based on common evaluation metrics.
* **Logprobs-based scoring:** Leverages `logprobs` for a more granular and precise evaluation score.
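
A minimal sketch of aggregating scores across inputs, continuing from the built-in example above and using only the calls shown in this README (the `EvalService` may expose its own aggregation helpers; check the API):

```csharp
// Evaluate several answers with the same built-in function and average the scores.
// Assumes result.Score is numeric, as implied by the logprobs-weighted scoring described above.
var inputs = new List<InputModel>
{
    InputModel.CoherenceModel("First answer to evaluate.", "The question that produced it"),
    InputModel.CoherenceModel("Second answer to evaluate.", "The question that produced it")
};

var scores = new List<double>();
foreach (var input in inputs)
{
    var evalResult = await evalService.ExecuteEval(input);
    scores.Add(evalResult.Score);
}

Console.WriteLine($"Average coherence score: {scores.Average():F2}");
```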
