chore(llmobs): implement non skeleton code for ragas faithfulness #10795

Merged
merged 92 commits into from
Oct 30, 2024

Conversation


@lievan lievan commented Sep 24, 2024

This PR adds the non-boilerplate code for the ragas faithfulness evaluator.

The majority of LOC changes are from cassettes/requirements. The main logic is in ddtrace/llmobs/_evaluators/ragas/faithfulness.py.

There are four important features of this PR:

1. Extracting the inputs to a ragas faithfulness eval from a span

A span event must contain the data necessary for ragas evaluations: question, context, and answer.

The evaluator tries to extract this data by looking at the span event using the following logic:

question = input.prompt.variables.question OR input.messages[-1].content
context = input.prompt.variables.context
answer = output.messages[-1].content

Relevant tests...
test_ragas_faithfulness_submits_evaluation...
test_ragas_faithfulness_returns_none_if_inputs_extraction_fails
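The extraction rules above can be sketched as a small helper. The field names and fallback order follow the description, but the helper itself is a hypothetical illustration, not the actual ddtrace code:

```python
def extract_faithfulness_inputs(span_event: dict):
    """Return (question, context, answer) or None if any piece is missing."""
    meta = span_event.get("meta", {})
    span_input = meta.get("input", {})
    span_output = meta.get("output", {})
    prompt_vars = span_input.get("prompt", {}).get("variables", {})

    # question = input.prompt.variables.question OR input.messages[-1].content
    question = prompt_vars.get("question")
    if question is None and span_input.get("messages"):
        question = span_input["messages"][-1].get("content")

    # context = input.prompt.variables.context
    context = prompt_vars.get("context")

    # answer = output.messages[-1].content
    answer = None
    if span_output.get("messages"):
        answer = span_output["messages"][-1].get("content")

    if not all((question, context, answer)):
        return None  # mirrors "returns none if inputs extraction fails"
    return question, context, answer
```

A span with the question only in `input.messages` still extracts successfully, but one missing the prompt-variable context returns `None` and is skipped.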

2. Ragas faithfulness implementation

See the evaluate function for the underlying Ragas faithfulness implementation.

It roughly follows the original source implementation in the ragas framework.

Relevant tests...
test_ragas_faithfulness_submits_evaluation...
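As a rough illustration of how a faithfulness score is typically computed in this style of evaluation — the fraction of answer statements that an LLM judge marks as supported by the retrieved context — assuming the per-statement verdicts have already been produced (the LLM judging step itself is elided):

```python
def faithfulness_score(verdicts):
    """Faithfulness = fraction of answer statements judged supported by
    the retrieved context (1.0 = fully faithful, 0.0 = none supported)."""
    if not verdicts:
        return float("nan")  # no statements extracted: score is undefined
    return sum(1 for supported in verdicts if supported) / len(verdicts)
```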

3. Tracing RAGAS

Tracing RAGAS is a requirement for faithfulness; otherwise a user's ML app will be polluted by a bunch of auto-instrumented langchain spans.

  • The `ml_app` of ragas traces should be `dd-ragas-{original ml app name}`.
  • All ragas traces are marked with a `runner.integration:ragas` tag. This tells us that these traces are evaluation traces from the ragas integration. We can tell a span is a ragas span by looking at its `ml_app` at trace processing time. We also use this to safeguard against infinite eval loops (enqueuing an LLM span generated from an evaluation back into the evaluator runner).

Relevant tests...
test_ragas_faithfulness_emits_traces
test_llmobs_with_evaluator_runner_does_not_enqueue_evaluation_spans
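The naming and loop-guard scheme can be sketched as follows; the prefix constant and helper names are illustrative assumptions, not the actual ddtrace internals:

```python
DD_RAGAS_PREFIX = "dd-ragas-"

def ragas_ml_app(original_ml_app: str) -> str:
    """Derive the ml_app name under which ragas evaluation traces are emitted."""
    return DD_RAGAS_PREFIX + original_ml_app

def is_ragas_span(ml_app: str) -> bool:
    """Used at trace-processing time, and to avoid re-enqueueing evaluation
    spans back into the evaluator runner (infinite eval loop guard)."""
    return ml_app.startswith(DD_RAGAS_PREFIX)
```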

4. RAGAS Evaluator Setup

  • Ragas dependencies (ragas, langchain) are only required if the ragas faithfulness evaluator is configured.
  • The ragas evaluator should always use the most up-to-date faithfulness instance from the ragas library itself, so a user can customize the LLMs and prompts used for faithfulness.
  • If an LLM is not set by the user, we use the default LLM provided by ragas's `llm_factory` method.

Relevant tests...
test_ragas_faithfulness_disabled_if_dependencies_not_present
test_ragas_evaluator_init
test_ragas_faithfulness_has_modified_faithfulness_instance
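The optional-dependency pattern in the first bullet — only require ragas and langchain when the evaluator is actually configured, and disable it otherwise — can be sketched generically. `evaluator_enabled` is a hypothetical helper, not the actual ddtrace API:

```python
import importlib

def evaluator_enabled(required=("ragas", "langchain")):
    """Return True only if every optional evaluator dependency imports.

    A missing dependency disables the evaluator instead of raising,
    so users who never configure ragas pay no import cost.
    """
    for mod in required:
        try:
            importlib.import_module(mod)
        except ImportError:
            return False
    return True
```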

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy


github-actions bot commented Sep 24, 2024

CODEOWNERS have been resolved as:

.riot/requirements/12c5529.txt                                          @DataDog/apm-python
.riot/requirements/146f2d8.txt                                          @DataDog/apm-python
.riot/requirements/1687eab.txt                                          @DataDog/apm-python
.riot/requirements/4102ef5.txt                                          @DataDog/apm-python
.riot/requirements/771848b.txt                                          @DataDog/apm-python
ddtrace/llmobs/_evaluators/ragas/models.py                              @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_evaluator_runner.test_evaluator_runner_periodic_enqueues_eval_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_evaluator_runner.test_evaluator_runner_timed_enqueues_eval_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_faithfulness_evaluator.emits_traces_and_evaluations_on_exit.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_faithfulness_evaluator.test_ragas_faithfulness_emits_traces.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_faithfulness_evaluator.test_ragas_faithfulness_submits_evaluation.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_faithfulness_evaluator.test_ragas_faithfulness_submits_evaluation_on_span_with_question_in_messages.yaml  @DataDog/ml-observability
tests/llmobs/test_llmobs_ragas_faithfulness_evaluator.py                @DataDog/ml-observability
ddtrace/llmobs/_constants.py                                            @DataDog/ml-observability
ddtrace/llmobs/_evaluators/ragas/faithfulness.py                        @DataDog/ml-observability
ddtrace/llmobs/_evaluators/runner.py                                    @DataDog/ml-observability
ddtrace/llmobs/_evaluators/sampler.py                                   @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_trace_processor.py                                      @DataDog/ml-observability
riotfile.py                                                             @DataDog/apm-python
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/conftest.py                                                @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_evaluator_runner.send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/test_llmobs_evaluator_runner.py                            @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_llmobs_trace_processor.py                             @DataDog/ml-observability

@lievan lievan added the changelog/no-changelog A changelog entry is not required for this PR. label Sep 24, 2024

pr-commenter bot commented Sep 24, 2024

Benchmarks

Benchmark execution time: 2024-10-30 15:46:17

Comparing candidate commit 23753ec in PR branch evan.li/ragas-faithfulness with baseline commit f3b5275 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 328 metrics, 2 unstable metrics.


@Yun-Kim Yun-Kim left a comment

I wasn't able to fully review the entire PR but some initial thoughts:

  • Since we're actively making calls to OpenAI and using third-party libraries on behalf of customers here, we should make sure we have really great documentation (in the form of both docstrings and public corp docs).
  • Do we require OpenAI API keys?
  • Not sure if it's because I'm running on fumes, but the evaluation code is surprisingly difficult to follow 😢 Would appreciate either more context about RAGAS evals (in the PR description), more readability, or more docstrings. Thanks!

…11121)

This PR fixes behavior where the last span generated right before
process exit was not being evaluated by the ragas faithfulness
evaluator.

Previous behavior:
1. On process exit, `LLMObs.disable()` is called
2. `LLMObs.disabled` is set to `True`
3. Eval metric writers and span writers are stopped
4. Evaluator runner is stopped

There is a problem here: the span and eval metric writers are stopped and
`LLMObs.disabled` is set to `True` while the eval runner has not yet finished
tracing and evaluating the span events in its buffer.

### Fix attempt 1
1. Make sure the eval runner is stopped **_BEFORE_** the eval metric writer and
span writers are stopped
2. Make sure `LLMObs.disabled` is set to `True` only after the eval
runner is stopped
3. Within `_stop_service`, call `self.periodic` and
`self.executor.shutdown(wait=True)`

The assumption here is that periodic will schedule all the evaluation
job threads, and calling shutdown with `wait=True` will block until the
threads are finished

This didn’t work though; after `self.executor.map` was called to create
the job threads, the app exited without the `run_and_submit_evaluation`
faithfulness function even being called. Could there be some issue with
scheduling these threads while the process is exiting?

### Fix attempt 2 (this PR)
Same steps as 1) and 2)

For step 3, we add a `_wait_synchronously` argument to `periodic`.
Setting `_wait_synchronously=True` turns `periodic` into a blocking
function that synchronously runs each evaluation metric over every span.
This ensures that the last span events in the evaluator runner's buffer are
evaluated, that metrics are enqueued to the metric writer, and that the eval
metric writer and span writers are flushed before we disable LLM Obs.
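A minimal sketch of this shutdown path, assuming a simplified runner with a queue buffer (the class and argument names mirror the description but are illustrative, and the normal threaded path is elided):

```python
from queue import Empty, Queue

class EvaluatorRunner:
    """Simplified stand-in for the evaluator runner described above."""

    def __init__(self, evaluators):
        self.evaluators = evaluators
        self.buffer = Queue()
        self.results = []

    def enqueue(self, span_event):
        self.buffer.put(span_event)

    def periodic(self, _wait_synchronously=False):
        # Drain the buffer of pending span events.
        events = []
        while True:
            try:
                events.append(self.buffer.get_nowait())
            except Empty:
                break
        if _wait_synchronously:
            # Shutdown path: run every evaluation inline so the process
            # cannot exit before the last buffered spans are evaluated.
            for event in events:
                for evaluator in self.evaluators:
                    self.results.append(evaluator(event))
        else:
            # Normal path: hand events off to executor worker threads (elided).
            ...
```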


---------

Co-authored-by: lievan <[email protected]>

datadog-dd-trace-py-rkomorn bot commented Oct 22, 2024

Datadog Report

Branch report: evan.li/ragas-faithfulness
Commit report: 23753ec
Test service: dd-trace-py

✅ 0 Failed, 1286 Passed, 0 Skipped, 33m 34.08s Total duration (6m 45.38s time saved)

lievan and others added 2 commits October 23, 2024 13:25
Add the following telemetry for ragas faithfulness.
I leaned toward using simple count metrics since I'm not sure whether
`ragas_faithfulness` fits under the concept of a tracing integration.

## Metrics
- Number of init attempts:
`dd.instrumentation_telemetry_data.llmobs.evaluators.init`, tagged with
`state=success/failure` and `evaluator_label`
- Number of run attempts:
`dd.instrumentation_telemetry_data.llmobs.evaluators.run`, tagged with
`state=success/failure`, `reason`, and `evaluator_label`
- Number of sampling rule parsing failures:
`dd.instrumentation_telemetry_data.llmobs.evaluators.errors`, tagged
with `reason`
## Configurations
- Configuration telemetry for `_DD_LLMOBS_EVALUATOR_SAMPLING_RULES`

## Logging
- Logging ragas dependencies failures
- Logging sampling rule parsing failures
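A toy sketch of what these count metrics could look like, assuming a simple in-memory counter in place of the real instrumentation telemetry writer (the class and method names mirror the metrics above but are illustrative):

```python
from collections import Counter

class EvaluatorTelemetry:
    """Toy stand-in for the evaluator count metrics listed above."""

    def __init__(self):
        self.counts = Counter()

    def record_init(self, evaluator_label, success):
        # llmobs.evaluators.init, tagged with state and evaluator_label
        state = "success" if success else "failure"
        self.counts[("evaluators.init", evaluator_label, state)] += 1

    def record_run(self, evaluator_label, success, reason=None):
        # llmobs.evaluators.run, tagged with state, reason, evaluator_label
        state = "success" if success else "failure"
        self.counts[("evaluators.run", evaluator_label, state, reason)] += 1
```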

Testing...


---------

Co-authored-by: lievan <[email protected]>
@lievan lievan requested a review from a team as a code owner October 25, 2024 15:16

@Kyle-Verhoog Kyle-Verhoog left a comment


🚢 let's get it in some customer hands!

@lievan lievan enabled auto-merge (squash) October 28, 2024 03:49
@lievan lievan merged commit 2378740 into main Oct 30, 2024
534 checks passed
@lievan lievan deleted the evan.li/ragas-faithfulness branch October 30, 2024 16:21
github-actions bot pushed a commit that referenced this pull request Nov 5, 2024
…0795)

---------

Co-authored-by: lievan <[email protected]>
Co-authored-by: Sam Brenner <[email protected]>
Co-authored-by: kyle <[email protected]>
(cherry picked from commit 2378740)
quinna-h pushed a commit that referenced this pull request Nov 13, 2024
…0795)
