
[Feature Request] Add support for raising evaluation result errors locally #33

@m-roberts

Description


The LangSmith docs say:

The LangSmith pytest plugin lets Python developers define their datasets and evaluations as pytest test cases. Compared to the evaluate() evaluation flow, this is useful when:

Each example requires different evaluation logic
You want to assert binary expectations, and both track these assertions in LangSmith and raise assertion errors locally (e.g. in CI pipelines)
...

The part about raising assertion errors locally (e.g. in CI pipelines) is of particular interest, and is currently not supported by langsmith-evaluation-helper. At the moment, runs succeed even when evaluation results contain fatal errors, so the helper will not catch regressions if it is used as the primary interface for running CI evals against main branch changes.
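
For context, here is a minimal sketch of the flow the docs describe. It assumes the plugin's @pytest.mark.langsmith marker and the langsmith.testing logging helpers from the LangSmith SDK; generate_summary is a hypothetical function standing in for the code under test.

```python
# Sketch of the LangSmith pytest plugin flow described in the docs.
# Assumes the @pytest.mark.langsmith marker and `langsmith.testing` helpers;
# `generate_summary` is a hypothetical stand-in for application code.
import pytest
from langsmith import testing as t


def generate_summary(document: str) -> str:
    # Stand-in for the application code under test.
    return document.split(".")[0]


@pytest.mark.langsmith
def test_summary_mentions_product_name():
    document = "The Acme Rocket Skates ship with a 2-year warranty."
    t.log_inputs({"document": document})

    summary = generate_summary(document)
    t.log_outputs({"summary": summary})

    # The assertion is tracked in LangSmith *and* raises locally,
    # failing the pytest run, e.g. in a CI pipeline.
    assert "Acme" in summary
```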

It would be great if there were a nice, clean way to write a pytest test as a simple wrapper around an eval config, which could then surface any evaluation result failures or make specific assertions.

This would mean that pytest is used exclusively for assertions on the results, whilst the evaluations themselves remain easily managed from config files.
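
A purely hypothetical sketch of what such a wrapper could look like; run_evaluation_from_config, the config path, and the result shape are illustrative placeholders, not part of langsmith-evaluation-helper today:

```python
# Hypothetical sketch of the requested interface. `run_evaluation_from_config`
# and its result shape are illustrative placeholders, not existing helpers.
import pytest


def run_evaluation_from_config(config_path: str) -> list[dict]:
    """Placeholder for a helper that runs an eval config and returns
    per-example results, e.g. [{"example_id": ..., "score": ..., "error": ...}]."""
    raise NotImplementedError


@pytest.mark.parametrize("config_path", ["configs/example_eval.yml"])
def test_eval_config_passes(config_path: str):
    results = run_evaluation_from_config(config_path)

    # Fail the pytest run (e.g. in CI) if any evaluation result errored or
    # scored below a threshold, while eval management stays in config files.
    failures = [r for r in results if r.get("error") or (r.get("score") or 0) < 1.0]
    assert not failures, f"{len(failures)} evaluation result(s) failed: {failures}"
```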
