
[Feature Request] Add support for raising evaluation result errors locally #33

@m-roberts

Description


The LangSmith docs say:

The LangSmith pytest plugin lets Python developers define their datasets and evaluations as pytest test cases. Compared to the evaluate() evaluation flow, this is useful when:

Each example requires different evaluation logic
You want to assert binary expectations, and both track these assertions in LangSmith and raise assertion errors locally (e.g. in CI pipelines)
...

The part about raising assertion errors locally (e.g. in CI pipelines) is of particular interest, and is currently not supported by langsmith-evaluation-helper. At the moment, runs succeed even when evaluation results contain fatal errors, so the helper will not catch regressions if it is used as the primary interface for running CI evals against main branch changes.
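
For context, here is a minimal sketch of the flow the docs describe. It assumes the plugin's @pytest.mark.langsmith marker and the langsmith.testing logging helpers from the LangSmith SDK; generate_summary is a hypothetical function standing in for the code under test.

```python
# Sketch of the LangSmith pytest plugin flow described in the docs.
# Assumes the @pytest.mark.langsmith marker and `langsmith.testing` helpers;
# `generate_summary` is a hypothetical stand-in for application code.
import pytest
from langsmith import testing as t


def generate_summary(document: str) -> str:
    # Stand-in for the application code under test.
    return document.split(".")[0]


@pytest.mark.langsmith
def test_summary_mentions_product_name():
    document = "The Acme Rocket Skates ship with a 2-year warranty."
    t.log_inputs({"document": document})

    summary = generate_summary(document)
    t.log_outputs({"summary": summary})

    # The assertion is tracked in LangSmith *and* raises locally,
    # failing the pytest run, e.g. in a CI pipeline.
    assert "Acme" in summary
```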

It would be great if there were a nice, clean way to write a pytest test as a simple wrapper around an eval config, which could then surface any evaluation result failures or make specific assertions.

This would mean that pytest is used exclusively for assertions on the results, whilst the evaluations themselves remain easily managed from config files.
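
A purely hypothetical sketch of what such a wrapper could look like; run_evaluation_from_config, the config path, and the result shape are illustrative placeholders, not part of langsmith-evaluation-helper today:

```python
# Hypothetical sketch of the requested interface. `run_evaluation_from_config`
# and its result shape are illustrative placeholders, not existing helpers.
import pytest


def run_evaluation_from_config(config_path: str) -> list[dict]:
    """Placeholder for a helper that runs an eval config and returns
    per-example results, e.g. [{"example_id": ..., "score": ..., "error": ...}]."""
    raise NotImplementedError


@pytest.mark.parametrize("config_path", ["configs/example_eval.yml"])
def test_eval_config_passes(config_path: str):
    results = run_evaluation_from_config(config_path)

    # Fail the pytest run (e.g. in CI) if any evaluation result errored or
    # scored below a threshold, while eval management stays in config files.
    failures = [r for r in results if r.get("error") or (r.get("score") or 0) < 1.0]
    assert not failures, f"{len(failures)} evaluation result(s) failed: {failures}"
```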
