Description
> The LangSmith pytest plugin lets Python developers define their datasets and evaluations as pytest test cases. Compared to the `evaluate()` evaluation flow, this is useful when:
>
> - Each example requires different evaluation logic
> - You want to assert binary expectations, and both track these assertions in LangSmith and raise assertion errors locally (e.g. in CI pipelines)
>
> ...
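For reference, the plugin's flow looks roughly like the sketch below (adapted from the shape shown in the LangSmith docs; `generate_sql` is a placeholder for the system under test, not a real function):

```python
import pytest
from langsmith import testing as t


def generate_sql(user_query: str) -> str:
    """Placeholder for the system under test."""
    return "SELECT * FROM customers;"


@pytest.mark.langsmith
def test_sql_generation_select_all():
    user_query = "Get all users from the customers table"
    t.log_inputs({"user_query": user_query})  # recorded on the LangSmith run
    sql = generate_sql(user_query)
    t.log_outputs({"sql": sql})               # recorded on the LangSmith run
    # The assertion is both tracked in LangSmith and raised locally,
    # so a failing example fails the CI job.
    assert sql == "SELECT * FROM customers;"
```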
The part about raising assertion errors locally (e.g. in CI pipelines) is of particular interest, and is currently not supported by langsmith-evaluation-helper. At the moment, runs succeed even when there are fatal errors in the evaluation results, so the helper will not catch regressions if it is used as the primary interface for running CI evals on main-branch changes.
It would be great if there were a nice, clean way to write a pytest test as a simple wrapper around an eval config, which could then flag any evaluation failures or make specific assertions on the results (see the sketch below). pytest would then be used exclusively for asserting on results, while the evaluations themselves stay managed entirely from config files.
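As a rough sketch of one possible shape: assuming a hypothetical programmatic entry point `run_evaluation` that takes a config path and returns per-example results (the function name, the config path, and the `error`/`score`/`example_id` fields are all illustrative, not existing API), such a wrapper could look like:

```python
import pytest

# Hypothetical entry point: langsmith-evaluation-helper does not expose this today.
# The function name, the config path, and the result fields (error/score/example_id)
# below are all illustrative, not existing API.
from langsmith_evaluation_helper import run_evaluation


@pytest.mark.parametrize("config_path", ["configs/eval.yml"])
def test_eval_config_passes(config_path):
    # Run every evaluator defined in the config and collect per-example results.
    results = run_evaluation(config_path)

    # Surface fatal evaluation errors as test failures instead of silent successes.
    errored = [r for r in results if r.get("error") is not None]
    assert not errored, f"{len(errored)} examples failed to evaluate: {errored[:3]}"

    # Make specific assertions on the results, e.g. a minimum score per example.
    for r in results:
        assert r["score"] >= 0.8, f"score {r['score']} below threshold for {r['example_id']}"
```

With something along these lines, a regression in evaluation results on a main-branch change would fail the pipeline instead of passing silently.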