[Feature Request] Add support for running a pairwise experiment #37

Currently, all evaluations are run in isolation. However, it is often useful to compare the results of one evaluation against those of another.

LangSmith supports [evaluating existing experiments in a comparative manner](https://docs.smith.langchain.com/evaluation/how_to_guides/evaluate_pairwise). Under the hood, this can be achieved via the SDK by using [aevaluate()](https://docs.smith.langchain.com/reference/python/evaluation/langsmith.evaluation._arunner.aevaluate) with two existing experiments.
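
For reference, here is a rough sketch of what a pairwise run over two existing experiments might look like from Python. Note that it uses `evaluate_comparative` from `langsmith.evaluation` rather than `aevaluate()`, since that is the entry point I'm sure accepts a pair of experiments; treat the exact call as an assumption to check against the installed SDK version. The `prefer_shorter` evaluator, the `run_pairwise` helper and the `outputs["output"]` key are all made up for illustration.

```python
from typing import Any, Sequence


def prefer_shorter(runs: Sequence[Any], example: Any) -> dict:
    """Toy pairwise evaluator: score 1 for the shortest answer, 0 otherwise."""
    # Assumption: each run stores its answer under outputs["output"].
    lengths = [len(str((run.outputs or {}).get("output", ""))) for run in runs]
    shortest = min(lengths)
    # Ties give every tied run a score of 1.
    scores = {run.id: int(length == shortest) for run, length in zip(runs, lengths)}
    return {"key": "prefer_shorter", "scores": scores}


def run_pairwise(new_experiment: str, base_experiment: str):
    """Compare two existing experiments (needs langsmith and an API key)."""
    # Imported lazily so the evaluator above can be used without the SDK.
    from langsmith.evaluation import evaluate_comparative

    return evaluate_comparative(
        (new_experiment, base_experiment),
        evaluators=[prefer_shorter],
    )
```

A real evaluator would more likely be an LLM-as-judge preference check; the length comparison is only a placeholder so the shape of the return value is visible.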

I'm not sure of the best way to expose this via the config file. One option is to run it after a new experiment has been created, if the config file names an experiment to compare against: create the new experiment, then create a pairwise evaluation of it against that base. Another option is a config setting such as "compare to previous matching experiment", which would look up the most recent experiment that matches the prefix and use it as the base.
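
To make the two options concrete, here is a minimal sketch of how the base experiment could be resolved from the config. The config keys (`compare_to`, `compare_to_previous`, `experiment_prefix`), the `ExperimentInfo` type and both function names are hypothetical and not part of any existing config schema; the list of known experiments is assumed to be fetched elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExperimentInfo:
    """Hypothetical record describing an existing experiment."""
    name: str
    created_at: datetime


def find_previous_experiment(
    prefix: str, experiments: Sequence[ExperimentInfo], exclude: str
) -> Optional[str]:
    """Return the newest experiment matching the prefix, skipping `exclude`."""
    candidates = [
        e for e in experiments
        if e.name.startswith(prefix) and e.name != exclude
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.created_at).name


def resolve_baseline(
    config: dict, new_experiment: str, experiments: Sequence[ExperimentInfo]
) -> Optional[str]:
    """Pick the experiment to compare against, or None to skip the comparison."""
    # Option 1: the config names the base experiment explicitly.
    explicit = config.get("compare_to")
    if explicit:
        return explicit
    # Option 2: fall back to the most recent experiment sharing the prefix.
    if config.get("compare_to_previous"):
        return find_previous_experiment(
            config["experiment_prefix"], experiments, exclude=new_experiment
        )
    return None
```

If `resolve_baseline` returns a name, the pairwise evaluation would be created for the new experiment against it; if it returns `None`, the run would behave exactly as it does today.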
