
[Feature] Support per-sample logging in evaluate #1497

@jgreer013

Description

Feature request

It would be useful to log the prompt, response, and evaluation metric associated with every sample. Currently, there is no parameter a user can set to enable this across evaluation backends.

Motivation / references

LM Harness supports sample logging:
#1496

Your contribution

Submitted PR for minor fix: #1496

But we should add this as a first-class parameter that works across all evaluators. For illustration, a minimal sketch of what a shared per-sample logging helper could look like is below.
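
This is only a hypothetical sketch, not the existing API: the `log_samples` flag, `output_dir` field, and `log_sample` helper are illustrative names. The idea is that every backend, after scoring a sample, appends a JSONL record containing the prompt, response, and metrics:

```python
import json
from pathlib import Path


def log_sample(output_dir: str, task: str, sample: dict) -> None:
    """Append one evaluated sample (prompt, response, metrics) as a JSONL line."""
    path = Path(output_dir) / f"{task}_samples.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")


# Inside each evaluation backend, after scoring a sample (hypothetical config):
# if config.log_samples:
#     log_sample(config.output_dir, task_name,
#                {"prompt": prompt, "response": response, "metrics": metrics})
```

A single shared helper like this would keep the output format consistent across evaluators, so downstream tooling can parse per-sample results the same way regardless of backend.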
