
[Feature] Support per-sample logging in evaluate #1497

@jgreer013

Description

Feature request

It would be useful to log the prompt, response, and evaluation metric associated with every sample. Currently, there is no parameter a user can set to enable this across evaluation backends.

Motivation / references

LM Harness supports sample logging:
#1496

Your contribution

Submitted PR for minor fix: #1496

But we should add this as a first-class parameter that works across all evaluators. For illustration, a minimal sketch of what a shared per-sample logging helper could look like is below.
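
This is only a hypothetical sketch, not the existing API: the `log_samples` flag, `output_dir` field, and `log_sample` helper are illustrative names. The idea is that every backend, after scoring a sample, appends a JSONL record containing the prompt, response, and metrics:

```python
import json
from pathlib import Path


def log_sample(output_dir: str, task: str, sample: dict) -> None:
    """Append one evaluated sample (prompt, response, metrics) as a JSONL line."""
    path = Path(output_dir) / f"{task}_samples.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")


# Inside each evaluation backend, after scoring a sample (hypothetical config):
# if config.log_samples:
#     log_sample(config.output_dir, task_name,
#                {"prompt": prompt, "response": response, "metrics": metrics})
```

A single shared helper like this would keep the output format consistent across evaluators, so downstream tooling can parse per-sample results the same way regardless of backend.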
