Allow resuming vf-eval #557
Open
Description
This PR adds support for resuming an evaluation run via `vf-eval` using the `-R` (`--resume-from-path`) flag, which is meant to be used together with `-s` and `-f`. It will:

- `example_ids` from `results.jsonl` in the specified resume path
- `results.jsonl` and update the metadata (e.g. timings are summed, avg. reward and metrics are averaged)

Notes:
- `rollouts_per_example=1`

Examples
Default behavior is unchanged
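The step of reading `example_ids` from `results.jsonl` in the resume path can be illustrated with a minimal sketch. It assumes, hypothetically, that every non-empty line of `results.jsonl` is a JSON object with an integer `"example_id"` field; the helper names `parse_completed_ids` and `remaining_example_ids` are illustrative and not taken from the PR. With no resume path the filter is a no-op, which is consistent with the default behavior being unchanged.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def parse_completed_ids(lines: Iterable[str]) -> set[int]:
    """Collect example_ids from results.jsonl lines.

    Hypothetical schema: every non-empty line is a JSON object
    with an integer "example_id" field.
    """
    completed: set[int] = set()
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            completed.add(json.loads(line)["example_id"])
    return completed


def remaining_example_ids(
    example_ids: list[int], resume_path: Optional[Path]
) -> list[int]:
    """Return the example_ids that still need to be evaluated (hypothetical helper)."""
    if resume_path is None:
        # No resume path given: evaluate everything (default behavior).
        return list(example_ids)
    results_file = resume_path / "results.jsonl"
    if not results_file.exists():
        # Nothing saved yet, so nothing can be skipped.
        return list(example_ids)
    with results_file.open() as f:
        done = parse_completed_ids(f)
    # Keep the original ordering of the examples that are left.
    return [i for i in example_ids if i not in done]
```

Separating the line parsing from the file access keeps the skip logic testable without touching the filesystem.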
To use the resume feature and save each finished group, specify `-s` and `-f 1`. To illustrate, I call `sys.exit` after every intermediate save. For the initial command I run:

Then, each subsequent resume from the output dir
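On each resume the metadata is updated: per the description, timings are summed and the average reward and metrics are averaged. The sketch below is one way to do that, assuming a hypothetical `EvalSummary` record (the PR does not show the metadata schema, so the field names are illustrative). It weights each partial average by the number of examples it covers, because a plain mean of two averages is skewed when the segments differ in size; whether the PR weights the averages this way is not stated.

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    # Hypothetical subset of the saved metadata; field names are illustrative.
    num_examples: int
    time_ms: float
    avg_reward: float
    avg_metrics: dict[str, float]


def _weighted_mean(a: float, n_a: int, b: float, n_b: int) -> float:
    """Mean of two partial averages, weighted by how many items each covers."""
    total = n_a + n_b
    return (a * n_a + b * n_b) / total if total else 0.0


def merge_summaries(prev: EvalSummary, new: EvalSummary) -> EvalSummary:
    """Combine the metadata of a previous run with that of a resumed segment.

    Assumes both segments report the same metric names.
    """
    n_prev, n_new = prev.num_examples, new.num_examples
    return EvalSummary(
        num_examples=n_prev + n_new,
        # Timings are summed across resumed segments.
        time_ms=prev.time_ms + new.time_ms,
        avg_reward=_weighted_mean(prev.avg_reward, n_prev, new.avg_reward, n_new),
        avg_metrics={
            name: _weighted_mean(value, n_prev, new.avg_metrics[name], n_new)
            for name, value in prev.avg_metrics.items()
        },
    )
```

With `rollouts_per_example=1`, the example count equals the rollout count, so weighting by examples and by rollouts coincide.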
Type of Change
Testing
`uv run pytest` locally.

Checklist
Additional Notes