Add argument to set number of eval steps in Trainer #31561

@brianhill11

Description

Feature request

I would like to add an argument to the Trainer class that allows setting the number of eval steps (batches) used by the evaluation procedure during training.

The current behavior is to iterate through the entire dataset provided by the eval_dataset argument. However, when using a very large evaluation dataset, the evaluation process can take a long time. Especially when debugging, it would be helpful to be able to explicitly specify how many batches should be used in the evaluation procedure.

Motivation

Currently, I am using dataset streaming to train and evaluate models (motivated by large dataset size). Because I'm using a streaming dataset, my eval_strategy argument is set to "steps", so the Trainer runs the evaluation procedure every eval_steps training steps. However, since there is no way to control how many steps are run during the evaluation procedure itself, the Trainer iterates through the entire eval dataset, which can be very time-consuming for a large dataset. Ideally, we could specify how many evaluation steps to run inside the evaluation procedure.

Note that the argument I would like to add is different from the existing eval_steps argument: eval_steps specifies the "Number of update steps between two evaluations", whereas what I would like to specify is the number of steps within each evaluation.
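To illustrate the proposed semantics, here is a minimal, self-contained sketch of an evaluation loop capped by a hypothetical max_eval_steps argument (the name is an assumption, not an existing Trainer parameter). It uses plain Python rather than the actual Trainer internals, just to show the intended behavior: None reproduces the current behavior of consuming the whole eval dataset, while an integer stops after that many batches.

```python
from itertools import islice

def evaluate(eval_batches, max_eval_steps=None):
    """Toy evaluation loop, optionally capped at max_eval_steps batches.

    max_eval_steps is the hypothetical new argument proposed here; it is
    distinct from eval_steps, which controls how often evaluation runs.
    """
    # islice(iterable, None) consumes the full iterable, matching the
    # current Trainer behavior; an integer caps the number of batches.
    total, count = 0.0, 0
    for batch in islice(eval_batches, max_eval_steps):
        total += sum(batch) / len(batch)  # stand-in for a per-batch metric
        count += 1
    return {"eval_loss": total / max(count, 1), "eval_batches_seen": count}

# A large (streaming-style) eval set of 1000 batches:
stream = ([i, i + 1] for i in range(1000))
print(evaluate(stream, max_eval_steps=5)["eval_batches_seen"])  # 5
```

With this shape, only the evaluation loop's iteration bound changes; metric aggregation is unaffected except that it reflects the truncated sample.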

Your contribution

I would be happy to contribute code for this if it is deemed a relevant feature request and there isn't existing functionality I'm unaware of.
