Allow load_best_model_at_end=True to work when save_steps < eval_steps and best model is saved #39476

@KrisTHL181

Feature request

Allow load_best_model_at_end=True to work even when save_steps is not a round multiple of eval_steps, and optionally preserve the best model even when reaching save_total_limit.

This change would remove the current restriction that enforces save_steps to be a multiple of eval_steps when load_best_model_at_end=True. Additionally, it proposes an optional flag to prevent deletion of the best model when the total number of saved checkpoints exceeds the limit.

No specific paper is associated with this feature. This is a usability improvement based on common user workflows and constraints.

Motivation

Users with limited disk space (e.g., Colab users) often want to:

  • Save more frequently (e.g., save_steps=100) to avoid losing progress
  • Evaluate less frequently (e.g., eval_steps=200) to save compute
  • Still be able to load the best model at the end using load_best_model_at_end=True

Currently, this is not possible unless save_steps is a multiple of eval_steps, which is unnecessarily restrictive. The restriction could be lifted by simply ensuring that the best model is saved at least once during training, regardless of the save/eval frequency ratio.
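To make the constraint concrete, here is a minimal sketch in plain Python. It does not call Transformers: `is_round_multiple` and `simulate` are hypothetical helpers that only model the rule described above and the proposed behaviour of saving an extra checkpoint whenever an evaluation produces a new best metric. The loss values are made up for illustration.

```python
def is_round_multiple(save_steps: int, eval_steps: int) -> bool:
    """Model of the rule described above: save_steps must be a round
    multiple of eval_steps (hypothetical helper, not library code)."""
    return save_steps % eval_steps == 0


def simulate(max_steps, save_steps, eval_steps, eval_loss):
    """Walk through the training steps and record which ones get saved.

    eval_loss maps each evaluation step to a made-up loss value.
    Proposed behaviour (an assumption, not current library behaviour):
    when an evaluation yields a new best loss, save a checkpoint at that
    step even if it falls outside the regular save schedule.
    """
    saved, best_step, best_loss = [], None, float("inf")
    for step in range(1, max_steps + 1):
        if step % save_steps == 0:
            saved.append(step)  # regular scheduled save
        if step % eval_steps == 0:
            loss = eval_loss[step]
            if loss < best_loss:
                best_step, best_loss = step, loss
                if step not in saved:
                    saved.append(step)  # extra save for the new best model
    return saved, best_step


losses = {200: 0.9, 400: 0.7, 600: 0.8}  # illustrative values only

# The configuration from the bullets above fails the round-multiple rule.
print(is_round_multiple(save_steps=100, eval_steps=200))

# Scheduled saves and the best evaluation step for two configurations.
print(simulate(600, save_steps=100, eval_steps=200, eval_loss=losses))
print(simulate(600, save_steps=150, eval_steps=200, eval_loss=losses))
```

In the `save_steps=100`, `eval_steps=200` case every evaluation step already coincides with a scheduled save, so no extra save is needed. The extra save only matters for ratios such as `save_steps=150`, `eval_steps=200`, where an evaluation can land between two scheduled saves.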

Additionally, users may want to keep the best model even when reaching the save_total_limit, which currently may cause the best model to be deleted.
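The optional "keep the best model" behaviour could look roughly like the sketch below. This is not the library's rotation logic: `checkpoints_to_delete` and its `best_step` parameter are hypothetical names, and checkpoints are represented simply by their step numbers.

```python
def checkpoints_to_delete(checkpoint_steps, save_total_limit, best_step=None):
    """Return the steps whose checkpoints would be removed, oldest first.

    If best_step is given (standing in for the proposed opt-in flag),
    that checkpoint is never selected for deletion.
    """
    ordered = sorted(checkpoint_steps)
    excess = len(ordered) - save_total_limit
    if excess <= 0:
        return []  # still within the limit, nothing to delete
    # Exclude the protected best checkpoint from the deletion candidates.
    candidates = [step for step in ordered if step != best_step]
    return candidates[:excess]


steps = [100, 200, 300, 400]

# Plain oldest-first rotation: the checkpoint at step 200 is deleted.
print(checkpoints_to_delete(steps, save_total_limit=2))

# With the best checkpoint protected, step 200 survives the rotation.
print(checkpoints_to_delete(steps, save_total_limit=2, best_step=200))
```

This sketch always deletes the oldest unprotected checkpoints first. A real implementation would also need to decide what happens when `save_total_limit=1` and the best checkpoint is not the most recent one.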

This request relates to earlier discussions in the Hugging Face Transformers GitHub issues, where users have reported frustration with this limitation.

Your contribution

Although I currently lack the resources to submit a PR myself, I'm happy to support the discussion and help refine the proposal. I believe contributions go beyond code — asking questions, sharing feedback, and helping others in the community are also valuable ways to contribute.
I encourage others who are interested in this feature to join the discussion or take up the implementation. I'm also happy to test or provide input if someone decides to work on it.
In the meantime, I’ll continue to support the project by spreading the word and showing appreciation for the library’s impact.
