Allow `load_best_model_at_end=True` to work when `save_steps < eval_steps` and best model is saved

### Feature request

Allow `load_best_model_at_end=True` to work even when `save_steps` is not a round multiple of `eval_steps`, and optionally preserve the best model when `save_total_limit` is reached.

This change would remove the current restriction that requires `save_steps` to be a round multiple of `eval_steps` when `load_best_model_at_end=True`. It also proposes an optional flag that prevents the best model from being deleted when the number of saved checkpoints exceeds `save_total_limit`.

No specific paper is associated with this feature. This is a usability improvement based on common user workflows and constraints. 

### Motivation

Users with limited disk space (e.g., Colab users) often want to:
- Save more frequently (e.g., `save_steps=100`) to avoid losing progress
- Evaluate less frequently (e.g., `eval_steps=200`) to save compute
- Still be able to load the best model at the end using `load_best_model_at_end=True`

Currently, this is not possible unless `save_steps` is a round multiple of `eval_steps`, which is unnecessarily restrictive. The restriction could be lifted by ensuring that a checkpoint of the best model exists by the end of training (for example, by saving whenever an evaluation produces a new best metric), regardless of the ratio between save and eval frequency.
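To make the arithmetic concrete, here is a minimal sketch in plain Python (it does not call Transformers). The function names `step_schedule`, `passes_current_check`, and `every_eval_is_saved` are hypothetical, and `passes_current_check` only mirrors the restriction as described in this issue, not the library's actual code. It shows that with `save_steps=100` and `eval_steps=200` the check fails even though every evaluation step coincides with a save step.

```python
def step_schedule(max_steps: int, interval: int) -> set:
    """Return the steps at which an action with the given interval fires."""
    return set(range(interval, max_steps + 1, interval))


def passes_current_check(save_steps: int, eval_steps: int) -> bool:
    # Assumption: models the restriction as stated in this issue, i.e.
    # save_steps must be a round multiple of eval_steps.
    return save_steps % eval_steps == 0


def every_eval_is_saved(max_steps: int, save_steps: int, eval_steps: int) -> bool:
    # True when each evaluation step is also a save step, so a checkpoint
    # exists for every evaluated (and therefore every candidate best) model.
    return step_schedule(max_steps, eval_steps) <= step_schedule(max_steps, save_steps)


if __name__ == "__main__":
    print(passes_current_check(100, 200))       # False: rejected today
    print(every_eval_is_saved(1000, 100, 200))  # True: best model is always on disk
    print(every_eval_is_saved(1000, 150, 200))  # False: would need an extra save
```

When the two intervals do not line up (the last case), an implementation would presumably need to trigger an additional save at the evaluation step that produced the new best metric.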

Additionally, users may want to keep the best model even when `save_total_limit` is reached, since checkpoint rotation may currently delete it.
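The optional flag could behave roughly like the sketch below. This is an illustration in plain Python under stated assumptions, not the Trainer's rotation logic: `checkpoints_to_delete` and its `keep_best` parameter are hypothetical names, and checkpoints are assumed to be listed oldest first.

```python
from typing import List, Optional


def checkpoints_to_delete(
    checkpoints: List[str],
    save_total_limit: Optional[int],
    best_checkpoint: Optional[str] = None,
    keep_best: bool = True,  # hypothetical flag proposed in this issue
) -> List[str]:
    """Pick the oldest checkpoints to remove, optionally sparing the best one."""
    if save_total_limit is None or save_total_limit <= 0:
        return []  # no limit configured, nothing to rotate

    protect_best = keep_best and best_checkpoint in checkpoints
    limit = save_total_limit
    # Design choice in this sketch: with a limit of 1, keep both the best and
    # the newest checkpoint so that resuming training stays possible.
    if protect_best and limit == 1 and checkpoints[-1] != best_checkpoint:
        limit = 2

    excess = len(checkpoints) - limit
    if excess <= 0:
        return []

    # Oldest first, skipping the best checkpoint when it is protected.
    candidates = [c for c in checkpoints if not (protect_best and c == best_checkpoint)]
    return candidates[:excess]
```

For example, with checkpoints at steps 100, 200, 300, and 400, `save_total_limit=2`, and the best model at step 200, the sketch removes `checkpoint-100` and `checkpoint-300` and keeps the best and the most recent checkpoints.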

This request is related to the discussion in Hugging Face Transformers GitHub issues, where users have reported frustration over this limitation.


### Your contribution

Although I currently lack the resources to submit a PR myself, I'm happy to support the discussion and help refine the proposal. I believe contributions go beyond code — asking questions, sharing feedback, and helping others in the community are also valuable ways to contribute. 
I encourage others who are interested in this feature to join the discussion or take up the implementation. I'm also happy to test or provide input if someone decides to work on it. 
In the meantime, I’ll continue to support the project by spreading the word and showing appreciation for the library’s impact. 
