Resuming training from a state that was itself saved by a resumed run behaves unexpectedly #2171

Resuming training from a state that was itself created during a resumed run trains for the wrong number of epochs.

Example: a 5-epoch run that saves state every epoch.
* Blue line: Normal training from start to finish.
* Red line: Resumed from the state of epoch 2. `resume = "E:/training/output/test_1-000002-state"`
* Yellow line: Resumed from the first state saved by that resumed run (epoch 3). `resume = "E:/training/output/test_2-000003-state"`
<img width="356" height="437" alt="Image" src="https://github.com/user-attachments/assets/9489be2a-cb2b-42e9-a9d5-dacbc9e91bda" />

The first resumed run (red line) trains for 3 epochs and finishes at a total of 2+3=5 epochs, as expected.
The twice-resumed run (yellow line) trains for 4 epochs, for a total of 2+1+4=7 epochs, instead of the expected 2 more epochs (2+1+2=5).
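
The arithmetic above can be sketched as follows. This is only a toy model, not the trainer's actual code: `remaining_epochs` is a hypothetical helper, and the idea that the saved state might count epochs relative to the last resume is an unconfirmed guess that happens to match the observed numbers.

```python
TOTAL_EPOCHS = 5  # configured length of the run in the example


def remaining_epochs(total_epochs: int, epochs_recorded_in_state: int) -> int:
    """Hypothetical helper: epochs left to train, given what the saved state claims."""
    return max(total_epochs - epochs_recorded_in_state, 0)


# Red line: state from epoch 2 of the original run.
red = remaining_epochs(TOTAL_EPOCHS, 2)

# Yellow line, expected: the state represents 2 + 1 = 3 completed epochs.
yellow_expected = remaining_epochs(TOTAL_EPOCHS, 3)

# Yellow line, assumption: the state only recorded the 1 epoch trained
# since the previous resume, so the trainer believes 4 epochs remain.
yellow_if_relative = remaining_epochs(TOTAL_EPOCHS, 1)

print(red, yellow_expected, yellow_if_relative)
```

Under that assumption the third value equals the 4 epochs observed for the yellow line, which would be consistent with a progress counter being saved relative to the resumed run rather than the whole training.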

This may be as simple as `"current_step"` being saved with the wrong value in `train_state.json`, but I don't know the code well enough to confirm that this is the cause.
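
One way to check this would be to compare the `train_state.json` files of the two saved states. The sketch below assumes the file sits directly inside each state directory and is plain JSON; `"current_step"` is the key mentioned above, while `"current_epoch"` is an assumed key name that may not exist (it is read with `.get`, so a missing key shows up as `None`). `read_train_state` and `compare_states` are hypothetical helper names.

```python
import json
from pathlib import Path


def read_train_state(state_dir: str) -> dict:
    """Load train_state.json from a saved state directory (assumed location)."""
    path = Path(state_dir) / "train_state.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def compare_states(state_dirs: list[str]) -> list[tuple]:
    """Return (directory, current_epoch, current_step) for each state directory."""
    rows = []
    for state_dir in state_dirs:
        state = read_train_state(state_dir)
        # "current_epoch" is an assumed key; .get() yields None if it is absent.
        rows.append((state_dir, state.get("current_epoch"), state.get("current_step")))
    return rows


# Example usage with the paths from this report:
# for row in compare_states([
#     "E:/training/output/test_1-000002-state",
#     "E:/training/output/test_2-000003-state",
# ]):
#     print(row)
```

If the second state reports a smaller step count than the first, even though it was saved one epoch later, that would support the suspicion that the counter restarts on resume.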

Used training settings: [training_tomls.zip](https://github.com/user-attachments/files/21706249/training_tomls.zip)
