
Can't continue training with cosine scheduler #2910

Answered by AlienRenders
AlienRenders asked this question in Q&A

Ok, I found a workaround.

To use cosine with restarts, set both max train epoch and max train steps to 0.
Set epoch to the epoch you are currently generating.
Set LR cycles as if you were generating ALL the epochs in a single run. For example, if you generate one epoch at a time and want the cosine to restart every epoch, set LR cycles to the epoch number: if epoch is 3, set LR cycles to 3 as well.
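
To see why LR cycles should match the epoch number, here is a minimal sketch. It assumes the scheduler follows the common hard-restarts cosine formula, which the trainer's actual implementation may not match exactly; the function name and the 500 steps per epoch are hypothetical.

```python
import math

def cosine_restart_factor(step, total_steps, num_cycles):
    """LR multiplier in [0, 1], assuming the common hard-restarts cosine formula."""
    if step >= total_steps:
        return 0.0
    # Position inside the current cycle, in [0, 1). The modulo is done on
    # integers so that cycle boundaries land exactly on 0.
    pos = (step * num_cycles) % total_steps / total_steps
    return 0.5 * (1.0 + math.cos(math.pi * pos))

steps_per_epoch = 500                    # hypothetical value
epoch = 3                                # the epoch being generated
total_steps = epoch * steps_per_epoch    # schedule spans all epochs so far
lr_cycles = epoch                        # one cosine cycle per epoch

for step in (0, 250, 499, 500, 1000):
    print(step, round(cosine_restart_factor(step, total_steps, lr_cycles), 4))
```

With these numbers each cycle is exactly one epoch long, so the multiplier returns to its peak at the first step of every epoch.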

So far, this is normal. But the initial_step value will be wrong on the third epoch (that is, the second time you continue training). To get around this, calculate what the start step should be: the number of steps already completed in earlier epochs. If you have 500 iterations per epoch and you're on the 3rd epoch, the start step is 2 × 500 = 1000.
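
The start-step arithmetic can be sketched as follows. The helper name is hypothetical, and it assumes every epoch has the same number of iterations.

```python
def start_step(epoch, steps_per_epoch):
    """Steps already completed before the given 1-based epoch begins."""
    if epoch < 1:
        raise ValueError("epoch is 1-based and must be at least 1")
    return (epoch - 1) * steps_per_epoch

# 3rd epoch with 500 iterations per epoch, as in the example above.
print(start_step(3, 500))  # prints 1000
```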

In "additiona…

Replies: 2 comments

Answer selected by AlienRenders