@lessw2020 Thanks for this awesome optimizer. I'm very excited about it!
There is one particular workload that trains using a batch of 1 item.
Theoretically, does it make sense to use RAdam (Rectified Adam), LookAhead, and GC (Gradient Centralization) in this context?
I've been thinking about it and have read the papers, but I still could not reach a conclusion. Since you (or anyone else here) are much more experienced than I am, do you have an opinion on this?