This repository was archived by the owner on Nov 1, 2024. It is now read-only.
train opt-125M from scratch #725
Open
Description
I couldn't find detailed documentation (or a step-by-step guide) for pre-training OPT-125M with exactly the same corpus and model architecture that you used in the paper. In short, I would like to reproduce the results of your smallest model from scratch.
Could you point me to the relevant guide, or provide anything else that would help? Many thanks.