Train OPT-125M from scratch

I couldn't find detailed documentation (or a step-by-step guide) on pre-training OPT-125M with exactly the same corpus and model architecture that you used in the paper. In short, I would like to reproduce the results of your smallest model from scratch.

Could you point me to the relevant guide, or provide anything else that could help? Many thanks.