This repository was archived by the owner on Nov 1, 2024. It is now read-only.

train opt-125M from scratch #725

Open
@emrecanacikgoz

Description

I couldn't find detailed documentation (or a step-by-step guide) for pre-training OPT-125M with exactly the same corpus and model architecture that you used in the paper. In short, I would like to reproduce the results of your smallest model from scratch.

Could you point me to the relevant guide, or provide anything else that might help? Many thanks.


Labels: question (Further information is requested)
