This is the repository for training and running inference with a Masked Diffusion Model in a GPT style, based on nanoGPT.
pip install torch numpy transformers datasets tiktoken wandb tqdm
Dependencies:
- pytorch <3
- numpy <3
- transformers for huggingface transformers <3 (to load GPT-2 checkpoints)
- datasets for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
- tiktoken for OpenAI's fast BPE code <3
- wandb for optional logging <3
- tqdm for progress bars <3
bash submit_data_preprocess.sh
This bash script downloads and preprocesses the necessary datasets (OpenWebText, WikiText, 1BW, LAMBADA, etc.) before training.
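For orientation, the preprocessing follows the usual nanoGPT recipe of tokenizing text with the GPT-2 BPE and writing flat binary token files. The sketch below is illustrative only, assuming this repo keeps nanoGPT's data format; the dataset name and output path are placeholders, and the actual logic lives in the script above.

```python
# Illustrative sketch of nanoGPT-style preprocessing (not the actual script):
# tokenize with the GPT-2 BPE and dump token ids as a flat uint16 binary file.
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")  # placeholder dataset

ids = []
for example in ds:
    tokens = enc.encode_ordinary(example["text"])  # BPE ids, no special tokens
    tokens.append(enc.eot_token)                   # document separator <|endoftext|>
    ids.extend(tokens)

# uint16 is enough because the GPT-2 vocab (50257 tokens) fits in 16 bits
np.array(ids, dtype=np.uint16).tofile("train.bin")
```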
Train a GPT-2 Small-scale model:
bash submit_124M_train.sh
Train a GPT-2 Medium-scale model:
bash submit_350M_train.sh
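For intuition about what these scripts optimize, here is a minimal sketch of a masked-diffusion training step. The model interface, the mask-token id, and the unweighted loss are assumptions made for illustration; the real objective is defined in the training code that the scripts above launch.

```python
# Minimal sketch of one masked-diffusion training step (assumed interface, not
# this repo's actual code): corrupt a random fraction of tokens with a [MASK]
# id and train the model to recover them via cross-entropy on masked positions.
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, x, mask_token_id):
    """x: (batch, seq_len) clean token ids."""
    b, t = x.shape
    ratio = torch.rand(b, 1, device=x.device)              # masking ratio ~ diffusion time
    is_masked = torch.rand(b, t, device=x.device) < ratio
    noisy = torch.where(is_masked, torch.full_like(x, mask_token_id), x)

    logits = model(noisy)                                   # (b, t, vocab_size), assumed
    # simplified: unweighted cross-entropy over the masked positions only
    return F.cross_entropy(logits[is_masked], x[is_masked])
```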
My pretrained checkpoints for AO-GPT (Small, Medium) and Sigma-GPT (Small, Medium) are hosted on Hugging Face at Cauthyyy/AO-GPT-MDM.
| Model | Link |
|---|---|
| AO-GPT-Small | Link |
| AO-GPT-Medium | Link |
| Sigma-GPT-Small | Link |
| Sigma-GPT-Medium | Link |
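To pull a checkpoint locally, the standard huggingface_hub client works; the snippet below uses snapshot_download to fetch the whole repo, since individual checkpoint filenames are not listed here.

```python
# Download all files from the checkpoint repo on Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Cauthyyy/AO-GPT-MDM")
print("checkpoints saved under:", local_dir)
```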
bash sample_AOGPT.sh
Try different sampling steps, Top-p, and temperature settings!
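As a rough mental model of how sampling steps, temperature, and top-p interact in a masked-diffusion sampler, here is an illustrative loop. The function, the model interface, and the random reveal order are assumptions for illustration only, not the sampler implemented behind sample_AOGPT.sh.

```python
# Illustrative iterative-unmasking sampler (assumed interface, not this repo's
# sampler): start from all [MASK] tokens and reveal a few positions per step,
# drawing from a temperature- and top-p-filtered distribution.
import torch
import torch.nn.functional as F

def top_p_filter(logits, p):
    """Set logits outside the smallest set with cumulative prob > p to -inf."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    remove = cum_probs > p
    remove[..., 1:] = remove[..., :-1].clone()   # always keep the most likely token
    remove[..., 0] = False
    filtered = sorted_logits.masked_fill(remove, float("-inf"))
    return torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, filtered)

@torch.no_grad()
def sample(model, seq_len, mask_token_id, steps=64, temperature=0.9, top_p=0.95, device="cuda"):
    x = torch.full((1, seq_len), mask_token_id, dtype=torch.long, device=device)
    for step in range(steps):
        masked = x == mask_token_id
        if not masked.any():
            break
        logits = model(x) / temperature                        # (1, seq_len, vocab), assumed
        probs = F.softmax(top_p_filter(logits, top_p), dim=-1)
        draws = torch.multinomial(probs[0], num_samples=1).squeeze(-1)  # one draw per position
        # reveal an equal share of the still-masked positions each step (random order)
        n_masked = int(masked.sum().item())
        n_reveal = max(1, n_masked // (steps - step))
        reveal = masked[0].nonzero().squeeze(-1)
        reveal = reveal[torch.randperm(n_masked, device=device)[:n_reveal]]
        x[0, reveal] = draws[reveal]
    return x
```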
This repo is heavily based on nanoGPT.