AO-GPT-MDM

Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture.

This is the repository for training and inference of a Masked Diffusion Model (MDM) in a GPT style, built on nanoGPT.
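On the formulation side, an MDM is trained by randomly masking tokens and predicting the originals. Below is a minimal sketch of that objective, assuming a `model` that maps a partially masked sequence to per-position logits; the names and the 1/t ELBO weighting follow the common absorbing-state MDM formulation, not necessarily this repo's exact code.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, tokens, mask_token_id):
    # Forward process: sample a masking level t ~ U(0, 1) per sequence,
    # then mask each token independently with probability t.
    b, l = tokens.shape
    t = torch.rand(b, 1, device=tokens.device).clamp(min=1e-3)
    mask = torch.rand(b, l, device=tokens.device) < t
    noised = torch.where(mask, torch.full_like(tokens, mask_token_id), tokens)

    # Reverse model: predict the original token at every masked position.
    logits = model(noised)  # (b, l, vocab_size)
    loss = F.cross_entropy(logits[mask], tokens[mask], reduction="none")

    # The 1/t weighting recovers the diffusion ELBO; dropping it leaves
    # a plain masked-token cross-entropy.
    weight = (1.0 / t).expand(b, l)[mask]
    return (weight * loss).mean()
```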

install

pip install torch numpy transformers datasets tiktoken wandb tqdm

Dependencies:

  • pytorch <3
  • numpy <3
  • transformers for huggingface transformers <3 (to load GPT-2 checkpoints)
  • datasets for huggingface datasets <3 (if you want to download + preprocess OpenWebText)
  • tiktoken for OpenAI's fast BPE code <3
  • wandb for optional logging <3
  • tqdm for progress bars <3

data preprocessing

bash submit_data_preprocess.sh

This script downloads and preprocesses the necessary datasets (OpenWebText, WikiText, 1BW, LAMBADA, etc.) before training.
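Under the hood this mirrors nanoGPT's prepare scripts: tokenize with tiktoken's GPT-2 BPE and write one flat uint16 token array per split. A condensed sketch (output path and `num_proc` are illustrative; newer `datasets` releases may need a mirrored copy of openwebtext):

```python
import numpy as np
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")
ds = load_dataset("openwebtext", split="train")

def tokenize(example):
    ids = enc.encode_ordinary(example["text"])
    ids.append(enc.eot_token)  # delimit documents with <|endoftext|>
    return {"ids": ids, "len": len(ids)}

tokenized = ds.map(tokenize, remove_columns=["text"], num_proc=8)

# Concatenate all token ids into one memory-mapped uint16 file.
total = int(np.sum(tokenized["len"], dtype=np.uint64))
arr = np.memmap("train.bin", dtype=np.uint16, mode="w+", shape=(total,))
idx = 0
for ex in tokenized:  # nanoGPT shards this loop into batches for speed
    arr[idx : idx + ex["len"]] = ex["ids"]
    idx += ex["len"]
arr.flush()
```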

Train an AO-GPT

Train a GPT-2 Small-scale model (124M parameters).

bash submit_124M_train.sh

Train a GPT-2 Medium-scale model (350M parameters).

bash submit_350M_train.sh
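For reference, "Small" and "Medium" here are the standard GPT-2 shapes; in nanoGPT-style GPTConfig terms (these are the usual GPT-2 hyperparameters, not values copied from the scripts):

```python
# Standard GPT-2 shapes (nanoGPT GPTConfig fields); the launch scripts
# additionally set batch size, learning rate, schedule, etc.
small  = dict(n_layer=12, n_head=12, n_embd=768)   # ~124M parameters
medium = dict(n_layer=24, n_head=16, n_embd=1024)  # ~350M parameters
```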

Pretrained Checkpoints

My pretrained checkpoints for AO-GPT (Small, Medium) and Sigma-GPT (Small, Medium) are hosted on Hugging Face at Cauthyyy/AO-GPT-MDM.

| Model | Link |
| --- | --- |
| AO-GPT-Small | Link |
| AO-GPT-Medium | Link |
| Sigma-GPT-Small | Link |
| Sigma-GPT-Medium | Link |
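One way to fetch a checkpoint programmatically is via huggingface_hub; the filename below is hypothetical, so check the Hub repo's file listing for the actual paths:

```python
import torch
from huggingface_hub import hf_hub_download

# "AO-GPT-Small/ckpt.pt" is a placeholder filename, not confirmed.
path = hf_hub_download(repo_id="Cauthyyy/AO-GPT-MDM",
                       filename="AO-GPT-Small/ckpt.pt")
state_dict = torch.load(path, map_location="cpu")
```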

sampling / inference

bash sample_AOGPT.sh

Try different numbers of sampling steps, top-p values, and temperature settings!
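For intuition: MDM inference starts from an all-mask sequence and commits a few positions per step, so fewer steps are faster while more steps usually improve quality. Below is a minimal sketch of that loop with temperature and top-p, assuming a `model(x)` that returns per-position logits and a random unmasking order; the repo's sampler may use a different schedule or confidence-based unmasking.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mdm_sample(model, length, steps, mask_token_id,
               temperature=1.0, top_p=0.95, device="cuda"):
    # Start fully masked; each step commits a fraction of the masked
    # positions, so every token is decided after `steps` iterations.
    x = torch.full((1, length), mask_token_id, dtype=torch.long, device=device)
    for s in range(steps):
        masked = (x == mask_token_id)
        if not masked.any():
            break
        logits = model(x) / temperature          # (1, length, vocab)
        probs = F.softmax(logits, dim=-1)

        # Top-p (nucleus) filtering per position.
        sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
        cum = sorted_p.cumsum(dim=-1)
        sorted_p[cum - sorted_p > top_p] = 0.0   # keep at least the top token
        sorted_p /= sorted_p.sum(dim=-1, keepdim=True)
        picks = torch.multinomial(sorted_p.view(-1, sorted_p.size(-1)), 1)
        sampled = sorted_idx.gather(-1, picks.view(1, length, 1)).squeeze(-1)

        # Unmask a random subset of the still-masked positions.
        n_masked = int(masked.sum())
        k = max(1, n_masked // (steps - s))
        cand = masked[0].nonzero().squeeze(-1)
        chosen = cand[torch.randperm(n_masked, device=device)[:k]]
        x[0, chosen] = sampled[0, chosen]
    return x
```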

acknowledgements

This repo builds heavily on nanoGPT.
