# llm-scratch

Controlling every detail of LLM training by building it from the ground up.

## Current Features

1. Mixture-of-experts architecture, defined in `llm/moe.py`. No optimizations yet.
2. Loss functions (cross-entropy, DPO, GRPO/GSPO). TODO: double-check. A hedged DPO sketch follows this list.
3. Optimizer (non-distributed) and learning-rate scheduler (warmup, cosine annealing, post-annealing); a scheduler sketch also follows below.
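
The DPO loss mentioned in item 2 can be written as a log-sigmoid over the difference of policy/reference log-ratios for a chosen and a rejected completion. Below is a minimal sketch; the function name and argument layout are hypothetical and may differ from the actual implementation in `llm/`, which is the source of truth.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (sketch, not the repo's code).

    Each argument is a tensor of summed per-sequence log-probabilities.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), averaged over the batch
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```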
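
The scheduler in item 3 combines three phases: linear warmup, cosine annealing, and a constant post-annealing floor. The sketch below assumes step-based scheduling and uses hypothetical hyperparameter names; it illustrates the shape of the schedule rather than the repo's exact implementation.

```python
import math

def lr_at_step(step, max_lr, min_lr, warmup_steps, anneal_steps):
    """Warmup -> cosine annealing -> constant post-annealing floor (sketch)."""
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + anneal_steps:
        # Cosine annealing from max_lr down to min_lr.
        progress = (step - warmup_steps) / anneal_steps
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
    # Post-annealing: hold the minimum learning rate.
    return min_lr
```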

## For Kernels

```bash
uv pip install nvidia-cutlass-dsl triton
```
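
These packages provide the CUTLASS and Triton Python DSLs for writing custom GPU kernels. As a quick smoke test that the Triton install works (this example is not part of the repo), a minimal elementwise-add kernel might look like:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    # x and y must be CUDA tensors of the same shape.
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```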

## Tokenizers

```bash
uv run maturin develop --release --manifest-path tokenizer/Cargo.toml
```
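
`maturin develop` compiles the Rust crate under `tokenizer/` and installs the resulting Python extension into the active environment, so the tokenizer can then be imported from Python (the module name is whatever is declared in `tokenizer/Cargo.toml`).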
