Controlling every detail of LLM training by building everything from the ground up.
- Mixture-of-experts architecture, defined in `llm/moe.py`. No optimizations yet.
- Loss functions (cross-entropy, DPO, GRPO/GSPO). TODO: double-check.
- Optimizer (non-distributed) and learning-rate scheduler (warmup, cosine annealing, post-annealing).
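The scheduler phases listed above can be sketched as a single step-to-rate function. This is a minimal, dependency-free illustration; the parameter names (`max_lr`, `min_lr`, `warmup_steps`, `anneal_steps`) are hypothetical and the repo's actual scheduler may differ in shape and defaults.

```python
import math

def lr_at(step, *, max_lr=3e-4, min_lr=3e-5,
          warmup_steps=100, anneal_steps=1000):
    """Warmup -> cosine annealing -> constant post-annealing floor.

    Illustrative sketch only; parameter names are assumptions,
    not the repo's actual API.
    """
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step < warmup_steps + anneal_steps:
        # Cosine decay from max_lr down to min_lr.
        t = (step - warmup_steps) / anneal_steps
        return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
    # Post-annealing: hold the floor learning rate.
    return min_lr
```

Keeping the schedule as a pure function of the step makes it trivial to plot and unit-test before wiring it into the optimizer loop.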
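Of the losses listed, DPO is the least standard, so a scalar sketch may help. This follows the published DPO objective, -log σ(β·margin), for a single preference pair; the function and argument names are illustrative, and the repo's batched tensor implementation will look different.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative names, not the repo's API).

    Inputs are summed log-probs of the chosen/rejected completions under
    the trained policy and the frozen reference model.
    """
    # How much more the policy prefers chosen over rejected,
    # relative to the reference model.
    margin = (policy_logp_chosen - ref_logp_chosen) \
           - (policy_logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), via log1p for numerical stability.
    return math.log1p(math.exp(-beta * margin))
```

At zero margin the loss is log 2; it falls toward 0 as the policy's preference for the chosen completion grows beyond the reference's.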
```shell
uv pip install nvidia-cutlass-dsl triton
uv run maturin develop --release --manifest-path tokenizer/Cargo.toml
```