Qwen3-LLM-Pytorch-Implementation-From-Scratch

For the full code, open the Jupyter notebook in this repo: qwen3_llm_implementation_from_scratch.ipynb

Project Overview

  • Lightweight LLM inspired by Qwen3, built from scratch in PyTorch.
  • Implements modern transformer components including RMSNorm (sketched just below this list), Rotary Position Embeddings (RoPE), Grouped-Query Attention (GQA), and SwiGLU feed-forward layers.
  • Trained using a hybrid Muon + AdamW optimizer setup with causal masking, efficient batching, and evaluation utilities.
  • Includes full training pipeline, model loading, and interactive text generation demos for hands-on experimentation.
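
As a taste of the components above, here is a minimal RMSNorm sketch (illustrative only; the notebook's exact implementation may differ):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: scale-only normalization, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the feature dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```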

Step by Step Overview (Table of Contents)

  1. Imports
  2. Utility Functions (set_seed, ...)
  3. Model Configuration
  4. Key/Value Head Expansion Function
  5. Muon Optimizer (Orthogonalized Momentum via Newton–Schulz)
  6. Data Loading and Caching
  7. TextTokenDataset Class
  8. Rotary Position Embeddings (RoPE)
  9. Grouped-Query Attention (GQA)
  10. SwiGLU Feed-Forward Network (FFN)
  11. Transformer Block (Attention + FFN + RMSNorm + Residuals)
  12. Language Model Class (MinimalLLM)
  13. Evaluation Function (Loss, Accuracy, Perplexity)
  14. Optimizer Setup (Hybrid Muon + AdamW)
  15. Training Loop (AMP, Gradient Accumulation, Schedulers)
  16. Training Script
  17. Model Loading
  18. Model Inference: Autoregressive Text Generation and Interactive Chat Inference

Minimal sketches of several of these steps (5, 8–10, 13, 14, and 18) follow this list; all shapes, names, and hyperparameters in them are illustrative assumptions, not values taken from the notebook.
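
A sketch of the Newton–Schulz orthogonalization at the core of Muon (step 5). The quintic coefficients below follow Keller Jordan's widely used Muon reference code; treat this as an assumption about what the notebook does rather than its exact implementation:

```python
import torch

@torch.no_grad()
def newton_schulz(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately map a 2-D gradient/momentum matrix to the nearest
    semi-orthogonal matrix (flattening its singular values toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315    # quintic iteration coefficients
    x = g / (g.norm() + eps)             # normalize so the iteration converges
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T                          # iterate on the short-fat orientation
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x  # Newton–Schulz update
    return x.T if transposed else x
```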
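A RoPE sketch (step 8), using one common formulation that rotates interleaved channel pairs; the notebook may use a different but equivalent layout:

```python
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0):
    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)     # (seq_len, head_dim / 2)
    return freqs.cos(), freqs.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (batch, heads, seq_len, head_dim); rotate each even/odd channel pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```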
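A grouped-query attention sketch (steps 4 and 9): K/V heads are expanded to match the query head count, then fed to PyTorch's built-in causal attention. The head counts and the use of scaled_dot_product_attention are assumptions:

```python
import torch
import torch.nn.functional as F

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # (batch, n_kv_heads, seq, head_dim) -> (batch, n_kv_heads * n_rep, seq, head_dim)
    return x.repeat_interleave(n_rep, dim=1)

def gqa(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
        n_heads: int, n_kv_heads: int) -> torch.Tensor:
    # q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    k = repeat_kv(k, n_heads // n_kv_heads)
    v = repeat_kv(v, n_heads // n_kv_heads)
    # Causal scaled dot-product attention (PyTorch >= 2.0).
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```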
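A SwiGLU feed-forward sketch (step 10); d_model and d_ff are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU(x) = down( SiLU(gate(x)) * up(x) )"""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```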
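An evaluation sketch (step 13) showing how loss, accuracy, and perplexity relate; all tensors here are dummies:

```python
import math
import torch
import torch.nn.functional as F

vocab_size = 32000                                 # illustrative
logits = torch.randn(4, 128, vocab_size)           # (batch, seq, vocab) — dummy model output
targets = torch.randint(0, vocab_size, (4, 128))   # (batch, seq) — dummy next-token ids

loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
accuracy = (logits.argmax(dim=-1) == targets).float().mean()
perplexity = math.exp(loss.item())                 # perplexity = exp(mean cross-entropy)
```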
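A sketch of the hybrid parameter split (step 14), assuming the common convention of giving 2-D hidden weight matrices to Muon and everything else (embeddings, norms, biases) to AdamW:

```python
import torch
import torch.nn as nn

def split_param_groups(model: nn.Module):
    """Route hidden 2-D weight matrices to Muon; everything else to AdamW."""
    muon_params, adamw_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim == 2 and "embed" not in name:
            muon_params.append(p)    # candidate for Muon's orthogonalized update
        else:
            adamw_params.append(p)   # embeddings, norms, biases, etc.
    return muon_params, adamw_params

# Example for the AdamW half of the hybrid setup (learning rate is illustrative):
# adamw = torch.optim.AdamW(adamw_params, lr=3e-4, weight_decay=0.1)
```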
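An autoregressive generation sketch (step 18); generate, its temperature, and max_new_tokens are illustrative names and defaults, not the notebook's:

```python
import torch

@torch.no_grad()
def generate(model, ids: torch.Tensor, max_new_tokens: int = 50,
             temperature: float = 0.8) -> torch.Tensor:
    """Append sampled tokens one at a time; `model` is assumed to map
    (batch, seq) token ids to (batch, seq, vocab) logits."""
    model.eval()
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature       # logits at the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1) # sample the next token
        ids = torch.cat([ids, next_id], dim=1)            # feed it back in
    return ids
```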

Useful Materials
