For the full code, open the Jupyter notebook in this repo: qwen3_llm_implementation_from_scratch.ipynb
- Lightweight LLM inspired by Qwen3, built from scratch in PyTorch.
- Implements modern transformer components including RMSNorm, Rotary Position Embeddings (RoPE), Grouped-Query Attention (GQA), and SwiGLU feed-forward layers.
- Trained using a hybrid Muon + AdamW optimizer setup with causal masking, efficient batching, and evaluation utilities.
- Includes full training pipeline, model loading, and interactive text generation demos for hands-on experimentation.
- Imports
- Utility Functions (set_seed, ...)
- Model Configuration
- Key/Value Head Expansion Function
- Muon Optimizer (Orthogonalized Momentum via Newton–Schulz)
- Data Loading and Caching
- TextTokenDataset Class
- Rotary Position Embeddings (RoPE)
- Grouped-Query Attention (GQA)
- SwiGLU Feed-Forward Network (FFN)
- Transformer Block (Attention + FFN + RMSNorm + Residuals)
- Language Model Class (MinimalLLM)
- Evaluation Function (Loss, Accuracy, Perplexity)
- Optimizer Setup (Hybrid Muon + AdamW)
- Training Loop (AMP, Gradient Accumulation, Schedulers)
- Training Script
- Model Loading
- Model Inference: Autoregressive Text Generation and Interactive Chat
- Qwen3 Technical Report PDF: https://arxiv.org/pdf/2505.09388
- Qwen3 GitHub Repo: https://github.com/QwenLM/Qwen3
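The components listed above can be illustrated with short, self-contained sketches. These are not the notebook's exact code; names and defaults are illustrative. First, a minimal RMSNorm, which rescales each hidden vector by its root mean square instead of mean-centering like LayerNorm:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMS normalization: x / sqrt(mean(x^2) + eps), with a learned per-channel gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned gain, initialized to 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize along the hidden dimension only; no mean subtraction, no bias.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

With the gain at its initial value of 1, the output's mean square along the hidden dimension is approximately 1.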
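A sketch of Rotary Position Embeddings (RoPE), which encode position by rotating pairs of channels in queries and keys by a position-dependent angle. This functional form is illustrative; the notebook may precompute and cache the cos/sin tables instead:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq, heads, head_dim)."""
    _, seq, _, d = x.shape
    half = d // 2
    # One frequency per channel pair, geometrically spaced as in the RoPE paper.
    freqs = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]  # (seq, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because each step is a pure rotation, vector norms are preserved, and position 0 (angle zero) is left unchanged.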
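The key/value head expansion used by Grouped-Query Attention can be sketched as follows: GQA keeps fewer K/V heads than query heads, so before attention each K/V head is repeated to match its group of query heads (function name and layout here are illustrative):

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each KV head n_rep times.
    x: (batch, n_kv_heads, seq, head_dim) -> (batch, n_kv_heads * n_rep, seq, head_dim)."""
    if n_rep == 1:
        return x
    b, kv, s, d = x.shape
    # expand adds a repeat axis without copying; reshape materializes the layout
    # so repeats of the same KV head land on adjacent head indices.
    x = x[:, :, None, :, :].expand(b, kv, n_rep, s, d)
    return x.reshape(b, kv * n_rep, s, d)
```

For example, with 2 KV heads and `n_rep=3`, the 6 output heads are ordered `[kv0, kv0, kv0, kv1, kv1, kv1]`, so query heads 0–2 share KV head 0.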
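A minimal SwiGLU feed-forward sketch: instead of a single activation, the FFN multiplies a SiLU-gated projection with a linear "up" projection before projecting back down (dimensions and layer names here are assumptions, not the notebook's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated FFN: down( silu(gate(x)) * up(x) ), all projections bias-free."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise gating: the silu branch modulates the linear branch.
        return self.down(F.silu(self.gate(x)) * self.up(x))
```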
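Finally, the core of the Muon optimizer: approximately orthogonalizing the momentum matrix with a quintic Newton–Schulz iteration, so the update has roughly uniform singular values. This sketch shows only the orthogonalization step (coefficients follow the published Muon recipe); the notebook's optimizer additionally handles momentum accumulation, learning rates, and which parameter groups Muon applies to versus AdamW:

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the Muon recipe
    x = g / (g.norm() + 1e-7)  # normalize so singular values start in [0, 1]
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # keep the Gram matrix x @ x.T small for tall matrices
    for _ in range(steps):
        s = x @ x.T
        # Each step applies the polynomial a*s + b*s^3 + c*s^5 to every singular value,
        # driving them all toward ~1 without an explicit (expensive) SVD.
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x
```

After a few iterations the singular values of the result cluster near 1, which is the "orthogonalized momentum" that Muon then scales into a weight update.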