Inspired by Sebastian Raschka's book, "Build a Large Language Model (From Scratch)," this repository provides a practical demonstration of building LLMs from the ground up. It covers key aspects of the Transformer architecture and the intricacies involved in building and training your own LLMs.
Basic knowledge of Python, Machine Learning, Neural Networks, and Large Language Models is required.
Covers the fundamentals of tokenization. A tokenizer is a component that splits text into smaller units (tokens). This file downloads "the-verdict.txt", reads its words, and prepares a vocabulary. It demonstrates how to create tokens from simple sentences and large texts by splitting on delimiter characters such as spaces.
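A minimal sketch of that splitting step (the regex and sample text here are illustrative, not necessarily the file's exact code):

```python
import re

text = "Hello, world. Is this-- a test?"

# Split on punctuation, double dashes, and whitespace, keeping the delimiters
tokens = re.split(r'([,.:;?_!"()\']|--|\s)', text)
# Drop empty strings and pure-whitespace entries
tokens = [t.strip() for t in tokens if t.strip()]
print(tokens)
# ['Hello', ',', 'world', '.', 'Is', 'this', '--', 'a', 'test', '?']

# A vocabulary maps each unique token to an integer ID
vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}
```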
Implements the SimpleTokenizerV1 class with two methods: encode and decode. The encode method splits the input text into tokens and returns their token IDs (integers). The decode method converts a list of token IDs back into the original text.
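A simplified sketch of what such a class can look like (not the file's exact implementation):

```python
import re

class SimpleTokenizerV1:
    def __init__(self, vocab):
        self.str_to_int = vocab                              # token -> ID
        self.int_to_str = {i: s for s, i in vocab.items()}   # ID -> token

    def encode(self, text):
        tokens = re.split(r'([,.:;?_!"()\']|--|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        return [self.str_to_int[t] for t in tokens]

    def decode(self, ids):
        text = " ".join(self.int_to_str[i] for i in ids)
        # Remove the space that join() inserts before punctuation
        return re.sub(r'\s+([,.?!"()\'])', r'\1', text)
```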
Implements SimpleTokenizerV2, which is more capable than SimpleTokenizerV1. It handles two additional special tokens: <|unk|> for words missing from the vocabulary and <|endoftext|> as a separator between unrelated texts.
Uses tiktoken to create a byte pair encoding (BPE) tokenizer and shows how it is more capable than the previously created SimpleTokenizerV1 and SimpleTokenizerV2.
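A short example of the tiktoken BPE tokenizer in use (the sample text is illustrative):

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

text = "Hello, do you like tea? <|endoftext|> In the sunlit terraces of someunknownPlace."
ids = tokenizer.encode(text, allowed_special={"<|endoftext|>"})
print(ids)                    # e.g. [15496, 11, 466, ...]
print(tokenizer.decode(ids))  # round-trips back to the original text
```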
Implements GPTDatasetV1, which accepts the text used for dataset creation, a tokenizer, and a maximum length for each chunk. It creates a dataset of input and target tensor chunks, where each target chunk is the corresponding input chunk shifted by one token.
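A sketch of a sliding-window dataset in this spirit; the stride parameter and the exact constructor signature are assumptions, not necessarily the repo's code:

```python
import torch
import tiktoken
from torch.utils.data import Dataset, DataLoader

class GPTDatasetV1(Dataset):
    def __init__(self, txt, tokenizer, max_length, stride):
        token_ids = tokenizer.encode(txt)
        self.input_ids, self.target_ids = [], []
        # Slide a window over the token IDs; the target is the input shifted by one
        for i in range(0, len(token_ids) - max_length, stride):
            self.input_ids.append(torch.tensor(token_ids[i:i + max_length]))
            self.target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]

# Usage
tokenizer = tiktoken.get_encoding("gpt2")
dataset = GPTDatasetV1("Some long training text ... " * 50, tokenizer, max_length=4, stride=4)
loader = DataLoader(dataset, batch_size=2, shuffle=False)
inputs, targets = next(iter(loader))
print(inputs.shape, targets.shape)  # torch.Size([2, 4]) torch.Size([2, 4])
```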
A quick walkthrough of creating embeddings from simple vector data using PyTorch.
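For reference, a PyTorch embedding layer is simply a trainable lookup table indexed by token ID:

```python
import torch

torch.manual_seed(123)
# 6 possible IDs, each mapped to a 3-dimensional vector
embedding_layer = torch.nn.Embedding(num_embeddings=6, embedding_dim=3)
print(embedding_layer.weight)                         # the 6 x 3 lookup table
print(embedding_layer(torch.tensor([2, 3, 5, 1])))    # rows 2, 3, 5, 1 of that table
```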
Creates a dataset by reading the "the-verdict.txt" file with create_dataloader_v1 from e_data_preparation.py, then creates token embeddings and positional embeddings.
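A condensed sketch of the embedding step; here the batch of token IDs is hard-coded instead of coming from the dataloader, and the dimensions are illustrative:

```python
import torch

vocab_size, emb_dim, context_length = 50257, 256, 4
token_ids = torch.tensor([[40, 367, 2885, 1464],    # a batch of two token-ID sequences
                          [1807, 3619, 402, 271]])

tok_emb_layer = torch.nn.Embedding(vocab_size, emb_dim)
pos_emb_layer = torch.nn.Embedding(context_length, emb_dim)

token_embeddings = tok_emb_layer(token_ids)                    # (2, 4, 256)
pos_embeddings = pos_emb_layer(torch.arange(context_length))   # (4, 256)
input_embeddings = token_embeddings + pos_embeddings           # broadcast over the batch
print(input_embeddings.shape)                                  # torch.Size([2, 4, 256])
```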
This file will be downloaded as part of code execution in later files. Please ignore this file for now.
Defines a PyTorch tensor (matrix) representing 6 input elements (rows), each with a 3-dimensional feature vector; each row stands for the embedding of one word. Takes the second element as the "query" and calculates attention scores, attention weights, and the context vector step by step.
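A condensed sketch of that computation (the input values are illustrative):

```python
import torch

inputs = torch.tensor(
    [[0.43, 0.15, 0.89],   # Your
     [0.55, 0.87, 0.66],   # journey
     [0.57, 0.85, 0.64],   # starts
     [0.22, 0.58, 0.33],   # with
     [0.77, 0.25, 0.10],   # one
     [0.05, 0.80, 0.55]])  # step

query = inputs[1]                                  # the second element is the query
attn_scores = inputs @ query                       # dot product of the query with every row
attn_weights = torch.softmax(attn_scores, dim=0)   # normalize so the weights sum to 1
context_vec = attn_weights @ inputs                # weighted sum of all input rows
print(attn_weights, context_vec)
```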
Implements simple self-attention using PyTorch in a compact form.
Implements self-attention using linear layers. Performs the following steps: a) calculates attention weights, b) applies masking, c) applies normalization, d) applies negative infinity masking, e) recalculates attention weights, f) applies dropout.
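A condensed sketch of the final form of those steps; the intermediate mask-then-renormalize variant is skipped here, and the dimensions are illustrative:

```python
import torch

torch.manual_seed(123)
d_in, d_out, num_tokens = 3, 2, 6
x = torch.rand(num_tokens, d_in)                    # 6 token embeddings

W_query = torch.nn.Linear(d_in, d_out, bias=False)
W_key   = torch.nn.Linear(d_in, d_out, bias=False)
W_value = torch.nn.Linear(d_in, d_out, bias=False)

queries, keys, values = W_query(x), W_key(x), W_value(x)
attn_scores = queries @ keys.T                      # (6, 6) raw scores

# Causal mask: positions above the diagonal are future tokens and must be hidden
mask = torch.triu(torch.ones(num_tokens, num_tokens, dtype=torch.bool), diagonal=1)
masked_scores = attn_scores.masked_fill(mask, float("-inf"))

# Scaled softmax turns the -inf entries into exactly zero weight
attn_weights = torch.softmax(masked_scores / keys.shape[-1] ** 0.5, dim=-1)

dropout = torch.nn.Dropout(0.5)                     # randomly zeroes weights during training
attn_weights = dropout(attn_weights)
context_vecs = attn_weights @ values                # (6, 2)
print(context_vecs.shape)
```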
Implements the CausalAttention class. The steps (a-f) performed in j_linear_self_attention.py are consolidated into a single self-attention class built from PyTorch modules.
Implements the MultiHeadAttentionWrapper class. It takes a num_heads value and adds that many CausalAttention instances to a ModuleList; essentially, MultiHeadAttentionWrapper is a collection of CausalAttention modules determined by num_heads.
Implements an efficient multi-head attention class called MultiHeadAttention.
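A sketch of such a class, which projects queries, keys, and values once and then splits them across heads (simplified, not necessarily identical to the file):

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out, self.num_heads = d_out, num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key   = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1))

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the last dimension into (num_heads, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        attn_scores = q @ k.transpose(2, 3)                      # (b, heads, T, T)
        attn_scores.masked_fill_(
            self.mask.bool()[:num_tokens, :num_tokens], float("-inf"))
        attn_weights = torch.softmax(attn_scores / self.head_dim ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, self.d_out)
        return self.out_proj(context)

# Usage
mha = MultiHeadAttention(d_in=3, d_out=4, context_length=6, dropout=0.0, num_heads=2)
print(mha(torch.rand(2, 6, 3)).shape)  # torch.Size([2, 6, 4])
```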
Implements DummyGPTModel class, which serves as the basic skeleton of a GPT model.
Initializes the DummyGPTModel created in n_dummy_gpt_model.py and generates output by calling this model with simple inputs.
Implements and explains the LayerNorm, GELU, and FeedForward classes.
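Simplified sketches of the three components (initialization details may differ from the file):

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Normalizes the last dimension to zero mean and unit variance, then scales and shifts."""
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(emb_dim))
        self.shift = nn.Parameter(torch.zeros(emb_dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return self.scale * (x - mean) / torch.sqrt(var + self.eps) + self.shift

class GELU(nn.Module):
    """Tanh approximation of the Gaussian Error Linear Unit activation."""
    def forward(self, x):
        return 0.5 * x * (1.0 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) * (x + 0.044715 * x ** 3)))

class FeedForward(nn.Module):
    """Two linear layers with a GELU in between; expands to 4x the embedding size."""
    def __init__(self, emb_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim), GELU(), nn.Linear(4 * emb_dim, emb_dim))

    def forward(self, x):
        return self.layers(x)

print(FeedForward(emb_dim=8)(torch.rand(2, 4, 8)).shape)  # torch.Size([2, 4, 8])
```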
Implements ExampleDeepNeuralNetwork to illustrate a simple deep neural network. Such networks are important building blocks of the Transformer architecture.
Implements the TransformerBlock class using MultiHeadAttention from m_efficient_multi_head_attention.py, FeedForward, LayerNorm from p_layernorm_gelu_feedforward.py, and Dropout.
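A sketch of the same structure, but using PyTorch's built-in nn.LayerNorm, nn.GELU, and nn.MultiheadAttention in place of the repo's custom MultiHeadAttention, LayerNorm, and FeedForward classes, purely to keep the example self-contained:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm Transformer block: attention and feed-forward, each with a residual connection."""
    def __init__(self, emb_dim, num_heads, context_length, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.att = nn.MultiheadAttention(emb_dim, num_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim), nn.GELU(), nn.Linear(4 * emb_dim, emb_dim))
        self.drop = nn.Dropout(dropout)
        # Causal mask: True marks positions a token is not allowed to attend to
        self.register_buffer("mask", torch.triu(
            torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1))

    def forward(self, x):
        T = x.shape[1]
        # Attention sub-layer with residual (shortcut) connection
        h = self.norm1(x)
        attn_out, _ = self.att(h, h, h, attn_mask=self.mask[:T, :T], need_weights=False)
        x = x + self.drop(attn_out)
        # Feed-forward sub-layer with residual connection
        x = x + self.drop(self.ff(self.norm2(x)))
        return x

print(TransformerBlock(emb_dim=16, num_heads=4, context_length=8)(torch.rand(2, 8, 16)).shape)
```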
Implements the complete GPTModel class, assembling the components built in the previous files.
Pretrains the GPTModel with very basic data using a couple of input and target examples. This provides a basic understanding of large language model pretraining.
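To illustrate the core of pretraining (next-token prediction with a cross-entropy loss), here is a sketch that uses a tiny stand-in model instead of the full GPTModel; the token IDs and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 50257, 64

# A tiny stand-in for GPTModel (embedding + output head, no Transformer blocks),
# used only so the loss computation below runs on its own.
model = nn.Sequential(nn.Embedding(vocab_size, emb_dim), nn.Linear(emb_dim, vocab_size))

inputs = torch.tensor([[16833, 3626, 6100],     # token IDs for a couple of input examples
                       [40,    1107, 588]])
targets = torch.tensor([[3626,  6100, 345],     # the same sequences shifted left by one token
                        [1107,  588,  11311]])

optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

logits = model(inputs)                                    # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(logits.flatten(0, 1),  # next-token prediction loss
                                   targets.flatten())
loss.backward()                                           # backpropagate
optimizer.step()                                          # update the weights
optimizer.zero_grad()
print(loss.item())   # roughly ln(50257) = 10.8 for an untrained model
```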
Pretrains the GPTModel on a proper dataset created from the text of the "the-verdict.txt" file.
Downloads weights from a pretrained GPT-2 model and loads them into our GPTModel. You can choose from the following model sizes based on the available GPU memory and compute:
- gpt2-small (124M parameters)
- gpt2-medium (355M parameters)
- gpt2-large (774M parameters)
- gpt2-xl (1558M parameters)
Fine-tunes the GPTModel for classification tasks such as spam/not-spam detection.
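The usual recipe is to freeze the pretrained weights, replace the vocabulary-sized output head with a small classification head, and train on the last token's output. A sketch using a hypothetical stand-in model (TinyLM is not part of the repo):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):                         # hypothetical stand-in for GPTModel
    def __init__(self, vocab_size=50257, emb_dim=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size)

    def forward(self, x):
        return self.out_head(self.tok_emb(x))    # (batch, seq_len, out_features)

model = TinyLM()
for p in model.parameters():
    p.requires_grad = False                      # freeze the pretrained weights
model.out_head = nn.Linear(64, 2)                # new trainable head: spam / not spam

inputs = torch.randint(0, 50257, (4, 10))        # a batch of 4 tokenized messages
labels = torch.tensor([1, 0, 0, 1])              # 1 = spam, 0 = not spam

logits = model(inputs)[:, -1, :]                 # use the last token's logits only
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                                  # gradients flow only into the new head
print(loss.item())
```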
Fine-tunes the GPTModel with instruction data to create an instruction-following model. The goal is to train the model to follow user instructions effectively.
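Instruction examples are typically rendered into a prompt template before tokenization. A sketch assuming an Alpaca-style format with instruction/input/output fields (the exact template and field names used by the repo may differ):

```python
def format_instruction(entry):
    """Formats one training example in an Alpaca-style prompt layout (one common convention)."""
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry.get("input") else ""
    response_text = f"\n\n### Response:\n{entry['output']}"
    return instruction_text + input_text + response_text

example = {
    "instruction": "Rewrite the sentence in passive voice.",
    "input": "The chef cooked the meal.",
    "output": "The meal was cooked by the chef.",
}
print(format_instruction(example))
```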
Returns to basics and reviews fundamental neural network concepts.
Fine-tunes the model using Low-Rank Adaptation (LoRA) technique. LoRA is a parameter-efficient fine-tuning method that freezes the original LLM weights and introduces a small number of trainable rank-decomposition matrices to adapt the model to specific tasks, significantly reducing training costs and computational requirements.
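A minimal sketch of the idea: a frozen Linear layer plus a trainable low-rank path (class and parameter names here are illustrative):

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Low-rank update: instead of training W directly, learn A (d_in x r) and B (r x d_out)."""
    def __init__(self, d_in, d_out, rank, alpha):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))    # zero init: no change at the start
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    """Wraps a frozen Linear layer and adds the trainable low-rank path to its output."""
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False                        # freeze the original weights
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)

# Usage: wrap an existing projection layer
layer = LinearWithLoRA(nn.Linear(64, 64))
print(layer(torch.rand(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```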
