Welcome to my Deep Learning Repository! This repository contains projects, tasks, and materials I’ve worked on during my deep learning journey, specifically through the Deep Learning Specialization by DeepLearning.AI on Coursera.
A special thanks to Sebastian Raschka, author of the book Build a Large Language Model (From Scratch).
This specialization consists of five fundamental courses:
- Neural Networks and Deep Learning
- Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization
- Structuring Machine Learning Projects
- Convolutional Neural Networks
- Sequence Models
As part of my deep learning journey, I've authored two articles published on Towards Data Science. These articles document my experiences and insights, translating complex theories into practical implementations:
- From Theory to Practice: Building a Deep Feedforward Neural Network with Back Propagation in Python
  In this article, I guide readers through the process of building a deep feedforward neural network from scratch, focusing on the backpropagation algorithm. It provides a step-by-step approach, blending theoretical concepts with practical Python implementation.
- Adam Optimization Demystified: Enhancing Multiclass MLP Performance
  This article delves into the Adam optimization algorithm, explaining its mechanics and advantages. It also includes a hands-on example of how Adam can be used to enhance the performance of a multiclass MLP, offering readers both a theoretical and practical understanding.
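
For readers who want a quick feel for the update rule discussed in the Adam article, here is a minimal NumPy sketch of a single Adam step with bias correction. The function and variable names are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def adam_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single parameter array (illustrative sketch)."""
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moment estimates (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter step scaled by the bias-corrected moments.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

On each training step t = 1, 2, ..., this would be called once per parameter array, carrying m and v forward between calls.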
In this repository, you'll find over 15 deep learning and LLM models that I developed during my learning journey. These include:
- Deep Multi-Layer Perceptron (MLP) models
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory (LSTM) networks
- Transformer model
- GPT-2 models
Each model is implemented from scratch, with some utilizing TensorFlow or PyTorch for more advanced functionalities. During this journey, I’ve explored and implemented various optimization algorithms, including Gradient Descent (GD), Mini-Batch Gradient Descent, Stochastic Gradient Descent (SGD), and the Adam optimizer. I’ve also incorporated regularization techniques such as L2 Regularization, Dropout, and Learning Rate Decay, all built from scratch to understand their effects on model performance and generalization.
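
As an illustration of how a couple of these from-scratch components look in practice, below is a minimal NumPy sketch of inverted dropout and an L2 cost penalty. Names and shapes are illustrative assumptions, not the repository's actual implementations:

```python
import numpy as np

def dropout_forward(activations, keep_prob=0.8, rng=None):
    """Inverted dropout: zero out random units, then rescale so the
    expected activation is unchanged between training and inference."""
    rng = rng if rng is not None else np.random.default_rng()
    mask = (rng.random(activations.shape) < keep_prob).astype(activations.dtype)
    return activations * mask / keep_prob, mask

def l2_cost_penalty(weight_matrices, lam, m):
    """L2 regularization term added to the cost: (lam / (2*m)) * sum of squared weights."""
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weight_matrices)
```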
Feel free to explore the models and adapt them for your own projects and datasets.
Model | Description |
---|---|
Logistic_Regression_1 | A simple logistic regression model for image recognition, implemented from scratch using NumPy. This model is a basic classification model used as an introduction to deep learning concepts. |
Logistic_Regression_2 | A logistic regression model similar to the first but implemented using TensorFlow. This version takes advantage of TensorFlow's functionalities to streamline the model creation and training process for image recognition tasks. |
FFNN_1 | A shallow feedforward neural network (FFNN) with one hidden layer for Boolean classification tasks. Implemented from scratch using NumPy, this model serves as an introduction to neural networks. |
FFNN_2 | A deep feedforward neural network (FFNN), implemented from scratch using NumPy. This model is used for Boolean classification tasks and employs gradient descent as the optimization algorithm. It was trained on a cat dataset to predict whether an image contains a cat. |
FFNN_3 | A deep feedforward neural network (FFNN) similar to FFNN_2, but with multiple parameter initialization options, including He, Xavier, and Gaussian random variables. This model is designed to compare different initialization methods for Boolean classification tasks. |
FFNN_4 | A deep feedforward neural network (FFNN) with L2 regularization, implemented from scratch using NumPy. L2 regularization is employed to prevent overfitting. This model was trained on a synthetic dataset to evaluate the effectiveness of L2 regularization in improving generalization. |
FFNN_5 | A deep feedforward neural network (FFNN) with Adam optimization, mini-batch gradient descent, and stochastic gradient descent techniques. The model includes options for bias correction and dynamic learning rate adjustment. It was trained on both a synthetic 2D dataset and a cat image dataset, using the Adam optimizer to enhance convergence speed and performance. |
FFNN_6 | A deep feedforward neural network (FFNN) with dropout regularization, implemented from scratch using NumPy. Dropout is used to prevent overfitting and improve generalization in Boolean classification tasks. |
FFNN_7 | A deep feedforward neural network (FFNN) implemented from scratch using NumPy, with Adam optimization for multiclass classification tasks. The model includes learning rate decay and L2 regularization techniques and was trained on a dataset of hand-sign images to classify the numbers each image represents. |
CNN_1 | A convolutional neural network (CNN) implemented using TensorFlow. The model structure includes Conv2D, MaxPooling2D, and fully connected (Dense) layers. The architecture can be scaled up or down, and the model was trained on a hand-sign image dataset. |
CNN_2 | A convolutional neural network (CNN) implemented from scratch using NumPy. The model structure includes Conv2D, MaxPooling2D, and FullyConnected layers. This model is designed to provide a deeper understanding of how CNNs work under the hood by implementing all components manually. |
ResNet_50 | An implementation of a very deep convolutional neural network using Residual Networks (ResNet50), based on the paper "Deep Residual Learning for Image Recognition" by K. He et al. (2015). The model is implemented using TensorFlow and is designed to address the vanishing gradient problem in deep networks. |
U-Net | An implementation of the U-Net architecture, based on the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation" by O. Ronneberger et al. (2015). This model is implemented using TensorFlow and is designed for image segmentation tasks, particularly in the biomedical field. |
RNN | A recurrent neural network (RNN) implemented using TensorFlow. This model is designed for sequence prediction tasks and includes methods for initializing parameters, performing forward passes through the RNN cells, and training the model with gradient descent and Adam optimization. It demonstrates how to handle time-series data and learn temporal patterns. |
LSTM | An implementation of a Long Short-Term Memory (LSTM) network for sequence prediction tasks. The model is designed to handle time-series data and learn temporal dependencies. It includes methods for initializing parameters, performing forward passes through LSTM cells, and training with gradient descent and Adam optimization. |
Transformer | Implementation of the original Transformer model from the paper "Attention Is All You Need" by Vaswani et al. (2017), built from scratch using PyTorch. |
GPT2 | GPT-2 (decoder-based LLM) implemented from scratch using PyTorch. |
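
To give a sense of the core mechanism behind the Transformer and GPT-2 entries above, here is a minimal PyTorch sketch of single-head causal (masked) scaled dot-product attention. The function and parameter names are illustrative and do not correspond to the repository's modules:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head causal attention, as used in decoder-only language models."""
    # x: (batch, seq_len, d_model); W_q / W_k / W_v: (d_model, d_head)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5          # (batch, seq_len, seq_len)
    seq_len = x.size(1)
    # Upper-triangular mask blocks attention to future positions.
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal_mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                   # (batch, seq_len, d_head)
```

For example, calling it with a (2, 5, 16) input and three (16, 8) weight matrices returns a (2, 5, 8) tensor of attended values.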
This project is licensed under the MIT License.