Solutions for the assignments and project of the machine learning course at Sharif University of Technology (CE-477)
-
In this assignment, I worked on the Heart_Disease dataset. Topics covered:
- EDA
- Perceptron
- Naive Bayes
The following evaluation metrics were also used (see the sketch after this list):
- Confusion Matrix
- F1-score
- Recall_score
- Precision_score
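As a quick, hypothetical sketch of how these metrics fit together (not taken from the assignment notebook; a synthetic dataset stands in for the Heart_Disease data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Heart_Disease features/labels (binary target)
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train a simple perceptron classifier
clf = Perceptron(max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Evaluation metrics listed above
print(confusion_matrix(y_test, y_pred))
print("F1:       ", f1_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
```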
For a quick overview of the details, see this PDF.
To view the notebook, see this link.
-
This exercise will help you gain a deeper understanding of, and insights into, Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation.
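As a minimal worked illustration of the difference (a hypothetical coin-flip example, not the exercise's actual setup): with k heads in n Bernoulli trials, the MLE of the bias is k/n, while a Beta(α, β) prior yields the MAP estimate (k + α − 1)/(n + α + β − 2).

```python
import numpy as np

# Hypothetical coin-flip data: 1 = heads, 0 = tails
rng = np.random.default_rng(0)
flips = rng.binomial(1, 0.7, size=20)
n, k = len(flips), flips.sum()

# MLE for a Bernoulli parameter: the sample mean
theta_mle = k / n

# MAP with a Beta(alpha, beta) prior: mode of the Beta(k + alpha, n - k + beta) posterior
alpha, beta = 2.0, 2.0
theta_map = (k + alpha - 1) / (n + alpha + beta - 2)

print(f"MLE: {theta_mle:.3f}")
print(f"MAP: {theta_map:.3f}  (pulled toward the prior mean 0.5)")
```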
For a quick overview of the details, see this PDF.
To view the notebook, see this link.
-
This exercise explores polynomial regression, a form of regression analysis where the relationship between the independent variable X and the dependent variable y is modeled as an n-th degree polynomial. We will create a synthetic dataset, train models with varying polynomial degrees, and evaluate their performance on different test sets (a short code sketch follows the step list).
Steps:
- Create a synthetic dataset
- Splitting the Dataset
- Polynomial Regression Training
- Model Evaluation
- Plotting Model Scores
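A minimal sketch of these steps with scikit-learn (the synthetic target function, noise level, and degree range are illustrative assumptions, not the assignment's exact settings):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic dataset: a cubic signal plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=2.0, size=200)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train one model per polynomial degree and compare test errors
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: test MSE = {mse:.2f}")
```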
For a quick overview of the details, see this PDF.
To view the notebook, see this link.
-
In this assignment, we work with the Boston housing dataset, collected in 1978; each of its 506 entries represents aggregated data on 14 features for homes from various suburbs of Boston, Massachusetts. We start by fitting a basic regression model with scikit-learn (sklearn) to establish a baseline for comparison; this baseline serves as a reference point for evaluating more sophisticated models that incorporate regularization. We then apply L1 (Lasso) and L2 (Ridge) regularization to refine our predictions and evaluate the impact of these methods on the accuracy of the results (a small code sketch follows the topic list).
Topics:
- L1 (Lasso) regularization
- L2 (Ridge) regularization
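A small sketch of the baseline-versus-regularization comparison (the Boston loader has been removed from recent scikit-learn releases, so a synthetic regression dataset stands in here, and the alpha values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 506-row, 13-feature housing matrix
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline OLS versus L1- and L2-regularized linear models
models = {
    "baseline (OLS)": LinearRegression(),
    "L1 (Lasso)": Lasso(alpha=1.0),
    "L2 (Ridge)": Ridge(alpha=1.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:15s} test MSE = {mse:.2f}")
```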
For a quick overview of the details, see this PDF.
To view the notebook, see this link.
-
This notebook applies KMeans clustering to a dataset, using both the Elbow Method and the Silhouette Method to determine the optimal number of clusters. The project compares the performance of a custom KMeans implementation with scikit-learn's.
Methods:
- Elbow Method: Focuses on minimizing WCSS (within-cluster sum of squares) to identify the point where adding more clusters doesn't significantly improve results.
- Silhouette Method: Evaluates cluster quality by measuring how well each point fits within its own cluster versus other clusters; a higher silhouette score indicates better-defined clusters (a code sketch of both methods follows this list).
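A minimal sketch of both criteria using scikit-learn's KMeans on a hypothetical blob dataset (not the notebook's data or its custom implementation):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data with a few well-separated blobs
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss = km.inertia_                     # within-cluster sum of squares (Elbow Method)
    sil = silhouette_score(X, km.labels_)  # Silhouette Method
    print(f"k={k}: WCSS={wcss:.1f}, silhouette={sil:.3f}")
```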
Results:
- Elbow Method: Optimal number of clusters is suggested as 3 or 4.
- Silhouette Method: Optimal number of clusters is 2 based on the highest silhouette scores.
- The Silhouette Method is preferred here due to its deterministic criterion and higher precision.
To view the notebook, see this link.
-
This notebook performs a classification task on a bank marketing dataset using PyTorch. Below is a summary of its key steps and components.
The following libraries are used for data manipulation, machine learning, and neural network construction:
- PyTorch: `torch`, `torch.nn`, `torch.optim`
- Data Processing: `pandas`, `numpy`, `sklearn`
- Visualization: `matplotlib`
- Data Handling: `TensorDataset`, `DataLoader`
The dataset used is the Bank Marketing Dataset, which is loaded from a CSV file. It contains various features that describe customer information and whether they subscribed to a bank product.
- Train-Test Split: The data is split into train, validation, and test sets using `train_test_split` from `sklearn`.
- Feature Scaling: Continuous variables such as age, balance, and duration are normalized using `StandardScaler`.
- Encoding Categorical Variables: One-hot encoding is applied to categorical features (job, marital status, etc.) using `pandas.get_dummies`, and the target label (`y`) is encoded using `LabelEncoder` (a condensed preprocessing sketch follows this list).
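A condensed sketch of this preprocessing pipeline; the file path is hypothetical, and the column names (age, balance, duration, y) follow the standard Bank Marketing CSV layout but are assumptions here:

```python
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical path to the Bank Marketing CSV
df = pd.read_csv("bank.csv", sep=";")

# Encode the target label and one-hot encode the categorical features
y = LabelEncoder().fit_transform(df["y"])
X = pd.get_dummies(df.drop(columns=["y"]), drop_first=True)

# Normalize continuous variables such as age, balance, and duration
continuous = ["age", "balance", "duration"]
X[continuous] = StandardScaler().fit_transform(X[continuous])

# Train / validation / test split
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Wrap tensors for batched training with PyTorch
train_ds = TensorDataset(
    torch.tensor(X_train.to_numpy(dtype="float32")),
    torch.tensor(y_train, dtype=torch.long),
)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
```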
To view the notebook, see this link.
-
This project demonstrates various techniques for dimensionality reduction applied to a dataset. The goal is to reduce the number of dimensions while preserving as much relevant information as possible, which can help with visualization and improve the performance of machine learning models.
- Data Preprocessing
  - The dataset (`nutrition.csv`) is loaded using `Pandas`.
  - Only numeric columns are selected for analysis.
  - Data is scaled using `StandardScaler` to normalize the feature values.
- Dimensionality Reduction Techniques
  - PCA (Principal Component Analysis): Reduces the dimensionality by transforming the original variables into a smaller set of new variables (principal components) that capture the most variance.
  - ICA (Independent Component Analysis): Focuses on making the components as statistically independent as possible, which is useful for separating mixed signals.
  - t-SNE (t-distributed Stochastic Neighbor Embedding): A non-linear technique mainly used for visualizing high-dimensional data; it maps the data to a two- or three-dimensional space.
- Visualization
  - Visualizations are generated using `matplotlib` and `seaborn` to compare results and understand the structure of the data after applying each technique (a compact sketch of all three techniques follows this list).
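A compact sketch of the three techniques on scaled numeric data (a bundled scikit-learn dataset stands in for the numeric columns of `nutrition.csv`):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data standing in for the numeric columns of nutrition.csv
X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# PCA: linear projection onto the directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# ICA: components that are as statistically independent as possible
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X_scaled)

# t-SNE: non-linear embedding for 2-D visualization
X_tsne = TSNE(n_components=2, random_state=0, perplexity=30).fit_transform(X_scaled)

print(X_pca.shape, X_ica.shape, X_tsne.shape)
```

Each resulting 2-D embedding can then be drawn as a scatter plot with `matplotlib`/`seaborn` to compare the structure the three methods recover.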
To view the notebook, see this link.
-
This notebook demonstrates a complete, step-by-step implementation of the K-Nearest Neighbors (KNN) algorithm from scratch. It covers key concepts, code implementation, and model evaluation (a minimal illustrative sketch follows the outline below).
- Understand the basics of the KNN algorithm.
- Implement the KNN algorithm without using specialized libraries.
- Evaluate the model's performance on test data.
- Visualize the data and results to understand KNN behavior.
- Import Libraries
- Define Distance Function (e.g., Euclidean)
- Implement KNN Function from Scratch
- Data Preparation
- Model Evaluation
- Data and Results Visualization
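As a minimal from-scratch sketch of the same idea (Euclidean distance plus a majority vote over the k nearest neighbours; the tiny dataset is hypothetical and separate from the notebook's implementation):

```python
from collections import Counter

import numpy as np


def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return np.sqrt(np.sum((a - b) ** 2))


def knn_predict(X_train, y_train, x, k=3):
    # Distance from the query point to every training point
    dists = [euclidean(x, xt) for xt in X_train]
    # Indices of the k nearest neighbours, then a majority vote over their labels
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]


# Tiny hypothetical dataset: two 2-D clusters with labels 0 and 1
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([8.5, 8.5])))  # expected: 1
```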
To view the notebook, see this link. For the complete README, see here.
-
In this assignment, I implemented SVM (Support Vector Machines) for classification (a kernel-comparison sketch follows the list below).
- Data Preprocessing
- Model
- Evaluation
- Fine-tuning
- Multiclass SVM
- Different SVM Kernels:
  - Linear Kernel
  - Gaussian RBF Kernel
  - Polynomial Kernel
  - Sigmoid Kernel
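A brief sketch comparing the four kernels with scikit-learn's SVC (the synthetic dataset and hyperparameters are illustrative; the notebook's fine-tuning uses its own settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical binary classification data
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Compare the four kernels discussed in the assignment
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print(f"{kernel:8s} test accuracy = {clf.score(X_test, y_test):.3f}")
```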
To view the notebook, see this link.