Predicting Diabetes Health Status Using Various Machine Learning Models: A Comprehensive Dataset and Performance Analysis
Welcome to the 4AL3 Final Project Repository! This repository contains all the important data and files required for our final project for COMPSCI 4AL3 at McMaster University.
The objective of this project is to utilize CDC Diabetes Health Indicators data to predict an individual's health status as one of the following:
- Diabetic
- Pre-Diabetic
- Healthy
We implement and compare three machine learning models: K-Nearest Neighbours (KNN), a neural network, and a support vector machine (SVM). Each model is evaluated on its prediction accuracy, and the results are analyzed to identify the best-performing model for predicting health status.
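A minimal sketch of how the three model families could be compared on accuracy with scikit-learn. This is not the project's actual pipeline: the data below is synthetic (generated with `make_classification` as a stand-in for the CDC dataset), and the helper name `compare_models`, the hyperparameters, the train/test split, and the feature scaling step are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic three-class data standing in for the real dataset
# (0 = healthy, 1 = pre-diabetic, 2 = diabetic).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# ASSUMPTION: illustrative hyperparameters, not the project's tuned values.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Neural Network": MLPClassifier(
        hidden_layer_sizes=(32,), max_iter=1000, random_state=0
    ),
    "SVM": SVC(kernel="rbf"),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model on scaled features and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        # All three model families are sensitive to feature scale,
        # so each one is wrapped in a pipeline with a StandardScaler.
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return scores


scores = compare_models(models, X_train, y_train, X_test, y_test)

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping each estimator in a pipeline keeps the scaler fitted on the training split only, so the test accuracy is not influenced by statistics from the held-out data.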
The repository contains:
- Data: The CDC Diabetes Health Indicators dataset, obtained from the UCI Machine Learning Repository.
- Code: Scripts used for data preprocessing, model training, evaluation, and analysis.
- Results: Visualizations and metrics comparing model performance.
This project leverages:
- Python (for data manipulation and modeling)
- Machine learning libraries (scikit-learn)
- Visualization tools (e.g., Matplotlib)
Contributors:
- Nicole Sorokin
- Julia Brzustowski
- Anniruddh Arora