This repository contains the implementation of a machine learning project aimed at enhancing the accuracy of predicting Pathological Complete Response (PCR) and Relapse-Free Survival (RFS) in chemotherapy-treated breast cancer patients. The project analyzes a dataset with clinical and MRI features across 400 patients to provide additional prognosis information for personalized treatment recommendations.
This project aims to improve the predictions of PCR and RFS in breast cancer patients using various machine learning techniques. Accurate predictions can guide treatment decisions and provide better prognostic information.
The dataset used in this project is publicly available from The American College of Radiology Imaging Network.
Feature Type | Number of Features |
---|---|
Clinical | 10 |
MRI-based | 107 |
Total Records | 400 |
- Handling Missing Values: Rows with missing values denoted by '999' were removed. Median imputation was used for remaining missing values.
- Feature Selection: ANOVA was used to assess the importance of each feature. Features with an f-score above 3.5 were selected.
Model Type | Model |
---|---|
Classification | Logistic Regression |
AdaBoost Classifier | |
Support Vector Machine (SVM) | |
Voting Classifier |
Model Type | Model |
---|---|
Regression | Random Forest Regressor |
Support Vector Regressor (SVR) | |
LASSO Regressor | |
Ridge Regressor | |
AdaBoost Regressor |
Evaluation Type | Metric |
---|---|
Classification | Balanced Accuracy |
Precision | |
Regression | Mean Absolute Error (MAE) |
Tuning | Grid Search |
The two best results from classification and regression task was evaluated.
Model | Balanced Accuracy | Precision |
---|---|---|
AdaBoost Classifier | 73.43% | 70.59% |
Voting Classifier | 70.68% | 83.33% |
Model | Mean Absolute Error (MAE) |
---|---|
Random Forest Regressor | 20.43 |
LASSO Regressor | 20.84 |
The AdaBoost Classifier was the best model for predicting PCR due to its balanced accuracy and precision. The Random Forest Regressor was the most effective for predicting RFS with the lowest MAE and robust performance.