Developed binary and multiclass CNN models using histology images from 99 patients.
Accurate classification of cell nuclei in histopathology images is essential for early cancer diagnosis and pathology workflow efficiency. This project aims to automate the classification of colon cell nuclei using machine learning, with two core objectives:
1. Detect whether a nucleus is cancerous
2. Classify each nucleus into its corresponding medically relevant cell type
The goal is to reduce diagnostic workload and provide consistent, scalable support for pathology workflows.
- Examined class distribution, staining patterns, and morphological differences
- Identified imbalance and variability across tissue samples
- Visualised representative image patches to understand dataset quality and noise
To avoid data leakage and ensure realistic evaluation, the dataset was split by patient, not by image. This prevents patches from the same patient appearing in both training and validation sets.
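A patient-level split can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function name `split_by_patient`, the 20% validation fraction, and the toy patient IDs are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so that no patient appears in both sets.

    patient_ids[i] is the patient that image patch i came from.
    """
    # Group patch indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients (not patches) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out whole patients for validation (at least one).
    n_val = max(1, round(len(patients) * val_fraction))
    val_idx = [i for p in patients[:n_val] for i in by_patient[p]]
    train_idx = [i for p in patients[n_val:] for i in by_patient[p]]
    return sorted(train_idx), sorted(val_idx)


# Toy example: 10 patches from 5 hypothetical patients.
patient_ids = [1, 1, 2, 2, 2, 3, 4, 4, 5, 5]
train_idx, val_idx = split_by_patient(patient_ids)
train_patients = {patient_ids[i] for i in train_idx}
val_patients = {patient_ids[i] for i in val_idx}
```

Because whole patients are assigned to one side or the other, the resulting patch-level fractions only approximate `val_fraction` when patients contribute different numbers of patches.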
Applied image augmentations (flips, rotations, colour jittering) to improve generalisation. Implemented class-balancing strategies to address heavy imbalance across cell types. Built a reproducible preprocessing pipeline for loading and transforming patches.
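The text does not say which class-balancing strategy was used. One common option is to weight each class inversely to its frequency in the loss function; the sketch below shows that idea under this assumption. The helper name and the placeholder class labels (`type_a`, `type_b`, `type_c`) are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy, deliberately imbalanced label list with placeholder class names.
labels = ["type_a"] * 6 + ["type_b"] * 3 + ["type_c"] * 1
weights = inverse_frequency_weights(labels)
```

With this weighting the total weight summed over all samples equals the number of samples, so the overall loss scale is unchanged while rare classes contribute more per example.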
Evaluated model performance using:
- Accuracy, Precision, Recall, and F1-score
- AUC for binary cancer detection
- Confusion matrices to assess per-class behaviour

These metrics provide a robust understanding of both binary malignancy detection and multi-class cell type classification.
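To make the metrics concrete, the sketch below computes a confusion matrix, per-class precision, recall and F1, and a binary AUC in plain Python. The function names and the toy labels and scores are illustrative assumptions, not project results; in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm):
    """Precision, recall and F1 for each class, derived from the matrix."""
    results = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fp = sum(row[k] for row in cm) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                 # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def binary_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy multi-class example (3 hypothetical cell types).
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
metrics = per_class_metrics(cm)

# Toy binary example (1 = cancerous) with predicted scores.
auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```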
Developed a baseline convolutional neural network (CNN) tailored for small image patches:
- Convolution + pooling layers for feature extraction
- Fully connected layers for classification
- Softmax output for multi-class prediction
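The architecture outlined above might look roughly like the following PyTorch sketch. The framework, the 27x27 RGB patch size, the four output classes, the layer widths, and the dropout rate are all assumptions for illustration; the source does not state them.

```python
import torch
from torch import nn


class BaselineCNN(nn.Module):
    """Small CNN sketch: conv + pooling features, then fully connected layers."""

    def __init__(self, num_classes=4, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 27x27 -> 13x13 (assumed input size)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 13x13 -> 6x6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 64),
            nn.ReLU(),
            nn.Dropout(0.5),  # regularisation; rate is an assumption
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        # Returns raw logits; nn.CrossEntropyLoss expects logits, so
        # softmax is applied only when probabilities are needed.
        return self.classifier(self.features(x))


model = BaselineCNN()
model.eval()  # disable dropout for inference
with torch.no_grad():
    batch = torch.randn(8, 3, 27, 27)  # random stand-in for image patches
    probs = torch.softmax(model(batch), dim=1)
```

Setting `num_classes=2` gives a variant for the binary cancer-detection task, though a single sigmoid output would be an equally reasonable design.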
Identified overfitting and underfitting, and applied optimisation techniques (dropout, regularisation, etc.) to address them.
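Overfitting typically shows up as validation loss rising while training loss keeps falling. Early stopping is not named in the text, but it is one common way to act on that signal, so the sketch below is an assumption rather than the project's method. The helper name, the patience value, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Never triggered: training ran to the last recorded epoch.
    return len(val_losses) - 1, best_epoch


# Illustrative validation losses: improving at first, then degrading.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
```

In this toy curve the best validation loss occurs at epoch 2 and training would stop at epoch 5, after which the weights saved at the best epoch would normally be restored.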
- Final Model Accuracy: The final model achieves good performance in line with the established goals and benchmarks.
- Robustness and Generalizability: Model robustness is demonstrated and discussed across different subsets and scenarios.