
YULINHEEE/Colon-Cancer-Histopathology-Image-Classification


Automated Detection & Cell-Type Classification Using Deep Learning

Developed binary and multiclass CNN models using histology images from 99 patients.

Accurate classification of cell nuclei in histopathology images is essential for early cancer diagnosis and pathology workflow efficiency. This project aims to automate the classification of colon cell nuclei using machine learning, with two core objectives:

  1. Detect whether a nucleus is cancerous.
  2. Classify each nucleus into its corresponding medically relevant cell type.

The goal is to reduce diagnostic workload and provide consistent, scalable support for pathology workflows.

Model Development Approach

1. Exploratory Data Analysis (EDA)

  • Examined class distribution, staining patterns, and morphological differences
  • Identified class imbalance and variability across tissue samples
  • Visualised representative image patches to understand dataset quality and noise
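A minimal sketch of the class-distribution check in plain Python. The label records, their field layout and the cell-type names are hypothetical stand-ins, since the README does not describe the actual label file. The imbalance ratio (largest class count divided by smallest) is one simple way to put a number on class imbalance, and counting patches per patient shows how unevenly patients contribute data.

```python
from collections import Counter

# Hypothetical label records: (patient_id, cell_type, is_cancerous).
# Field layout and class names are illustrative assumptions only.
records = [
    ("p01", "epithelial", 1),
    ("p01", "epithelial", 1),
    ("p01", "inflammatory", 0),
    ("p02", "fibroblast", 0),
    ("p02", "epithelial", 1),
    ("p03", "others", 0),
    ("p03", "epithelial", 0),
]


def class_distribution(labels):
    """Map each class to (count, fraction), most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in counts.most_common()}


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


cell_types = [cell for _, cell, _ in records]
cancer_flags = [flag for _, _, flag in records]
patches_per_patient = Counter(pid for pid, _, _ in records)

for title, labels in [("cell type", cell_types), ("cancerous", cancer_flags)]:
    print(f"{title}: imbalance ratio {imbalance_ratio(labels):.2f}")
    for cls, (n, frac) in class_distribution(labels).items():
        print(f"  {cls!s:<13}{n:>3}  {frac:.1%}")
print("patches per patient:", dict(patches_per_patient))
```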

2. Patient-Level Data Splitting

To avoid data leakage and ensure realistic evaluation, the dataset was split by patient, not by image. This prevents patches from the same patient appearing in both training and validation sets.
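One way to implement a patient-level split is sketched below, assuming every patch carries a patient identifier (`patient_ids` and `patient_level_split` are hypothetical names, not taken from the project). Patients, not patches, are shuffled and assigned, so `val_fraction` controls the share of patients rather than the share of patches.

```python
import random


def patient_level_split(patient_ids, val_fraction=0.2, seed=42):
    """Split sample indices so that each patient lands in exactly one subset.

    patient_ids[i] is the patient of sample i. Returns (train_idx, val_idx).
    """
    patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in val_patients]
    val_idx = [i for i, p in enumerate(patient_ids) if p in val_patients]
    return train_idx, val_idx


# Toy data: 10 patients contributing between 1 and 10 patches each.
patient_ids = [f"p{k:02d}" for k in range(10) for _ in range(k + 1)]
train_idx, val_idx = patient_level_split(patient_ids, val_fraction=0.2)

train_patients = {patient_ids[i] for i in train_idx}
val_patients = {patient_ids[i] for i in val_idx}
print(len(train_patients), "train patients,", len(val_patients), "val patients")
print("shared patients:", train_patients & val_patients)  # empty set
```

scikit-learn's `GroupShuffleSplit` (with patient IDs passed as `groups`) implements the same idea; the README does not say which tool the project used.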

3. Data Pipeline with Augmentation & Class Balancing

Applied image augmentations (flips, rotations, colour jittering) to improve generalisation. Implemented class-balancing strategies to address heavy imbalance across cell types. Built a reproducible preprocessing pipeline for loading and transforming patches.
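A minimal sketch of two of these steps in plain Python, treating a patch as a square 2D list of pixel values. This is a simplifying assumption: real patches are colour images, and colour jittering is omitted here. The geometric augmentation applies random flips and a random multiple of 90-degree rotation. For balancing, `balanced_class_weights` uses the inverse-frequency formula n_samples / (n_classes * count), the same one scikit-learn applies for `class_weight="balanced"`. The README does not say whether the project weighted the loss, oversampled minority classes, or used another strategy.

```python
import random
from collections import Counter


def hflip(img):
    """Mirror a 2D patch left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror a 2D patch top to bottom."""
    return [list(row) for row in img[::-1]]


def rotate90(img):
    """Rotate a square 2D patch 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_augment(img, rng):
    """Random horizontal/vertical flips plus 0 to 3 quarter-turn rotations."""
    if rng.random() < 0.5:
        img = hflip(img)
    if rng.random() < 0.5:
        img = vflip(img)
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    return img


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(rotate90(patch))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

rng = random.Random(0)
augmented = random_augment(patch, rng)
print(augmented)

labels = ["epithelial"] * 6 + ["inflammatory"] * 4 + ["fibroblast"] * 2
print(balanced_class_weights(labels))
# epithelial ~0.667, inflammatory 1.0, fibroblast 2.0
```

The rarest class receives the largest weight, so misclassifying it costs more in a weighted loss.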

4. Performance Metrics Selection

Evaluated model performance using:

  • Accuracy, Precision, Recall, and F1-score
  • AUC for binary cancer detection
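  • Confusion matrices to assess per-class behaviour

Together, these metrics cover both binary malignancy detection and multi-class cell-type classification. Precision, recall and F1-score matter alongside accuracy because, with imbalanced classes, a model can reach high accuracy while performing poorly on the minority classes.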
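For reference, the metrics listed above can be computed from scratch as follows. This is an illustrative sketch, not the project's evaluation code, and the toy labels and scores are made up. `roc_auc` uses the rank interpretation of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative, ties counted as half); it is quadratic in the number of samples, so it only suits small examples.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


def roc_auc(y_true, scores):
    """P(random positive outscores random negative), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy binary example: 1 = cancerous, scores are predicted probabilities.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

print("accuracy ", accuracy(y_true, y_pred))
print("P/R/F1   ", precision_recall_f1(y_true, y_pred, positive=1))
print("AUC      ", roc_auc(y_true, scores))
print("confusion", confusion_matrix(y_true, y_pred, classes=[0, 1]))

# Toy multi-class example with hypothetical cell-type labels.
cells_true = ["epithelial", "epithelial", "inflammatory", "fibroblast", "epithelial", "inflammatory"]
cells_pred = ["epithelial", "inflammatory", "inflammatory", "epithelial", "epithelial", "inflammatory"]
print("macro F1 ", macro_f1(cells_true, cells_pred, ["epithelial", "inflammatory", "fibroblast"]))
```

In practice, `sklearn.metrics` provides `confusion_matrix`, `precision_recall_fscore_support` and `roc_auc_score` for the same quantities.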

5. Baseline Deep Learning Model — Custom CNN

Developed a baseline convolutional neural network (CNN) tailored for small image patches:

  • Convolution + pooling layers for feature extraction
  • Fully connected layers for classification
  • Softmax output for multi-class prediction
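To make the layer sequence concrete, here is a forward pass through a convolution, pooling, dense and softmax stack written in plain Python. It is a sketch under stated assumptions: the 8x8 single-channel input, two random 3x3 filters, ReLU activation and 4 output classes are arbitrary toy choices, training is omitted, and the README does not give the actual patch size, layer counts or framework. A real model would normally be built with a framework such as Keras or PyTorch.

```python
import math
import random


def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2D cross-correlation, as computed by CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(img, kernels, dense_w, dense_b):
    """Conv, ReLU and max-pool per filter, then flatten, dense layer, softmax."""
    features = []
    for kernel in kernels:
        pooled = max_pool2x2(relu(conv2d_valid(img, kernel)))
        features.extend(v for row in pooled for v in row)  # flatten
    return softmax(dense(features, dense_w, dense_b))


# Toy sizes (assumptions): 8x8 input; a 3x3 conv gives 6x6, 2x2 pooling gives 3x3,
# so two filters yield 2 * 3 * 3 = 18 features feeding 4 output classes.
rng = random.Random(0)
img = [[rng.random() for _ in range(8)] for _ in range(8)]
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]
n_features, n_classes = 2 * 3 * 3, 4
dense_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(n_classes)]
dense_b = [0.0] * n_classes

probs = forward(img, kernels, dense_w, dense_b)
print([round(p, 3) for p in probs], "sum:", round(sum(probs), 6))
```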

6. Model Optimization

Diagnosed overfitting and underfitting, and applied optimization techniques such as dropout and regularization to address these fitting issues.
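A minimal plain-Python sketch of the two techniques named above, plus early stopping. Inverted dropout zeroes activations at random during training and rescales the survivors, so the layer needs no change at inference; the L2 penalty adds `lam * sum(w**2)` to the data loss. Early stopping is not mentioned in the README and is included only as a common way to act on a validation loss that starts rising. The function names and the `lam` and `patience` values are illustrative assumptions.

```python
import random


def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and scale
    survivors by 1 / (1 - rate) so expected values match inference, where the
    layer is a no-op."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l2_penalty(weights, lam):
    """L2 regularization term lam * sum(w^2), added to the data loss."""
    return lam * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience=3):
    """Index of the best validation epoch, stopping once validation loss has
    not improved for `patience` consecutive epochs (a common overfitting sign)."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


rng = random.Random(1)
acts = [1.0] * 100_000
dropped = inverted_dropout(acts, rate=0.5, rng=rng)
print("fraction zeroed:", dropped.count(0.0) / len(dropped))
print("mean after dropout:", sum(dropped) / len(dropped))  # close to 1.0

data_loss = 0.42
weights = [1.0, -2.0, 0.5]
print("regularized loss:", data_loss + l2_penalty(weights, lam=0.01))

# Validation loss falls, then rises: the typical pattern when training loss
# keeps improving but the model has started to overfit.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.80]
print("restore weights from epoch", early_stopping_epoch(val_losses, patience=3))
```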

7. Model Performance and Robustness

  • Final Model Accuracy: evaluated the final model's accuracy against the project's established goals and benchmarks.
  • Robustness and Generalizability: assessed and discussed how consistently the model performs across different data subsets and scenarios.
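One way to make the robustness check concrete is to report accuracy per held-out patient instead of only a pooled figure, since a good overall score can hide patients on whom the model fails. The sketch below assumes each prediction is tagged with its patient ID; the data is a made-up toy example, and the same function works for any other grouping, such as staining batch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each group (e.g. each held-out patient)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}


# Made-up predictions for three held-out patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 0]
groups = ["p01"] * 4 + ["p02"] * 3 + ["p03"] * 2

acc = per_group_accuracy(y_true, y_pred, groups)
scores = list(acc.values())
worst = min(acc, key=acc.get)
print({g: round(a, 3) for g, a in acc.items()})
print(f"mean {mean(scores):.3f}, std {pstdev(scores):.3f}, "
      f"worst {worst} at {acc[worst]:.3f}")
```

A large spread or a clear outlier patient suggests the model depends on patient-specific characteristics (for example staining) rather than on nucleus morphology alone.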
