People Detection Using Artificial Neural Networks

An end-to-end computer vision project that uses classical image feature extraction and an Artificial Neural Network (ANN) to detect the presence of people in urban environments.


Project Showcase

Introduction

Detecting human figures within complex environments is a central challenge in computer vision. This project presents a comprehensive implementation of an Artificial Neural Network (ANN) designed specifically for detecting people within urban scenes, an application critical to surveillance, autonomous driving, and pedestrian tracking.

The core of this project lies in the development of a sophisticated feature extraction pipeline and the training of a neural network model that can accurately classify image segments. By leveraging a custom dataset of AI-generated urban scenarios, we refine the model's ability to generalize across different settings and conditions. This involves a detailed workflow from manual data labeling and advanced image processing to model training, hyperparameter tuning, and robust validation.

This repository documents the entire process, including the custom-built data labeling tool, the feature extraction logic, the model training methodology, and a powerful visualization script that demonstrates the model's classification performance on full-scale images.

Table of Contents

  1. Key Features
  2. Technology Stack
  3. Project Workflow
  4. Getting Started
  5. Usage
  6. Results
  7. License

Key Features

  • Custom Data Labeling Tool: A graphical user interface (GUI) built with Tkinter to facilitate the rapid, manual labeling of image grids. The tool automatically segments large images, saves classified grids into structured folders, and generates a corresponding CSV log.
  • Comprehensive Feature Extraction: A robust pipeline that extracts a rich set of features from each image grid to create a powerful feature vector for the machine learning model.
  • ANN for Classification: A Multi-Layer Perceptron (MLP) model from scikit-learn trained to classify image grids into three main categories: Person, Animal, or Absent.
  • Robust Training & Evaluation: The training pipeline includes stratified data splitting, feature scaling, cross-validation, and hyperparameter tuning to ensure a generalized and effective model.
  • Grid-Based Prediction & Visualization: A powerful prediction script that takes a full-sized image, divides it into a grid, classifies each cell, and generates a color-coded visual overlay to show the detected regions.

Technology Stack

  • Core Language:

    • Python 3.x: Used for all data processing, machine learning, and GUI development.
  • Machine Learning & Data Processing:

    • Scikit-learn: The primary library for the MLP Classifier, train/test splitting, feature scaling (StandardScaler), and model evaluation metrics.
    • NumPy: For efficient numerical operations and manipulation of image arrays and feature vectors.
    • Pandas: Used for data manipulation and analysis, particularly for handling the feature dataset and performance metrics.
  • Image Processing & Feature Extraction:

    • OpenCV (cv2): For core image processing tasks like calculating color histograms and Hu Moments.
    • Scikit-image: For advanced feature extraction, including Gray Level Co-occurrence Matrix (GLCM) properties and Histogram of Oriented Gradients (HOG).
  • GUI & Visualization:

    • Tkinter: Used to build the custom GUI for manual data labeling.
    • Matplotlib: For plotting the model's learning curve, confusion matrix, and visualizing prediction results.

Project Workflow

The project follows a systematic, five-step workflow from dataset creation to final visualization.

1. Dataset Creation

The foundation of this project is a custom dataset of AI-generated images, designed to provide diverse and challenging urban scenarios.

  • Source: 200 high-resolution (1024x1024) PNG images generated with AI.
  • Versions:
    • V1 (100 images): Urban scenes containing both people and animals.
    • V2 (100 images): Urban scenes containing only animals (no people).
  • Folder Structure (Initial):
    dataset/
    ├── V1/
    │   ├── 1.png, 2.png, ...
    └── V2/
        └── 1.png, 2.png, ...
    

Dataset V1 Example   Dataset V2 Example

2. Manual Labeling Tool

To create a supervised learning dataset, a custom GUI tool was developed using Tkinter. This tool streamlines the process of classifying segments of the source images.

  • Functionality:
    • Loads a 1024x1024 source image.
    • Divides it into an 8x8 grid of 128x128 pixel cells.
    • Allows the user to manually label each grid cell with one of four classes: P (Person), A (Absent), N (Animal), or Noise.
    • Saves each labeled grid cell as a separate image file in a structured output directory.
    • Generates a CSV file logging the metadata for each grid.
  • Output Structure:
    dataset/V1/output/
    ├── A/ (Absent)
    │   └── grid_V1_1_1_A.png
    ├── N/ (Animal)
    ├── P/ (Person)
    └── Noise/
    

Labeling Tool UI
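The 8x8 segmentation the tool performs can be sketched with plain NumPy slicing. This is a minimal illustration, not the tool's actual code; the function name is ours:

```python
import numpy as np

def split_into_grid(image: np.ndarray, cell: int = 128) -> list:
    """Split an image into non-overlapping cell x cell patches, row by row."""
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    return [image[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            for r in range(rows) for c in range(cols)]

# A 1024x1024 source yields the 8x8 grid of 128x128 cells the tool labels.
patches = split_into_grid(np.zeros((1024, 1024, 3), dtype=np.uint8))
assert len(patches) == 64 and patches[0].shape == (128, 128, 3)
```

Each patch can then be saved under its class folder (A/, N/, P/, Noise/) as the tool does.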

3. Comprehensive Feature Extraction

Each 128x128 image grid is processed to extract a feature vector that captures its essential visual characteristics. This is handled by the Image class in Image.py. The final feature vector is a concatenation of the following features:

  • RGB Statistics (Mean, Mode, Variance, SD): Basic statistical properties of the Red, Green, and Blue color channels. Justification: captures overall color and texture information (e.g., skin tones, clothing colors, background variance).
  • Color Histogram: The distribution of pixel intensities across all three color channels. Justification: provides a detailed color profile, crucial for identifying objects with distinct color signatures.
  • Gray Level Co-occurrence Matrix (GLCM): Statistical features (contrast, dissimilarity, etc.) describing the textural patterns in the image. Justification: critical for recognizing textures like fabric, hair, or skin that are independent of color.
  • Histogram of Oriented Gradients (HOG): Captures the distribution of gradient orientations, which describes local object shape and appearance. Justification: highly effective for detecting human forms, as it is robust to variations in pose and lighting.
  • Peak Local Max: Identifies local maxima (key points) in the image's intensity landscape. Justification: helps detect interest points useful for feature matching and scene understanding.
  • Hu Moments: A set of seven moments that are invariant to translation, scale, and rotation. Justification: provides a unique signature for shapes, useful for identifying the overall form of objects like people.

4. Model Training & Evaluation

The extracted feature vectors are used to train a scikit-learn Multi-Layer Perceptron (MLP) classifier.

  • Model Architecture: A feedforward Artificial Neural Network (ANN) with 3 hidden layers, each containing 128 neurons.
  • Preprocessing:
    • The dataset is loaded from the V1/output and V2/output directories.
    • To address class imbalance, the majority class 'A' (Absent) is randomly undersampled.
    • Data is split into 80% for training and 20% for testing using a stratified split to preserve class proportions.
    • Features are standardized using StandardScaler to ensure the model trains effectively.
  • Training: The model is trained for 100 epochs using the 'Adam' optimizer with a learning rate of 0.001.
  • Evaluation: Performance is assessed using a classification report (precision, recall, F1-score), a confusion matrix, and 5-fold cross-validation to ensure the model's robustness.
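The training recipe above can be sketched with scikit-learn. Random arrays stand in for the real feature matrix and labels, and the undersampling step is omitted; the architecture and optimizer settings follow the description above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the extracted feature vectors and class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))
y = rng.choice(["P", "N", "A"], size=300)

# Stratified 80/20 split preserves the class proportions in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on training data only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_tr)

# Three hidden layers of 128 neurons, Adam, learning rate 0.001, 100 epochs.
clf = MLPClassifier(hidden_layer_sizes=(128, 128, 128), solver="adam",
                    learning_rate_init=0.001, max_iter=100, random_state=42)
clf.fit(scaler.transform(X_tr), y_tr)
acc = clf.score(scaler.transform(X_te), y_te)
```

On the real features, `classification_report` and `cross_val_score` from the same library provide the per-class metrics and 5-fold validation described above.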

Figures: model hyperparameters, classification report, confusion matrix, learning curve, accuracy by hyperparameters, precision by class and hyperparameters.

5. Prediction & Visualization

The trained model is used to make predictions on full, unseen images.

  • Process:
    1. A full-size (1024x1024) image is loaded.
    2. It is divided into the same 128x128 grid used for training.
    3. The feature vector for each grid cell is extracted.
    4. The trained MLP model predicts the class for each cell.
  • Visualization: The script generates a visual overlay where each grid cell is colored based on its predicted class, providing an intuitive heatmap of detected objects.
    • Blue: Person (P)
    • Green: Animal (N)
    • Red: Absent (A)
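The prediction loop and overlay can be sketched as follows. The function name is ours, `predict_cell` is a stand-in for feature extraction plus `clf.predict`, and the colour map follows the Blue/Green/Red legend above:

```python
import numpy as np

# RGB colour per predicted class: Person, Animal, Absent.
COLORS = {"P": (0, 0, 255), "N": (0, 255, 0), "A": (255, 0, 0)}

def overlay_predictions(image: np.ndarray, predict_cell,
                        cell: int = 128, alpha: float = 0.4) -> np.ndarray:
    """Tint each grid cell of `image` by the class `predict_cell` returns."""
    out = image.astype(float).copy()
    for r in range(0, image.shape[0], cell):
        for c in range(0, image.shape[1], cell):
            color = np.array(COLORS[predict_cell(image[r:r+cell, c:c+cell])])
            out[r:r+cell, c:c+cell] = ((1 - alpha) * out[r:r+cell, c:c+cell]
                                       + alpha * color)
    return out.astype(np.uint8)

# Demo: a classifier stub that labels every cell "Absent" tints the image red.
demo = overlay_predictions(np.zeros((1024, 1024, 3), np.uint8), lambda _: "A")
```

Blending with `alpha` rather than painting solid colours keeps the underlying scene visible beneath the class heatmap.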

Getting Started

Follow these steps to set up the project environment and run the code.

Prerequisites

  • Python 3.8+
  • pip for package installation

Installation

  1. Clone the Repository
    git clone https://github.com/AlvaroVasquezAI/People_Detection.git
    cd People_Detection
  2. Install Dependencies
    The project uses the packages listed in the Technology Stack. If the repository does not ship a requirements file, they can be installed directly:
    pip install scikit-learn opencv-python scikit-image numpy pandas matplotlib

Usage

1. Labeling New Data (labeling_tool.py)

Use this tool to create or expand the dataset by manually classifying 128x128 image grids.

  • Purpose: To generate the labeled image data required for training the model.
  • Instructions:
    1. Open the labeling_tool.py script.
    2. Modify the image_directory and output_directory variables at the bottom of the script to point to your folder of source images and the desired output location.
  • Run the Command:
    python labeling_tool.py

2. Training the Model (main.py)

This is the core script to train the Artificial Neural Network from scratch using the labeled data.

  • Purpose: To generate the model.pkl file that contains the trained classifier.

  • Instructions:

    • Important: To start a new training session, first delete the existing model.pkl file from the root directory.

    When you run the script, it will automatically perform the following steps:

    1. Load all labeled grid images from the dataset/V1/output and dataset/V2/output directories.
    2. Extract feature vectors for each image grid.
    3. Split the data, apply feature scaling, and train the MLP classifier.
    4. Perform cross-validation and print evaluation metrics to the console.
    5. Save the newly trained model as model.pkl.
    6. Run a prediction on the hardcoded example image (dataset/V1/1.png) and display the visual result using Matplotlib.
  • Run the Command:

    python main.py

3. Interactive Prediction with the Desktop UI (UI.py)

The project includes a graphical user interface (GUI) application, built with Tkinter, for making predictions on any image you choose. This provides a user-friendly way to test the model's performance visually without needing to modify any code.

  • Purpose: To use the pre-trained model for classifying new, unseen images.

  • Requirement: This application requires a pre-trained model.pkl file to be present in the root directory. If this file does not exist, you must run the training script (main.py) first to generate it.

  • How to Run: To launch the graphical interface, execute the following command from the project's root directory:

    python UI.py
  • Using the Application:

    1. A window titled "People Detection" will open.
    2. Click the "Select Image" button.
    3. A file dialog will appear, allowing you to browse and select any image file from your computer.
    4. After you select an image, the application will process it and display the results in four panels.
  • Understanding the Output: The UI will display four different visualizations of the result:

    • Original Image: Your selected input image, resized for display.
    • Predicted Overlay: A heatmap showing the classification for each 128x128 grid. The legend at the bottom explains the color coding:
      • Green: Person
      • Blue: Animal
      • Orange: Absent (No person or animal)
    • Fused Image: A semi-transparent blend of the original image and the prediction overlay, making it easy to see the detected regions in context.
    • Contoured Image: An advanced visualization that draws green contours around the detected "Person" regions, providing a clear outline of the classified objects.

Results

Below are some examples of the model's prediction output on images from the dataset and external sources. The colored overlay visualizes the class predicted for each 128x128 grid cell.


License

This project is licensed under the MIT License - see the LICENSE file for details.

About

People detection with a neural network (84.51% accuracy) in urban environments using custom-extracted image features. The network is trained on an AI-generated dataset of images categorized into scenes with people and scenes without people but with animals.
