An end-to-end computer vision project that uses classical image feature extraction and an Artificial Neural Network (ANN) to detect the presence of people in urban environments.
In the realm of computer vision, detecting human figures within complex environments is a pivotal challenge. This project presents a comprehensive implementation of an Artificial Neural Network (ANN) designed specifically for detecting people within urban scenes—a critical application in areas like surveillance, autonomous driving, and pedestrian tracking.
The core of this project lies in the development of a sophisticated feature extraction pipeline and the training of a neural network model that can accurately classify image segments. By leveraging a custom dataset of AI-generated urban scenarios, we refine the model's ability to generalize across different settings and conditions. This involves a detailed workflow from manual data labeling and advanced image processing to model training, hyperparameter tuning, and robust validation.
This repository documents the entire process, including the custom-built data labeling tool, the feature extraction logic, the model training methodology, and a powerful visualization script that demonstrates the model's classification performance on full-scale images.
- Custom Data Labeling Tool: A graphical user interface (GUI) built with Tkinter to facilitate the rapid, manual labeling of image grids. The tool automatically segments large images, saves classified grids into structured folders, and generates a corresponding CSV log.
- Comprehensive Feature Extraction: A robust pipeline that extracts a rich set of features from each image grid to create a powerful feature vector for the machine learning model.
- ANN for Classification: A Multi-Layer Perceptron (MLP) model from `scikit-learn` trained to classify image grids into three main categories: Person, Animal, or Absent.
- Robust Training & Evaluation: The training pipeline includes stratified data splitting, feature scaling, cross-validation, and hyperparameter tuning to ensure a generalized and effective model.
- Grid-Based Prediction & Visualization: A powerful prediction script that takes a full-sized image, divides it into a grid, classifies each cell, and generates a color-coded visual overlay to show the detected regions.
- Core Language:
  - Python 3.x: Used for all data processing, machine learning, and GUI development.
- Machine Learning & Data Processing:
  - Scikit-learn: The primary library for the MLP classifier, train/test splitting, feature scaling (`StandardScaler`), and model evaluation metrics.
  - NumPy: For efficient numerical operations and manipulation of image arrays and feature vectors.
  - Pandas: Used for data manipulation and analysis, particularly for handling the feature dataset and performance metrics.
- Image Processing & Feature Extraction:
  - OpenCV (`cv2`): For core image processing tasks like calculating color histograms and Hu Moments.
  - Scikit-image: For advanced feature extraction, including Gray Level Co-occurrence Matrix (GLCM) properties and Histogram of Oriented Gradients (HOG).
- GUI & Visualization:
  - Tkinter: Used to build the custom GUI for manual data labeling.
  - Matplotlib: For plotting the model's learning curve, confusion matrix, and prediction results.
The project follows a systematic, five-step workflow from dataset creation to final visualization.
The foundation of this project is a custom dataset of AI-generated images, designed to provide diverse and challenging urban scenarios.
- Source: 200 high-resolution (1024x1024) PNG images generated with AI.
- Versions:
- V1 (100 images): Urban scenes containing both people and animals.
- V2 (100 images): Urban scenes containing only animals (no people).
- Folder Structure (Initial):
  ```
  dataset/
  ├── V1/
  │   ├── 1.png, 2.png, ...
  └── V2/
      └── 1.png, 2.png, ...
  ```
To create a supervised learning dataset, a custom GUI tool was developed using Tkinter. This tool streamlines the process of classifying segments of the source images.
- Functionality:
- Loads a 1024x1024 source image.
- Divides it into an 8x8 grid of 128x128 pixel cells.
- Allows the user to manually label each grid cell with one of four classes: P (Person), A (Absent), N (Animal), or Noise.
- Saves each labeled grid cell as a separate image file in a structured output directory.
- Generates a CSV file logging the metadata for each grid.
- Output Structure:
  ```
  dataset/V1/output/
  ├── A/ (Absent)
  │   └── grid_V1_1_1_A.png
  ├── N/ (Animal)
  ├── P/ (Person)
  └── Noise/
  ```
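The tool's segmentation step can be sketched with NumPy alone. The function name and return format below are illustrative, not the tool's actual API:

```python
import numpy as np

CELL = 128  # the labeling tool uses an 8x8 grid of 128x128 cells

def segment_into_grid(image: np.ndarray, cell: int = CELL):
    """Split an HxW(xC) image into row-major (row, col, tile) tuples."""
    h, w = image.shape[:2]
    cells = []
    for r in range(h // cell):
        for c in range(w // cell):
            tile = image[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            cells.append((r, c, tile))
    return cells

# Example on a synthetic 1024x1024 RGB image
img = np.zeros((1024, 1024, 3), dtype=np.uint8)
cells = segment_into_grid(img)
print(len(cells))  # 8x8 grid -> 64 cells
```

In the real tool, each tile would then be shown to the user for labeling and saved under the corresponding class folder.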
Each 128x128 image grid is processed to extract a feature vector that captures its essential visual characteristics. This is handled by the `Image` class in `Image.py`. The final feature vector is a concatenation of the following features:
| Feature | Description | Justification for People Detection |
|---|---|---|
| RGB Statistics (Mean, Mode, Variance, SD) | Basic statistical properties of the Red, Green, and Blue color channels. | Captures overall color and texture information (e.g., skin tones, clothing colors, background variance). |
| Color Histogram | The distribution of pixel intensities across all three color channels. | Provides a detailed color profile, crucial for identifying objects with distinct color signatures. |
| Gray Level Co-occurrence Matrix (GLCM) | Statistical features (contrast, dissimilarity, etc.) describing the textural patterns in the image. | Critical for recognizing textures like fabric, hair, or skin that are independent of color. |
| Histogram of Oriented Gradients (HOG) | Captures the distribution of gradient orientations, which describes local object shape and appearance. | Highly effective for detecting human forms, as it is robust to variations in pose and lighting. |
| Peak Local Max | Identifies local maxima (key points) in the image's intensity landscape. | Helps detect key interest points which can be essential for feature matching and scene understanding. |
| Hu Moments | A set of seven moments that are invariant to translation, scale, and rotation. | Provides a unique signature for shapes, useful for identifying the overall form of objects like people. |
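A minimal sketch of the statistical portion of this feature vector (the RGB statistics and color histogram), using NumPy only. In the full pipeline, the GLCM, HOG, Peak Local Max, and Hu Moment features would be appended via scikit-image and OpenCV; the function names and histogram bin count here are illustrative:

```python
import numpy as np

def rgb_statistics(tile: np.ndarray) -> np.ndarray:
    """Per-channel mean, mode, variance and standard deviation (12 values)."""
    feats = []
    for ch in range(3):
        vals = tile[..., ch].ravel()
        mode = np.bincount(vals, minlength=256).argmax()
        feats += [vals.mean(), float(mode), vals.var(), vals.std()]
    return np.array(feats)

def color_histogram(tile: np.ndarray, bins: int = 32) -> np.ndarray:
    """Concatenated per-channel intensity histograms, normalised to sum to 1."""
    hists = [np.histogram(tile[..., ch], bins=bins, range=(0, 256))[0]
             for ch in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

# Feature vector for one synthetic 128x128 grid cell
tile = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
vector = np.concatenate([rgb_statistics(tile), color_histogram(tile)])
```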
The extracted feature vectors are used to train a scikit-learn Multi-Layer Perceptron (MLP) classifier.
- Model Architecture: A feedforward Artificial Neural Network (ANN) with 3 hidden layers, each containing 128 neurons.
- Preprocessing:
  - The dataset is loaded from the `V1/output` and `V2/output` directories.
  - To address class imbalance, the majority class 'A' (Absent) is randomly undersampled.
  - Data is split into 80% for training and 20% for testing using a stratified split to preserve class proportions.
  - Features are standardized using `StandardScaler` to ensure the model trains effectively.
- Training: The model is trained for up to 100 epochs using the Adam optimizer with a learning rate of 0.001.
- Evaluation: Performance is assessed using a classification report (precision, recall, F1-score), a confusion matrix, and 5-fold cross-validation to ensure the model's robustness.
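The training setup described above can be reproduced roughly as follows. The synthetic feature matrix and label encoding are placeholders for the real extracted features, so the accuracy on this random data is meaningless; only the pipeline shape matters:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 108))     # placeholder feature vectors
y = rng.integers(0, 3, size=300)    # placeholder labels, e.g. 0=P, 1=N, 2=A

# Stratified 80/20 split preserves class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# StandardScaler + 3 hidden layers of 128 neurons, Adam, lr=0.001, 100 epochs
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 128, 128),
                  solver="adam", learning_rate_init=0.001,
                  max_iter=100, random_state=42))
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
```

Wrapping the scaler and classifier in a pipeline ensures the scaler is fit only on the training folds during cross-validation, avoiding data leakage.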
The trained model is used to make predictions on full, unseen images.
- Process:
- A full-size (1024x1024) image is loaded.
- It is divided into the same 128x128 grid used for training.
- The feature vector for each grid cell is extracted.
- The trained MLP model predicts the class for each cell.
- Visualization: The script generates a visual overlay where each grid cell is colored based on its predicted class, providing an intuitive heatmap of detected objects.
- Blue: Person (P)
- Green: Animal (N)
- Red: Absent (A)
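The grid-based prediction loop can be sketched as below. The `classify` callable stands in for the real feature extraction plus MLP prediction, and the RGB colors follow the mapping listed above:

```python
import numpy as np

# Per-class overlay colours (RGB): blue = Person, green = Animal, red = Absent
COLORS = {"P": (0, 0, 255), "N": (0, 255, 0), "A": (255, 0, 0)}

def predict_overlay(image: np.ndarray, classify, cell: int = 128) -> np.ndarray:
    """Classify each grid cell and paint a colour-coded overlay image."""
    overlay = np.zeros_like(image)
    for r in range(image.shape[0] // cell):
        for c in range(image.shape[1] // cell):
            tile = image[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            label = classify(tile)  # stand-in for features + MLP prediction
            overlay[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = COLORS[label]
    return overlay

# Dummy classifier that labels every cell "Absent"
img = np.zeros((1024, 1024, 3), dtype=np.uint8)
overlay = predict_overlay(img, classify=lambda tile: "A")
```

In the actual script, the overlay would be alpha-blended with the source image and displayed with Matplotlib.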
Follow these steps to set up the project environment and run the code.
- Python 3.8+
- `pip` for package installation
- Clone the Repository
  ```
  git clone https://github.com/AlvaroVasquezAI/People_Detection.git
  cd People_Detection
  ```
Use this tool to create or expand the dataset by manually classifying 128x128 image grids.
- Purpose: To generate the labeled image data required for training the model.
- Instructions:
  - Open the `labeling_tool.py` script.
  - Modify the `image_directory` and `output_directory` variables at the bottom of the script to point to your folder of source images and the desired output location.
- Run the Command:
python labeling_tool.py
This is the core script to train the Artificial Neural Network from scratch using the labeled data.
- Purpose: To generate the `model.pkl` file that contains the trained classifier.
- Instructions:
  - Important: To start a new training session, first delete the existing `model.pkl` file from the root directory.
  - When you run the script, it will automatically perform the following steps:
    - Load all labeled grid images from the `dataset/V1/output` and `dataset/V2/output` directories.
    - Extract feature vectors for each image grid.
    - Split the data, apply feature scaling, and train the MLP classifier.
    - Perform cross-validation and print evaluation metrics to the console.
    - Save the newly trained model as `model.pkl`.
    - Run a prediction on the hardcoded example image (`dataset/V1/1.png`) and display the visual result using Matplotlib.
- Run the Command:
python main.py
The project includes a graphical user interface (GUI) application, built with Tkinter, for making predictions on any image you choose. This provides a user-friendly way to test the model's performance visually without needing to modify any code.
- Purpose: To use the pre-trained model for classifying new, unseen images.
- Requirement: This application requires a pre-trained `model.pkl` file in the root directory. If this file does not exist, run the training script (`main.py`) first to generate it.
- How to Run: Launch the graphical interface by executing the following command from the project's root directory:
python UI.py
- Using the Application:
  - A window titled "People Detection" will open.
  - Click the "Select Image" button.
  - A file dialog will appear, allowing you to browse and select any image file from your computer.
  - After you select an image, the application will process it and display the results in four panels.
- Understanding the Output: The UI will display four different visualizations of the result:
  - Original Image: Your selected input image, resized for display.
  - Predicted Overlay: A heatmap showing the classification for each 128x128 grid cell. The legend at the bottom explains the color coding:
    - Green: Person
    - Blue: Animal
    - Orange: Absent (no person or animal)
  - Fused Image: A semi-transparent blend of the original image and the prediction overlay, making it easy to see the detected regions in context.
  - Contoured Image: An advanced visualization that draws green contours around the detected "Person" regions, providing a clear outline of the classified objects.
Below are some examples of the model's prediction output on images from the dataset and external sources. The colored overlay visualizes the class predicted for each 128x128 grid cell.
This project is licensed under the MIT License - see the LICENSE file for details.