This project is a PyTorch implementation of a deep CNN that recognizes multi-digit house numbers in the SVHN dataset, which is derived from Google Street View imagery. Each image contains a sequence of digits from 0 to 9, and the model reaches 89% accuracy in testing.

Genius-Society/svhn_recognition

SVHN Recognition

This project is a PyTorch implementation that uses a deep CNN to recognize multi-digit numbers in the Street View House Numbers (SVHN) dataset, which is derived from Google Street View house numbers. Each image contains a sequence of digits from 0 to 9. The trained model reaches 89% accuracy in testing.

Original dataset

The Street View House Numbers (SVHN) dataset, sourced from Google Street View house numbers, is provided in Format 1 (Full Numbers), which consists of three compressed files: train.tar.gz, test.tar.gz, and extra.tar.gz. train.tar.gz is the training set and test.tar.gz is the testing set. extra.tar.gz is an additional dataset that is not recommended for use. Both train.tar.gz and test.tar.gz contain the following components:

  1. A collection of PNG images, each depicting a house number.
  2. A file named digitStruct.mat, which contains the house number corresponding to each image along with the positional information for each individual digit.
  3. A file named see_bboxes.m, provided solely as an auxiliary tool for processing within the Matlab environment, which can be disregarded.
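Because digitStruct.mat stores every digit separately, a full house number has to be assembled from the per-digit entries. The following is a minimal sketch of that step, not this repository's loader. It assumes the per-digit records have already been read into plain dictionaries using the SVHN Format 1 field names (`left`, `top`, `width`, `height`, `label`). The helper names `house_number` and `union_box` are hypothetical, and sorting by `left` is an assumption about reading order. SVHN stores the digit 0 with label 10, which the sketch maps back to 0.

```python
def house_number(digits):
    """Join per-digit labels into the house number string, left to right."""
    # Assumption: reading order follows the left edge of each digit box.
    ordered = sorted(digits, key=lambda d: d["left"])
    # SVHN stores the digit 0 as label 10, so map it back with a modulo.
    return "".join(str(int(d["label"]) % 10) for d in ordered)


def union_box(digits):
    """Smallest (left, top, right, bottom) box covering all digit boxes."""
    left = min(d["left"] for d in digits)
    top = min(d["top"] for d in digits)
    right = max(d["left"] + d["width"] for d in digits)
    bottom = max(d["top"] + d["height"] for d in digits)
    return left, top, right, bottom


# Hypothetical annotations for one image (values invented for illustration).
example = [
    {"left": 40, "top": 10, "width": 12, "height": 30, "label": 10},
    {"left": 25, "top": 12, "width": 14, "height": 28, "label": 2},
    {"left": 55, "top": 11, "width": 10, "height": 29, "label": 7},
]
print(house_number(example))  # 207
print(union_box(example))     # (25, 10, 65, 40)
```

The union box is useful only for preparing training crops. Under the testing constraints, it must not be computed from the test annotations.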

Tasks

Network Design and Training

Develop a neural network that is trained using the data provided in train.tar.gz and subsequently evaluated on the data contained in test.tar.gz.

Testing Constraints

During the testing phase, the positional information contained within the test.tar.gz/digitStruct.mat file must not be used as input. The network must be capable of accurately recognizing the house number in each test image without relying on the provided positional metadata.
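This README does not state how recognition accuracy is computed. One common choice for multi-digit recognition, assumed here, is whole-number exact match: a prediction counts as correct only if every digit of the house number is right. The sketch below shows that metric with a hypothetical helper `sequence_accuracy`. It takes only predicted and ground-truth strings, so no positional metadata is involved.

```python
def sequence_accuracy(predictions, targets):
    """Fraction of images whose full predicted number matches the label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # A number is correct only if the entire digit string matches.
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical predictions, invented for illustration.
preds = ["207", "15", "3", "481"]
truth = ["207", "16", "3", "481"]
print(f"{sequence_accuracy(preds, truth):.2%}")  # 75.00%
```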

Report Submission

Prepare a comprehensive report that includes:

  • A detailed description of the network architecture and the hyperparameters employed.
  • An explanation of the training methodology and the optimization techniques used.
  • Training curves that illustrate the progression of the training process.
  • The recognition accuracy achieved on the testing dataset.

This formulation is intended to guide the development and evaluation of a system for automated recognition of house numbers using the SVHN dataset under the specified constraints.

Report

The report docs are here.

Environment

conda create -n py311 python=3.11 -y
conda activate py311
pip install -r requirements.txt

Usage

  1. Clone the source code:
git clone git@github.com:Genius-Society/svhn_recognition.git
cd svhn_recognition
  2. Run the training script:
python train.py

Params

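| Steps  | GPU         | Batch size | Learning rate | Patience | Decay step | Decay rate | Accuracy |
|--------|-------------|------------|---------------|----------|------------|------------|----------|
| 122000 | GTX 1080 Ti | 512        | 0.01          | 100      | 625        | 0.9        | 89.21%   |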
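The decay and patience values above can be read as a stepwise exponential learning-rate schedule combined with early stopping. The sketch below illustrates that reading and is not taken from train.py. It assumes the learning rate is multiplied by the decay rate once every 625 training steps, and that patience counts consecutive evaluations without an accuracy improvement. The names `decayed_lr` and `EarlyStopper` are hypothetical.

```python
def decayed_lr(step, base_lr=0.01, decay_step=625, decay_rate=0.9):
    """Staircase exponential decay: scale the rate once per decay_step steps."""
    return base_lr * decay_rate ** (step // decay_step)


class EarlyStopper:
    """Signal a stop after `patience` evaluations without improvement."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best = float("-inf")
        self.bad_evals = 0

    def update(self, accuracy):
        """Record one evaluation; return True when training should stop."""
        if accuracy > self.best:
            self.best = accuracy
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


print(f"{decayed_lr(0):.5f}")     # 0.01000
print(f"{decayed_lr(1250):.5f}")  # 0.00810
```

In PyTorch, a comparable schedule is available as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=625, gamma=0.9)`, provided the scheduler is stepped once per training step rather than once per epoch.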

Training curve

Reference
