Feature Extraction

Catalogue

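1. Abstract
2. Introduction
3. Methods
3.1 Backbone
3.2 Neck
3.3 Head
3.4 Loss
3.5 Data Augmentation
4. Experimental
5. Custom Feature Extraction
5.1 Data Preparation
5.2 Model training
5.3 Model Evaluation
5.4 Model Inference
5.4.1 Export inference model
5.4.2 Get feature vector
6. Summary
7. References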

1. Abstract

Feature extraction plays a key role in image recognition: it transforms an input image into a fixed-dimensional feature vector for subsequent vector search. Good features preserve similarity well, i.e., in the feature space, image pairs with high similarity should have high feature similarity (lie closer together), and image pairs with low similarity should have low feature similarity (lie farther apart). Deep metric learning studies how to obtain features with strong representational power through deep learning.

2. Introduction

To allow the image recognition task to be customized flexibly, the network is divided into four modules: Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure:

The functions of these modules are as follows:

Backbone: Specifies the backbone network to be used. Note that the last layer of the ImageNet pre-trained models provided by PaddleClas outputs 1000 dimensions, so it needs to be customized according to the required feature dimension.
Neck: Used for feature augmentation and feature dimension transformation. It can be a simple linear layer for dimension transformation, or a more complex FPN structure for feature augmentation.
Head: Used to transform features into logits. Besides the common FC layer, modules such as cosmargin, arcmargin, and circlemargin are also available.
Loss: Specifies the loss function to be used. It is designed in a combined form so that classification losses and pair-wise losses can easily be combined.
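The combined form of the Loss module can be pictured as a weighted sum of independent loss terms. The sketch below is a minimal, framework-free illustration under assumptions: the `combined_loss` helper, the `(name, fn, weight)` tuple layout, and the dummy terms in the usage are hypothetical and only mirror the idea, not PaddleClas's actual loss-combination code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: every loss term sees the same model outputs and labels
# and returns a scalar, so classification and pair-wise terms can be mixed freely.
LossTerm = Callable[[Dict[str, object], Sequence[int]], float]


def combined_loss(
    terms: List[Tuple[str, LossTerm, float]],
    outputs: Dict[str, object],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Evaluate each weighted term and store their sum under the key "loss"."""
    losses = {name: weight * fn(outputs, labels) for name, fn, weight in terms}
    losses["loss"] = sum(losses.values())
    return losses


# Dummy terms standing in for a classification loss and a pair-wise loss.
terms = [
    ("ce", lambda out, y: 2.0, 1.0),
    ("triplet", lambda out, y: 0.5, 2.0),
]
print(combined_loss(terms, outputs={}, labels=[0, 1]))
# {'ce': 2.0, 'triplet': 1.0, 'loss': 3.0}
```

Keeping each term independent means adding or re-weighting a loss is a configuration change rather than a code change.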

3. Methods

3.1 Backbone

The Backbone is PP-LCNetV2_base, which builds on PP-LCNetV1 with several optimizations, including the Rep strategy, PW convolution, shortcuts, an improved activation function, and an improved SE module. Its final classification accuracy is similar to that of PPLCNet_x2_5, while its inference latency is reduced by 40%*. During our experiments, we made the following modifications to PPLCNetV2_base so that it achieves higher performance on recognition tasks while keeping the speed basically unchanged: removing the ReLU and FC layers at the end of PPLCNetV2_base, and changing the stride of the last stage (RepDepthwiseSeparable) to 1.

Note: *The inference environment is an Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform with the OpenVINO inference platform.
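To see what changing the stride of the last stage does, the worked arithmetic below compares the final feature-map size with and without the change. It is a rough sketch under assumptions not stated in the text: a 224 x 224 input and five stride-2 downsampling steps, each modeled as a 3x3, padding-1 convolution (overall stride 32). The real PP-LCNetV2 layer layout may differ.

```python
from typing import Sequence


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def final_feature_map(input_size: int, strides: Sequence[int]) -> int:
    """Apply one 3x3, padding-1 convolution per downsampling step (simplified)."""
    size = input_size
    for s in strides:
        size = conv_out_size(size, kernel=3, stride=s, padding=1)
    return size


original = final_feature_map(224, [2, 2, 2, 2, 2])  # last stage stride 2
modified = final_feature_map(224, [2, 2, 2, 2, 1])  # last stage stride 1
print(original, modified)  # 7 14
```

Under these assumptions the last stage keeps a 14 x 14 map (196 positions) instead of 7 x 7 (49) before pooling, retaining finer spatial detail at some extra computation in that stage.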

3.2 Neck

We use a BN Neck to standardize each dimension of the features extracted by the Backbone, which reduces the difficulty of optimizing the metric learning loss and the identification loss simultaneously.
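The per-dimension standardization that a BN Neck performs during training can be sketched in plain Python. This is a simplification: `bn_neck` is a hypothetical name, and the sketch uses only the statistics of the current batch, omitting the learnable scale and shift and the running statistics that a real batch-normalization layer uses at inference time.

```python
import math
from typing import List


def bn_neck(batch: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Standardize every feature dimension using statistics of the current batch."""
    n, dim = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dim)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dim)]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dim)]
        for row in batch
    ]


# Two dimensions on very different scales end up on a comparable scale.
features = [[1.0, 10.0], [3.0, 30.0]]
print(bn_neck(features))  # approximately [[-1.0, -1.0], [1.0, 1.0]]
```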

3.3 Head

We use an FC layer as the classification head to convert features into logits for the classification loss.
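A bias-free FC head followed by softmax cross-entropy can be sketched as follows. The bias-free choice mirrors `bias_attr: False` in the Head configuration of Section 5.1; the `[class_num][embedding_size]` weight layout and the helper names are illustrative assumptions, not Paddle's parameter layout.

```python
import math
from typing import List, Sequence


def fc_logits(feature: Sequence[float], weight: List[List[float]]) -> List[float]:
    """Bias-free FC head: one dot product per class (weight is [class_num][dim])."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of the softmax over logits, using the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


feature = [1.0, 0.0]
weight = [[2.0, 0.0], [0.0, 0.0]]  # 2 classes, 2-dimensional features
logits = fc_logits(feature, weight)
print(logits, softmax_cross_entropy(logits, label=0))  # [2.0, 0.0] and about 0.127
```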

3.4 Loss

We use cross-entropy loss and TripletAngularMarginLoss. TripletAngularMarginLoss improves on the original TripletLoss (TriHard Loss) by moving the optimization objective from L2 Euclidean space to cosine space and by adding a hard distance constraint between the anchor and the positive/negative samples, which improves the generalization ability of the model. For the detailed configuration, see GeneralRecognitionV2_PPLCNetV2_base.yaml.
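The idea behind a cosine-space triplet loss with extra distance constraints can be illustrated with a simplified, framework-free sketch. Features are L2-normalized so similarity is measured by cosine; for each anchor, the hardest positive and hardest negative in the batch are mined (as in TriHard); a margin ranking term is applied; and two absolute constraints push the anchor-positive similarity above one threshold and the anchor-negative similarity below another. The function name, the thresholds `ap_value` and `an_value`, all default values, and the way the terms are summed are assumptions for illustration; consult the PaddleClas source for the actual TripletAngularMarginLoss.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_triplet_loss(
    features: List[Sequence[float]],
    labels: Sequence[int],
    margin: float = 0.5,
    ap_value: float = 0.9,  # assumed lower bound for anchor-positive cosine
    an_value: float = 0.5,  # assumed upper bound for anchor-negative cosine
    absolute_weight: float = 1.0,
) -> float:
    """Batch-hard triplet loss in cosine space plus absolute similarity constraints."""
    feats = [l2_normalize(f) for f in features]
    n = len(feats)
    sim = [[sum(a * b for a, b in zip(feats[i], feats[j])) for j in range(n)] for i in range(n)]
    per_anchor = []
    for i in range(n):
        pos = [sim[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor has no valid triplet in the batch
        hardest_pos = min(pos)  # least similar positive
        hardest_neg = max(neg)  # most similar negative
        ranking = max(0.0, hardest_neg - hardest_pos + margin)
        absolute = max(0.0, ap_value - hardest_pos) + max(0.0, hardest_neg - an_value)
        per_anchor.append(ranking + absolute_weight * absolute)
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


labels = [0, 0, 1, 1]
separated = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
collapsed = [[1.0, 0.0]] * 4
print(cosine_triplet_loss(separated, labels), cosine_triplet_loss(collapsed, labels))  # 0.0 1.0
```

Well-separated classes incur no loss, while collapsed features are penalized by both the margin term and the absolute negative constraint.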

3.5 Data Augmentation

In real scenes, objects may be rotated to some extent rather than remaining upright, so we add a moderate random rotation to the data augmentation to improve retrieval performance in real scenes.
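Random rotation can be illustrated with a dependency-free sketch on a tiny grayscale image stored as nested lists. The helpers, the nearest-neighbour sampling, and the `max_angle` and `prob` parameters are assumptions for illustration; a real pipeline would use an image library (for example Pillow's `Image.rotate`) and the rotation operator and parameters defined in the PaddleClas configuration.

```python
import math
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def rotate(img: Image, angle_deg: float, fill: int = 0) -> Image:
    """Rotate around the image center using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel back to its source location.
            dx, dy = x - cx, y - cy
            sx = round(cos_a * dx + sin_a * dy + cx)
            sy = round(-sin_a * dx + cos_a * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def random_rotate(
    img: Image, max_angle: float = 30.0, prob: float = 0.5, rng: Optional[random.Random] = None
) -> Image:
    """With probability prob, rotate by an angle drawn uniformly from [-max_angle, max_angle]."""
    rng = rng or random.Random()
    if rng.random() >= prob:
        return img
    return rotate(img, rng.uniform(-max_angle, max_angle))


img = [[1, 2, 3], [4, 5, 6]]
print(rotate(img, 180))  # [[6, 5, 4], [3, 2, 1]]
```

Pixels that map outside the source image are filled with `fill`, which is why moderate angles are usually preferred: large rotations of non-square images lose corners.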

4. Experimental

We expanded and optimized the original training data, and finally used a combination of the following 17 public datasets:

Dataset	Number of Images	Number of Categories	Scenario	Dataset Address
Aliproduct	2498771	50030	Commodities	Address
GLDv2	1580470	81313	Landmarks	Address
VeRI-Wild	277797	30671	Vehicles	Address
LogoDet-3K	155427	3000	Logos	Address
SOP	59551	11318	Commodities	Address
Inshop	25882	3997	Commodities	Address
bird400	58388	400	Birds	Address
104flows	12753	104	Flowers	Address
Cars	58315	112	Vehicles	Address
Fashion Product Images	44441	47	Products	Address
flowerrecognition	24123	59	Flowers	Address
food-101	101000	101	Food	Address
fruits-262	225639	262	Fruits	Address
inaturalist	265213	1010	Natural	Address
indoor-scenes	15588	67	Indoor	Address
Products-10k	141931	9691	Products	Address
CompCars	16016	431	Vehicles	Address
Total	6M	192K	-	-
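As a quick arithmetic check, the per-dataset counts in the table can be summed; the Total row reports these sums rounded (6M images, 192K categories). The category sum assumes that no category is shared between datasets.

```python
# (dataset, number of images, number of categories), copied from the table above
DATASETS = [
    ("Aliproduct", 2498771, 50030),
    ("GLDv2", 1580470, 81313),
    ("VeRI-Wild", 277797, 30671),
    ("LogoDet-3K", 155427, 3000),
    ("SOP", 59551, 11318),
    ("Inshop", 25882, 3997),
    ("bird400", 58388, 400),
    ("104flows", 12753, 104),
    ("Cars", 58315, 112),
    ("Fashion Product Images", 44441, 47),
    ("flowerrecognition", 24123, 59),
    ("food-101", 101000, 101),
    ("fruits-262", 225639, 262),
    ("inaturalist", 265213, 1010),
    ("indoor-scenes", 15588, 67),
    ("Products-10k", 141931, 9691),
    ("CompCars", 16016, 431),
]

total_images = sum(images for _, images, _ in DATASETS)
total_classes = sum(classes for _, _, classes in DATASETS)
print(f"{len(DATASETS)} datasets, {total_images:,} images, {total_classes:,} classes")
# 17 datasets, 5,561,305 images, 192,613 classes
```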

The final model accuracy metrics are shown in the following table:

Model	Latency (ms)	Storage (MB)	product*		Aliproduct		VeRI-Wild		LogoDet-3k		iCartoonFace		SOP		Inshop		gldv2		imdb_face		iNat		instre		sketch		sop
			recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP	recall@1	mAP
PP-ShiTuV1_general_rec	5.0	34	65.9	54.3	83.9	83.2	88.7	60.1	86.1	73.6		50.4	27.9	9.5	97.6	90.3
PP-ShiTuV2_general_rec	6.1	19	73.7	61.0	84.2	83.3	87.8	68.8	88.0	63.2	53.6	27.5		71.4	39.3	15.6	98.3	90.9

*The product dataset was built to verify the generalization performance of PP-ShiTu; none of its data appears in the training or test sets. It contains 7 major categories (cosmetics, landmarks, wine, watches, cars, sports shoes, beverages) and 250 subcategories, and the labels of the 250 subcategories are used for testing. The sop dataset comes from GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval and can be regarded as a subset of the SOP dataset.

Pre-trained model: general_PPLCNetV2_base_pretrained_v1.0.pdparams
Evaluation metrics: Recall@1 and mAP
CPU of the speed-test machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Speed-test conditions: MKLDNN enabled, number of threads set to 10
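Recall@1 and mAP can be computed from a query-by-gallery similarity matrix as sketched below. The sketch follows the common definitions: Recall@1 is the fraction of queries whose top-ranked gallery item shares the query's label, and AP averages the precision at each rank where a relevant item appears. The function name and input layout are assumptions; PaddleClas's evaluation code may differ in details such as tie-breaking, top-k truncation, or removal of self-matches when the same list serves as both query and gallery (this sketch does not remove them).

```python
from typing import List, Sequence, Tuple


def recall1_and_map(
    sim: List[List[float]], query_labels: Sequence[int], gallery_labels: Sequence[int]
) -> Tuple[float, float]:
    """sim[q][g] is the similarity between query q and gallery item g."""
    hits_at_1 = 0
    ap_total = 0.0
    for q, row in enumerate(sim):
        # Rank gallery items from most to least similar (ties keep index order).
        order = sorted(range(len(row)), key=lambda g: row[g], reverse=True)
        relevant = [gallery_labels[g] == query_labels[q] for g in order]
        hits_at_1 += relevant[0]
        num_relevant = sum(relevant)
        if num_relevant == 0:
            continue  # no correct item in the gallery: this query contributes AP = 0
        found, precision_sum = 0, 0.0
        for rank, is_rel in enumerate(relevant, start=1):
            if is_rel:
                found += 1
                precision_sum += found / rank  # precision at this rank
        ap_total += precision_sum / num_relevant
    n = len(sim)
    return hits_at_1 / n, ap_total / n


sim = [[0.9, 0.1, 0.8], [0.2, 0.5, 0.6]]
print(recall1_and_map(sim, query_labels=[0, 1], gallery_labels=[0, 1, 0]))  # (0.5, 0.75)
```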

5. Custom Feature Extraction

Custom feature extraction refers to retraining the feature extraction model according to your own task.

Based on the GeneralRecognitionV2_PPLCNetV2_base.yaml configuration file, the following sections describe the four main steps: 1) data preparation; 2) model training; 3) model evaluation; 4) model inference.

5.1 Data Preparation

First, prepare a dataset for your task. Please refer to Dataset Format Description for the dataset format and file structure.

After preparation is complete, modify the data-related settings in the configuration file, mainly the dataset paths and the number of categories, as shown below:

Modify the number of classes:

Head:
  name: FC
  embedding_size: *feat_dim
  class_num: 192612 # This is the number of classes
  weight_attr:
    initializer:
      name: Normal
      std: 0.001
  bias_attr: False

Modify the training dataset configuration:

Train:
  dataset:
    name: ImageNetDataset
    image_root: ./dataset/ # Here is the directory where the train dataset is located
    cls_label_path: ./dataset/train_reg_all_data_v2.txt # Here is the path of the label file corresponding to the train dataset
    relabel: True

Modify the query data configuration in the evaluation dataset:

Query:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/ # Here is the directory where the query dataset is located
    cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the query dataset

Modify the gallery data configuration in the evaluation dataset:

Gallery:
  dataset:
    name: VeriWild
    image_root: ./dataset/Aliproduct/ # This is the directory where the gallery dataset is located
    cls_label_path: ./dataset/Aliproduct/val_list.txt # Here is the path of the label file corresponding to the gallery dataset

5.2 Model training

Model training covers both starting a new training run and resuming training from a checkpoint.

Single machine and single card training

export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml

Single machine multi-card training

export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml

Note: Online evaluation is enabled by default in the configuration file. To speed up training, you can disable it by appending -o Global.eval_during_train=False to the commands above.
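Options of the form `-o Key.SubKey=value` override entries of the nested YAML configuration. The sketch below shows the general idea on a plain dictionary; `apply_override` is a hypothetical helper and the toy config values are made up, and PaddleClas's own parser may convert values and handle lists differently.

```python
import ast
from typing import Any, Dict


def apply_override(config: Dict[str, Any], override: str) -> None:
    """Apply one "A.B.C=value" override to a nested dict in place (hypothetical helper)."""
    key_path, raw_value = override.split("=", 1)
    try:
        # Turns strings such as "False", "10", "0.01" or "[1, 2]" into Python values.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value  # keep plain strings such as paths unchanged
    keys = key_path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


config = {"Global": {"eval_during_train": True, "epochs": 20}}  # toy config
apply_override(config, "Global.eval_during_train=False")
apply_override(config, "Global.checkpoints=output/RecModel/latest")
print(config)
# {'Global': {'eval_during_train': False, 'epochs': 20, 'checkpoints': 'output/RecModel/latest'}}
```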

After training, the model files latest.pdparams and best_model.pdparams and the training log file train.log are generated in the output directory. best_model stores the best model according to the current evaluation metric, and latest stores the most recently saved model, which makes it convenient to resume training from a checkpoint if the training task is interrupted. To resume training from a checkpoint, append -o Global.checkpoints="path_to_resume_checkpoint" to the training commands above, as shown below.
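The relationship between the two model files can be summarized in a few lines. This sketch assumes the model is evaluated once per epoch with a higher-is-better metric; `CheckpointTracker` is a hypothetical name, the metric values are made up, and the sketch says nothing about how PaddleClas actually writes files.

```python
from typing import List


class CheckpointTracker:
    """Decide which checkpoint names to (over)write after each evaluation."""

    def __init__(self) -> None:
        self.best_metric = float("-inf")

    def on_epoch_end(self, metric: float) -> List[str]:
        names = ["latest"]  # always overwritten, used to resume training
        if metric > self.best_metric:  # only improvements replace best_model
            self.best_metric = metric
            names.append("best_model")
        return names


tracker = CheckpointTracker()
for epoch, recall1 in enumerate([0.61, 0.68, 0.65], start=1):  # made-up metrics
    print(epoch, tracker.on_epoch_end(recall1))
# 1 ['latest', 'best_model']
# 2 ['latest', 'best_model']
# 3 ['latest']
```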

Single machine and single card checkpoint recovery training

export CUDA_VISIBLE_DEVICES=0
python3.7 tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.checkpoints="output/RecModel/latest"

Single-machine multi-card checkpoint recovery training

export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.checkpoints="output/RecModel/latest"

5.3 Model Evaluation

In addition to the online evaluation of the model during training, the evaluation program can also be started manually to obtain the specified model's accuracy metrics.

Single Card Evaluation

export CUDA_VISIBLE_DEVICES=0
python3.7 tools/eval.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model"

Multi Card Evaluation

export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch --gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model"

Note: Multi-card evaluation is recommended, since parallel computation across multiple cards speeds up computing the metrics over the full dataset.

5.4 Model Inference

The inference process consists of two steps: 1) export the inference model; 2) run inference to obtain feature vectors.

5.4.1 Export inference model

First, convert the *.pdparams model file into the inference format with the following command:

python3.7 tools/export_model.py \
-c ./ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml \
-o Global.pretrained_model="output/RecModel/best_model"

By default, the exported inference model is saved in the PaddleClas/inference directory and consists of three files: inference.pdmodel, inference.pdiparams, and inference.pdiparams.info. inference.pdmodel stores the structure of the inference model, while inference.pdiparams and inference.pdiparams.info store its parameter information.

5.4.2 Get feature vector

Use the inference model exported in the previous step to convert input images into feature vectors. The inference command is as follows:

cd deploy
python3.7 python/predict_rec.py \
-c configs/inference_rec.yaml \
-o Global.rec_inference_model_dir="../inference"

The resulting feature output format is as follows:

wangzai.jpg: [-7.82453567e-02 2.55877394e-02 -3.66694555e-02 1.34572461e-02
  4.39076796e-02 -2.34078392e-02 -9.49947070e-03 1.28221214e-02
  5.53947650e-02 1.01355985e-02 -1.06436480e-02 4.97181974e-02
 -2.21862812e-02 -1.75557341e-02 1.55848479e-02 -3.33278324e-03
 ...
 -3.40284109e-02 8.35561901e-02 2.10910216e-02 -3.27066667e-02]

In most cases, extracting features alone does not complete an image recognition task. To go further, refer to the Vector Search document.
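At its simplest, vector search compares a query feature with every gallery feature and returns the most similar entries. The brute-force sketch below assumes cosine similarity on L2-normalized features; the gallery vectors and labels are made up, and production systems typically use a dedicated index library instead of a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def search(
    query: Sequence[float],
    gallery: List[Sequence[float]],
    labels: Sequence[str],
    top_k: int = 1,
) -> List[Tuple[float, str]]:
    """Brute-force cosine search: score every gallery vector, return the top_k."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), label)
        for vec, label in zip(gallery, labels)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # made-up 2-D "features"
labels = ["wangzai", "other", "mixed"]
print(search([0.9, 0.1], gallery, labels, top_k=2))  # best match 'wangzai', then 'mixed'
```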

6. Summary

As a key part of image recognition, the feature extraction module leaves much room for improvement in both network structure and loss function. Different datasets, such as those for person re-identification, commodity recognition, and face recognition, have their own characteristics. Based on these characteristics, the academic community has proposed various methods, such as PCB, MGN, ArcFace, CircleLoss, and TripletLoss, all aimed at the ultimate goal of increasing the distance between classes and reducing the distance within classes, so that the retrieval model is robust enough in most scenarios.

7. References

PP-LCNet: A Lightweight CPU Convolutional Neural Network
Bag of Tricks and A Strong Baseline for Deep Person Re-identification
