RepnikovPavel/FurnitureDetection

Metrics

Jaccard index (intersection over union)

$$ JI = IOU = \frac{|A \cap B|}{|A \cup B|} $$

Jaccard coefficient

$$ JC = \frac{|A \cap B|}{|A|+|B|-|A \cap B|} $$

Both expressions are equal, since $|A \cup B| = |A|+|B|-|A \cap B|$.
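A minimal sketch of the IOU computation for two axis-aligned boxes. The function name `box_iou` and the corner format `(x_min, y_min, x_max, y_max)` are assumptions made for illustration; they are not taken from the repository code.

```python
def box_iou(a, b):
    """IOU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter  # |A| + |B| - |A ∩ B|

    # Degenerate (zero-area) boxes are treated as having IOU 0 (an assumption).
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes shifted by one unit along both axes overlap in a 1x1 square, so `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7.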

detection precision and recall

$TP$: $IOU>\alpha$ and $c_{predicted}=c_{ground \: truth}$
$FP$: $IOU<\alpha$ or $c_{predicted} \neq c_{ground \: truth}$
$FN$: a ground-truth box left without a matching prediction: the algorithm either predicted a box with $IOU < \alpha$ against it or did not predict a box at all
$TN$: the algorithm correctly predicted no box in an area (box) that contains no object
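The definitions above can be sketched as a counting routine for a single image. This is a sketch under assumptions that the text does not state: predictions are processed in descending score order, each ground-truth box can be matched at most once, and boxes use the `(x_min, y_min, x_max, y_max)` format. The names `iou` and `count_detections` are hypothetical.

```python
def iou(a, b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_detections(preds, gts, alpha=0.5):
    """Count (TP, FP, FN) for one image.

    preds: list of (box, label, score); gts: list of (box, label).
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_iou, best_j = 0.0, None
        for j, (gt_box, gt_label) in enumerate(gts):
            if j in matched or gt_label != label:
                continue  # class must agree and the target must still be free
            value = iou(box, gt_box)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j is not None and best_iou > alpha:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)  # targets that no prediction hit
    return tp, fp, fn
```

With this convention a second prediction on an already matched target counts as a false alarm, which is the usual choice in detection benchmarks.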

$$ precision = \frac{TP}{TP+FP} $$

$$ recall = \frac{TP}{TP+FN} $$

$$ F_{\beta} = (1+\beta^{2})\frac{precision \cdot recall}{(\beta^{2} \cdot precision)+recall} $$

$TP$ - number of targets hit, $FN$ - number of targets missed, $FP$ - number of false alarms
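A short sketch that evaluates the three formulas above from the raw counts. The function name `precision_recall_fbeta` is hypothetical, and returning 0 when a denominator is zero is an assumed convention, not something the text prescribes.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) computed from detection counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_beta
```

For 8 hits, 2 false alarms and 8 misses this gives precision 0.8, recall 0.5 and $F_{1} \approx 0.615$. Values of $\beta > 1$ weight recall more heavily, values of $\beta < 1$ weight precision more heavily.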

detection metrics

mAP (mean Average Precision)

$$ mAP = \frac{1}{n} \sum_{i=1}^{n}{AP_{i}} $$

where $AP_{i}$ is the average precision for class $c_{i}$ and $n$ is the number of classes

For one class:
AP (Average Precision)

$$ AP = \sum_{i=1}^{N}{(R_{i}-R_{i-1})P_{i}} \approx \int_{0}^{1}{precision(recall)d(recall)} $$

$N$ - the number of predictions for this class, $R_{i} = recall_{i}^{\star}$, $P_{i} = precision_{i}^{\star}$, $R_{0} = 0$

$$ scores = \{s_{1},...,s_{N}\}, \quad s_{i} \geq s_{i+1} $$

$$ precision_{i}^{ \star} = precision^{\star}(bboxes[1:i],labels[1:i]) $$

$$ recall_{i}^{\star} = recall^{\star}(bboxes[1:i],labels[1:i]) $$

$$ precision(recall) = \{(recall_{i}^{\star},precision_{i}^{\star}),i=\overline{1,N}\} $$

Here $precision^{\star}$, computed over the $i$ highest-scoring predictions, is the number of true positives divided by the number of all detected boxes, and $recall^{\star}$ is the number of true positives divided by the number of all ground-truth boxes.
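A minimal sketch of AP for one class and of mAP over classes. It assumes that every prediction has already been marked as a true or false positive, and it approximates the integral with a right-endpoint sum over the recall axis starting from recall 0, without the precision interpolation that COCO-style evaluators apply. The names `average_precision` and `mean_average_precision` are hypothetical.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (score, is_true_positive) pairs.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    ap, tp, prev_recall = 0.0, 0, 0.0
    for i, (_score, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
        precision = tp / i      # TP among the i best-scoring boxes
        recall = tp / num_gt    # TP relative to all ground-truth boxes
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP: plain mean of the per-class AP values (dict: class -> AP)."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```

For three predictions ranked hit, miss, hit against two ground-truth boxes, the recall steps are 0.5 and 0.5 with precisions 1 and 2/3, so the AP is 5/6.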

general scheme

graph LR
title[<u>CollectingADataset</u>]
3DScene --- Blender
Blender --- ManualLabeling
3DScene --- UnrealEngine
UnrealEngine --- ManualLabeling
UnrealEngine --- AutomaticLabeling
RealImages --- ManualLabeling
AnyImage --- LabelingByML

graph LR
title[<u>Segmentation</u>]
Dataset --- RawImages
Dataset --- TrueSegmentationMasks
TrainPipeline --- TrainSegmentationModel
RawImages --- TrainSegmentationModel
TrueSegmentationMasks --- TrainSegmentationModel

graph LR
title[<u>Detection</u>]
Dataset --- RawImages
Dataset --- TrueBoundingBoxes,LabelsOfClasses
RawImages --- TrainDetectionModel
TrainPipeline --- TrainDetectionModel
TrueBoundingBoxes,LabelsOfClasses --- TrainDetectionModel

the scheme of solving the problem

graph TD

COCO2017 --> MakeCOCOFormatPipeline
MakeCOCOFormatPipeline --> COCOFormatPipeline
COCOFormatPipeline --> TestAFewModels
TrainPipeline --> TestAFewModels -->Model
TrainPipeline --> TargetModel
TargetDataset --> TargetModel
RealImages --> AssessmentOfTheAbilityToGeneralize
TargetModel --> AssessmentOfTheAbilityToGeneralize

the scheme of solving the problem with transfer learning

graph TD

FurnitureImagesDataset --> ModelClassifyingFurniture
ModelClassifyingFurniture --> ClassificationHeadForDetectionModel --> TheSchemeOfSolvingTheProblem

solving the problem with automatic labeling

graph TD 
UnrealEngineOrUnity --> Real-TimeSubstitutionOfObjectTextures --> SegmentationMasks --> Dataset --> TrainSegmentationModel

graph TD
UnrealEngineOrUnity --> Get3dBoundingBoxes --> ProjectToTheCamera --> BBoxesOnImage --> TrainDetectionModel
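The projection step of the second scheme can be sketched independently of the engine. This is an illustration under assumptions, not the repository's implementation: the 3D box is axis-aligned and already expressed in camera coordinates (x right, y down, z forward), and the camera is an ideal pinhole with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. Unreal Engine and Unity provide their own projection utilities, which this sketch does not use.

```python
from itertools import product


def project_box_to_image(center, size, fx, fy, cx, cy, width, height):
    """Project a 3D box (camera coordinates) to a 2D box in pixels.

    Returns (x_min, y_min, x_max, y_max) clipped to the image, or None if a
    corner lies behind the camera or the box falls outside the image.
    """
    us, vs = [], []
    # Enumerate the 8 corners as center +/- half of the size along each axis.
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        x = center[0] + sx * size[0]
        y = center[1] + sy * size[1]
        z = center[2] + sz * size[2]
        if z <= 0:
            return None  # corner behind the camera; skipped in this sketch
        us.append(fx * x / z + cx)  # pinhole projection
        vs.append(fy * y / z + cy)
    x_min = max(0.0, min(us))
    y_min = max(0.0, min(vs))
    x_max = min(float(width), max(us))
    y_max = min(float(height), max(vs))
    if x_min >= x_max or y_min >= y_max:
        return None  # projected box is entirely outside the image
    return (x_min, y_min, x_max, y_max)
```

The 2D box is the tightest axis-aligned rectangle around the eight projected corners, so it is usually looser than a box derived from a segmentation mask of the same object.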

automatic detection of overlapping objects

Alt text
Alt text

changing the image registration conditions

Alt text
Alt text

creating a 3d scene

graph LR 
blenderkit --> fbx --> UnityAsset --> AssetLoader --> SceneStateGenerator

COCO format for detection

Dataset:

{
    "info": {...},
    "licenses": [...],
    "images": [...],
    "annotations": [...],
    "categories": [...]
}

Components:

"info": {
    "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
}
"licenses": [
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    },
    {
        "url": "http://creativecommons.org/licenses/by-nc/2.0/",
        "id": 2,
        "name": "Attribution-NonCommercial License"
    },
    ...
]
"images": [
    {
        "license": 4,
        "file_name": "000000397133.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
        "height": 427,
        "width": 640,
        "date_captured": "2013-11-14 17:02:52",
        "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
        "id": 397133
    },
    {
        "license": 1,
        "file_name": "000000037777.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
        "height": 230,
        "width": 352,
        "date_captured": "2013-11-14 20:55:31",
        "flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
        "id": 37777
    },
    ...
]

"categories": [
    {"supercategory": "person","id": 1,"name": "person"},
    {"supercategory": "vehicle","id": 2,"name": "bicycle"},
    {"supercategory": "vehicle","id": 3,"name": "car"},
    {"supercategory": "vehicle","id": 4,"name": "motorcycle"},
    {"supercategory": "vehicle","id": 5,"name": "airplane"},
    ...
    {"supercategory": "indoor","id": 89,"name": "hair drier"},
    {"supercategory": "indoor","id": 90,"name": "toothbrush"}
]

"annotations": [
    {
        "segmentation": [[510.66,423.01,511.72,420.03,...,510.45,423.01]],
        "area": 702.1057499999998,
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07,395.93,38.65,28.67],
        "category_id": 18,
        "id": 1768
    },
    ...
    {
        "segmentation": {
            "counts": [179,27,392,41,…,55,20],
            "size": [426,640]
        },
        "area": 220834,
        "iscrowd": 1,
        "image_id": 250282,
        "bbox": [0,34,639,388],
        "category_id": 1,
        "id": 900100250282
    }
]
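A small sketch of reading this structure with the standard `json` module. It groups annotations by image and converts each `bbox` from the COCO layout `[x_min, y_min, width, height]` to corner form. The function name `boxes_per_image` and the tiny in-memory dataset are hypothetical; a real file would be loaded with `json.load` from a path of your choice.

```python
import json
from collections import defaultdict


def boxes_per_image(coco):
    """Group COCO annotations by image id and convert boxes to corners."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x_min, y_min, width, height]
        grouped[ann["image_id"]].append({
            "box": (x, y, x + w, y + h),
            "label": names[ann["category_id"]],
            "iscrowd": ann["iscrowd"],
        })
    return dict(grouped)


# Tiny in-memory dataset in the same layout (illustrative values only).
example = json.loads("""
{
    "images": [{"id": 250282, "file_name": "example.jpg", "height": 426, "width": 640}],
    "categories": [{"supercategory": "person", "id": 1, "name": "person"}],
    "annotations": [
        {"image_id": 250282, "bbox": [0, 34, 639, 388], "category_id": 1,
         "iscrowd": 1, "area": 220834, "id": 1}
    ]
}
""")
result = boxes_per_image(example)
```

Annotations with `"iscrowd": 1` describe groups of objects, and many training pipelines filter them out before training a detector.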

Results

link to dataset

class distribution

Alt text

Values of the loss function during gradient descent

$$ L = \alpha L_{classification} + \beta L_{regression} ,\alpha =1,\beta=1 $$

ep - epoch index

Alt text

post processing

graph LR
image --> model --> MulticlassNMS --> NMS  
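A sketch of the two suppression stages in plain Python. It assumes greedy non-maximum suppression, that `MulticlassNMS` suppresses overlaps within each class, and that the final `NMS` stage is class-agnostic; the text does not spell out either stage, so this reading, the function names and the threshold values are all assumptions.

```python
def iou(a, b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_threshold):
    """Greedy NMS over (box, label, score) tuples, ignoring the label."""
    kept = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        # Keep a box only if it does not overlap a better-scoring kept box.
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def multiclass_nms(dets, iou_threshold):
    """Run NMS separately inside each class."""
    kept = []
    for label in sorted({d[1] for d in dets}):
        kept.extend(nms([d for d in dets if d[1] == label], iou_threshold))
    return kept


def postprocess(dets, class_threshold=0.5, global_threshold=0.9):
    """Per-class NMS followed by a class-agnostic NMS pass."""
    return nms(multiclass_nms(dets, class_threshold), global_threshold)
```

The class-agnostic pass only removes near-duplicate boxes that carry different labels, which is why its threshold is set higher than the per-class one in this sketch.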

model predictions on train data

predictions

mAP(train dataset)

model: SSD300_VGG16

Alt text

| metric | value | description |
| --- | --- | --- |
| map | 0.5197 | global mean average precision |
| map_small | 0.6083 | mean average precision for small objects |
| map_medium | 0.5975 | mean average precision for medium objects |
| map_large | 0.6817 | mean average precision for large objects |
| mar_1 | 0.4407 | mean average recall for 1 detection per image |
| mar_10 | -1.0 | mean average recall for 10 detections per image |
| mar_100 | 0.2553 | mean average recall for 100 detections per image |
| mar_small | 0.4952 | mean average recall for small objects |
| mar_medium | 0.5443 | mean average recall for medium objects |
| mar_large | 0.5443 | mean average recall for large objects |
| map_50 | -1.0 | mean average precision at IoU=0.50 (-1 if 0.5 not in the list of iou thresholds) |
| map_75 | 0.7025 | mean average precision at IoU=0.75 (-1 if 0.75 not in the list of iou thresholds) |
| map_per_class | 0.4645 | mean average precision per observed class (-1 if class metrics are disabled) |
| mar_100_per_class | 0.2914 | mean average recall for 100 detections per image per observed class (-1 if class metrics are disabled) |

problems of transferring a model trained on synthetic data

The training data is not noisy enough. When collecting synthetic data, objects must be captured from all possible distances. Artifacts must also be added to the training sample artificially, for example text, small objects, and images of other classes.

instructions for reproducing the result

  1. unzip DATASET.zip
  2. set up the conf.py file
  3. run data_manip.py file
  4. run TRAIN_ssd300_vgg16.py file
  5. run TEST_ssd300_vgg16.py file

list of links

  1. the best introductory lecture
  2. object detection tutorial on github
  3. COCO dataset overview
  4. MIPT Computer Vision
  5. ssd300 article
  6. COCO format viewer
  7. unreal engine segmentation dataset
  8. scalable object detection using deep neural networks (2013)
  9. 3d person camera in Unity
  10. segmentation mask in Unity
