Jaccard index (intersection over union)
Jaccard coefficient
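For two bounding boxes, the Jaccard index is the area of their intersection divided by the area of their union. A minimal sketch, assuming boxes in the COCO `[x, y, width, height]` layout used later in these notes; the function name `iou_xywh` is illustrative and does not come from the project code.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Overlap along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
overlap_example = iou_xywh([0, 0, 2, 2], [1, 1, 2, 2])
```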
detection precision and recall
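A detection counts as a true positive when it overlaps a not-yet-matched ground-truth box with IoU at or above a threshold. Precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below is a simplified greedy matcher for one image and one class; the function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions made for illustration, and the COCO evaluation protocol has extra rules (for example for `iscrowd` regions) that are not reproduced here.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """detections: list of (score, box); ground_truth: list of boxes."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    # Higher-scoring detections get first pick of the ground truth.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            value = iou_xyxy(box, gt)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(detections) - tp
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (1, 1, 10, 10))]
precision, recall = precision_recall(dets, gts)
```

In this toy case only the first detection is a true positive: the second hits nothing, and the third duplicates a box that is already matched.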
detection metrics
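mAP (mean Average Precision)
mAP = (AP_1 + ... + AP_N) / N
where N is the number of classes and AP_i is the Average Precision of class i.
For one class:
AP (Average Precision)
The AP of a class is the area under its precision-recall curve.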
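As a rough illustration of how AP and mAP relate, the sketch below computes AP for one class as the area under the precision-recall curve (using the monotone precision envelope, often called all-point interpolation) and then averages over classes. The function names and the toy inputs are assumptions for illustration. COCO-style evaluation additionally samples the curve at 101 recall points and averages over IoU thresholds from 0.50 to 0.95, which is not done here.

```python
def average_precision(scored_hits, num_ground_truth):
    """scored_hits: list of (score, is_true_positive) for a single class."""
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda h: h[0], reverse=True)

    precisions, recalls = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)

    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """ap_per_class: dict mapping class name to its AP."""
    return sum(ap_per_class.values()) / len(ap_per_class)


ap_example = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
map_example = mean_average_precision({"chair": 1.0, "table": 0.5})
```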
graph LR
title[<u>CollectingADataset</u>]
3DScene --- Blender
Blender --- ManualLabeling
3DScene --- UnrealEngine
UnrealEngine --- ManualLabeling
UnrealEngine --- AutomaticLabeling
RealImages --- ManualLabeling
AnyImage --- LabelingByML
graph LR
title[<u>Segmentation</u>]
Dataset --- RawImages
Dataset --- TrueSegmentationMasks
TrainPipeline --- TrainSegmentationModel
RawImages --- TrainSegmentationModel
TrueSegmentationMasks --- TrainSegmentationModel
graph LR
title[<u>Detection</u>]
Dataset --- RawImages
Dataset --- TrueBoundingBoxes,LabelsOfClasses
RawImages --- TrainDetectionModel
TrainPipeline --- TrainDetectionModel
TrueBoundingBoxes,LabelsOfClasses --- TrainDetectionModel
graph TD
COCO2017 --> MakeCOCOFormatPipeline
MakeCOCOFormatPipeline --> COCOFormatPipeline
COCOFormatPipeline --> TestAFewModels
TrainPipeline --> TestAFewModels -->Model
TrainPipeline --> TargetModel
TargetDataset --> TargetModel
RealImages --> AssessmentOfTheAbilityToGeneralize
TargetModel --> AssessmentOfTheAbilityToGeneralize
graph TD
FurnitureImagesDataset --> ModelClassifyingFurniture
ModelClassifyingFurniture --> ClassificationHeadForDetectionModel --> TheSchemeOfSolvingTheProblem
graph TD
UnrealEngineOrUnity --> Real-TimeSubstitutionOfObjectTextures --> SegmentationMasks --> Dataset --> TrainSegmentationModel
graph TD
UnrealEngineOrUnity --> Get3dBoundingBoxes --> ProjectToTheCamera --> BBoxesOnImage --> TrainDetectionModel
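The projection step in the graph above can be sketched with a pinhole camera model: project the eight corners of the 3D box, then take the min and max of the resulting pixel coordinates. This is only a sketch under stated assumptions: the corners are already expressed in the camera frame (x right, y down, z forward), the box is axis-aligned in that frame, and the intrinsics `fx`, `fy`, `cx`, `cy` are known. Unreal Engine and Unity use their own axis conventions, so a coordinate conversion is needed first, and all function names here are hypothetical.

```python
from itertools import product


def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box given its center and size."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx, dy, dz in product((-1, 1), repeat=3)
    ]


def project_to_bbox(corners_cam, fx, fy, cx, cy, width, height):
    """Project camera-frame corners and return a COCO [x, y, w, h] box or None."""
    us, vs = [], []
    for x, y, z in corners_cam:
        if z <= 0:
            # Corner behind the camera; this sketch skips such objects.
            return None
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)

    # Clip the enclosing rectangle to the image bounds.
    x_min, x_max = max(0.0, min(us)), min(float(width), max(us))
    y_min, y_max = max(0.0, min(vs)), min(float(height), max(vs))
    if x_max <= x_min or y_max <= y_min:
        return None  # entirely outside the image
    return [x_min, y_min, x_max - x_min, y_max - y_min]


corners = box_corners(center=(0.0, 0.0, 4.0), size=(2.0, 2.0, 2.0))
bbox = project_to_bbox(corners, fx=300, fy=300, cx=320, cy=240, width=640, height=480)
```

The returned box uses the same `[x, y, width, height]` layout as the `bbox` field in COCO annotations, so it can be written into the dataset without further conversion.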
graph LR
blenderkit --> fbx --> UnityAsset --> AssetLoader --> SceneStateGenerator
Dataset:
{
"info": {...},
"licenses": [...],
"images": [...],
"annotations": [...],
"categories": [...]
}
Components:
"info": {
"description": "COCO 2017 Dataset",
"url": "http://cocodataset.org",
"version": "1.0",
"year": 2017,
"contributor": "COCO Consortium",
"date_created": "2017/09/01"
}
"licenses": [
{
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
"id": 1,
"name": "Attribution-NonCommercial-ShareAlike License"
},
{
"url": "http://creativecommons.org/licenses/by-nc/2.0/",
"id": 2,
"name": "Attribution-NonCommercial License"
},
...
]
"images": [
{
"license": 4,
"file_name": "000000397133.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
"height": 427,
"width": 640,
"date_captured": "2013-11-14 17:02:52",
"flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
"id": 397133
},
{
"license": 1,
"file_name": "000000037777.jpg",
"coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg",
"height": 230,
"width": 352,
"date_captured": "2013-11-14 20:55:31",
"flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg",
"id": 37777
},
...
]
"categories": [
{"supercategory": "person","id": 1,"name": "person"},
{"supercategory": "vehicle","id": 2,"name": "bicycle"},
{"supercategory": "vehicle","id": 3,"name": "car"},
{"supercategory": "vehicle","id": 4,"name": "motorcycle"},
{"supercategory": "vehicle","id": 5,"name": "airplane"},
...
{"supercategory": "indoor","id": 89,"name": "hair drier"},
{"supercategory": "indoor","id": 90,"name": "toothbrush"}
]
"annotations": [
{
"segmentation": [[510.66,423.01,511.72,420.03,...,510.45,423.01]],
"area": 702.1057499999998,
"iscrowd": 0,
"image_id": 289343,
"bbox": [473.07,395.93,38.65,28.67],
"category_id": 18,
"id": 1768
},
...
{
"segmentation": {
"counts": [179,27,392,41,…,55,20],
"size": [426,640]
},
"area": 220834,
"iscrowd": 1,
"image_id": 250282,
"bbox": [0,34,639,388],
"category_id": 1,
"id": 900100250282
}
]
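To make the structure above concrete, here is a minimal sketch that indexes a COCO-style dictionary by image id and converts each `bbox` from `[x, y, width, height]` to corner form. The function name `index_coco` and the tiny in-memory sample are assumptions for illustration; the sample reuses annotation 1768 from the listing, and the other fields are trimmed to the ones the function reads. A real annotation file would be read with `json.load` instead.

```python
from collections import defaultdict


def index_coco(coco):
    """Group annotations by image id and attach readable class names."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    boxes_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores top-left corner plus size
        boxes_by_image[ann["image_id"]].append({
            "box_xyxy": [x, y, x + w, y + h],
            "label": categories[ann["category_id"]],
            "iscrowd": ann["iscrowd"],
        })
    return boxes_by_image


# Toy sample, trimmed to the fields used above.
sample = {
    "images": [{"id": 289343}],
    "categories": [{"supercategory": "animal", "id": 18, "name": "dog"}],
    "annotations": [{
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07, 395.93, 38.65, 28.67],
        "category_id": 18,
        "id": 1768,
    }],
}

boxes = index_coco(sample)
first = boxes[289343][0]
```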
ep - epoch index
graph LR
image --> model --> MulticlassNMS --> NMS
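One plausible reading of the two suppression stages in the graph above is a per-class pass followed by a class-agnostic pass. The sketch below implements both with plain greedy non-maximum suppression. The function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 IoU threshold are assumptions for illustration and are not taken from the project code.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs; returns the kept pairs by score."""
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every box that overlaps the kept one too strongly.
        remaining = [d for d in remaining if iou_xyxy(best[1], d[1]) <= iou_threshold]
    return kept


def multiclass_nms(detections, iou_threshold=0.5):
    """Apply NMS independently per class to (label, score, box) triples."""
    by_label = {}
    for label, score, box in detections:
        by_label.setdefault(label, []).append((score, box))
    kept = []
    for label, dets in by_label.items():
        kept.extend((label, s, b) for s, b in nms(dets, iou_threshold))
    return kept


raw = [
    ("chair", 0.9, (0, 0, 10, 10)),
    ("chair", 0.8, (1, 1, 10, 10)),
    ("table", 0.7, (0, 0, 10, 10)),
]
per_class = multiclass_nms(raw)
agnostic = nms([(s, b) for _, s, b in per_class])
```

The per-class pass removes the duplicate chair but keeps the table, because boxes of different classes never compete. The class-agnostic pass then keeps only the highest-scoring box at that location.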
model: SSD300_VGG16
| metric | value | description |
|---|---|---|
| map | 0.5197 | global mean average precision |
| map_small | 0.6083 | mean average precision for small objects |
| map_medium | 0.5975 | mean average precision for medium objects |
| map_large | 0.6817 | mean average precision for large objects |
| mar_1 | 0.4407 | mean average recall for 1 detection per image |
| mar_10 | -1.0 | mean average recall for 10 detections per image |
| mar_100 | 0.2553 | mean average recall for 100 detections per image |
| mar_small | 0.4952 | mean average recall for small objects |
| mar_medium | 0.5443 | mean average recall for medium objects |
| mar_large | 0.5443 | mean average recall for large objects |
| map_50 | -1.0 | (-1 if 0.5 not in the list of iou thresholds), mean average precision at IoU=0.50 |
| map_75 | 0.7025 | (-1 if 0.75 not in the list of iou thresholds), mean average precision at IoU=0.75 |
| map_per_class | 0.4645 | (-1 if class metrics are disabled), mean average precision per observed class |
| mar_100_per_class | 0.2914 | (-1 if class metrics are disabled), mean average recall for 100 detections per image per observed class |
The training data is not noisy enough. When collecting synthetic data, objects should be photographed from all possible distances. Artifacts should also be added to the training sample artificially, for example text, small objects, and images of other classes.
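Both remedies can be sketched in a few lines. The example below samples camera distances log-uniformly, so that near and far views are covered evenly per doubling of distance, and pastes random noisy rectangles into an image as stand-in distractors. Everything here is a hypothetical illustration: the function names, the grayscale list-of-rows image layout, and the parameter defaults are assumptions, and a real pipeline would paste text or crops of other classes rather than noise.

```python
import math
import random


def sample_camera_distance(near, far, rng=None):
    """Draw a distance log-uniformly from the range [near, far]."""
    rng = rng or random.Random()
    return math.exp(rng.uniform(math.log(near), math.log(far)))


def add_distractors(image, num_patches=3, max_size=8, rng=None):
    """Return a copy of a grayscale image with random noisy rectangles pasted in.

    image: list of rows, each a list of ints in 0..255.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input untouched
    for _ in range(num_patches):
        h = rng.randint(1, min(max_size, height))
        w = rng.randint(1, min(max_size, width))
        top = rng.randint(0, height - h)
        left = rng.randint(0, width - w)
        for y in range(top, top + h):
            for x in range(left, left + w):
                out[y][x] = rng.randint(0, 255)
    return out


rng = random.Random(0)
clean = [[0] * 16 for _ in range(16)]
noisy = add_distractors(clean, rng=rng)
distances = [sample_camera_distance(0.5, 20.0, rng=rng) for _ in range(100)]
```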
- unzip DATASET.zip
- set up conf.py file
- run data_manip.py file
- run TRAIN_ssd300_vgg16.py file
- run TEST_ssd300_vgg16.py file