Code release for the paper:
The study of unsupervised domain adaptation for object detection using spatial attention pyramid networks
python 3.8.11
pytorch 1.9.0 (conda)
torchvision 0.10.0 (conda)
detectron2 0.5+cu111 (pip)
tensorboard 2.6.0 (pip)
opencv-python (pip)
pycocotools (pip)
The directory structure should be as follows:
SAPNetV2/
├── configs/ # configuration files
├── datasets/ # datasets
├── detections/ # main implementation
| ├── da_heads/ # domain classifier, MS-CAM
| ├── data/ # dataset registration
| ├── evaluation/ # Pascal VOC metric
| ├── layers/ # intermediate neural network layers
| ├── meta_arch/ # Faster R-CNN
| ├── modeling/ # RPN, MEDM loss
| ├── hook.py # evaluation extension
| └── trainer.py # supervised and unsupervised domain adaptation trainers, Grad-CAM
|
├── tools/ # train object detector
| └── train_net.py
├── pretrained/ # pretrained models
├── outputs/ # training logs
├── test_images/ # inference outputs
└── scripts/ # useful scripts
- convert your dataset to VOC or COCO format, or any other format that detectron2 supports
- register your dataset in detection/data/register.py
- the test set must be in VOC format because we use the VOC metric to evaluate results
# VOC format
dataset_dir = $YOUR_DATASET_ROOT
classes = ('person', 'two-wheels', 'four-wheels')  # dataset classes
years = 2007
# the image list is read from $YOUR_DATASET_ROOT/ImageSets/Main/{split}.txt;
# split must be one of "train", "test", "val", "trainval"
split = 'train'
# the dataset is later referred to by this name
meta_name = 'itri-taiwan-416_{}'.format(split)
# call register_pascal_voc to register the dataset
register_pascal_voc(meta_name, dataset_dir, split, years, classes)
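A VOC-format dataset keeps one XML annotation per image under `$YOUR_DATASET_ROOT/Annotations/`. As a quick reference, here is a minimal stdlib sketch of that annotation structure and how it is parsed; the file name and box coordinates are made up for illustration and are not from this repository:

```python
import xml.etree.ElementTree as ET

# A minimal VOC-style annotation using the classes above.
# The real file would live at $YOUR_DATASET_ROOT/Annotations/{image_id}.xml.
SAMPLE_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>416</width><height>416</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc_annotation(xml_text):
    """Return a list of (class_name, [xmin, ymin, xmax, ymax]) boxes."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        coords = [int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((name, coords))
    return boxes

print(parse_voc_annotation(SAMPLE_XML))  # → [('person', [48, 240, 195, 371])]
```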
The full configuration explanation is here; hyperparameters that can be tuned:
- MODEL.DA_HEAD.WINDOW_SIZES and MODEL.DA_HEAD.WINDOW_STRIDES: spatial pooling parameters of the domain classifier
- MODEL.DA_HEAD.LOSS_WEIGHT: domain classifier loss weight, in [0, 1.0]
- MODEL.DA_HEAD.TARGET_ENT_LOSS_WEIGHT: entropy loss weight, in [0, 1.0]
- MODEL.DA_HEAD.TARGET_DIV_LOSS_WEIGHT: diversity loss weight, in [-1.0, 0]
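The two target-side weights scale an entropy-minimization term and a diversity term in the spirit of the MEDM loss. The following is a simplified, stdlib-only sketch of how those signs interact, not the repository's implementation; the function name and the example batches are invented for illustration:

```python
import math

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def medm_style_loss(probs, ent_w=1.0, div_w=-0.2):
    """Sketch of an entropy/diversity loss over target-domain predictions.

    probs: list of per-sample class distributions (each sums to 1).
    ent_w (cf. TARGET_ENT_LOSS_WEIGHT, in [0, 1]) scales the mean per-sample
    entropy; minimizing it sharpens individual predictions.
    div_w (cf. TARGET_DIV_LOSS_WEIGHT, in [-1, 0]) scales the entropy of the
    batch-mean distribution; because the weight is negative, minimizing the
    total loss rewards spreading predictions across classes, discouraging
    collapse to a single class.
    """
    n, k = len(probs), len(probs[0])
    mean_ent = sum(entropy(p) for p in probs) / n
    mean_dist = [sum(p[j] for p in probs) / n for j in range(k)]
    return ent_w * mean_ent + div_w * entropy(mean_dist)

# Confident predictions collapsed onto one class: no diversity reward.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
# Equally confident but diverse predictions get a lower (better) loss.
diverse = [[1.0, 0.0], [0.0, 1.0]]
print(medm_style_loss(collapsed) > medm_style_loss(diverse))  # → True
```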
In the results below, the source domain images are target-like source domain images generated by CycleGAN.
| Scenario | Domain Classifier Loss Weight | Entropy Loss Weight | Diversity Loss Weight | mAP |
|---|---|---|---|---|
| cityscapes -> foggy cityscapes | 0.6 | 1.0 | -0.2 | 49.63 |
| sim10k -> cityscapes | 0.1 | 0.8 | -0.3 | 49.48 |
| kitti -> cityscapes | 0.1 | 1.0 | -0.05 | 45.79 |
- you can inspect the training logs in pretrained/*-best/ using `tensorboard --logdir $LOG_PATH`
- pretrained/*-baseline stores the models pretrained on the source domain
- Train a normal Faster R-CNN using source domain data only (the baseline)
- Train the whole model, including the domain classifier, initialized from the baseline model weights
- scripts/ stores sample scripts for the three domain adaptation scenarios
Basic usage
- train a model
python tools/train_net.py --config-file $CONFIG_FILE_PATH --num-gpus 1
- test the model
# override MODEL.WEIGHTS from the command line
python tools/train_net.py --config-file $CONFIG_FILE_PATH --num-gpus 1 --eval-only MODEL.WEIGHTS $MODEL_WEIGHT_PATH
- predict boxes on the test set
python tools/train_net.py --config-file $CONFIG_FILE_PATH --num-gpus 1 --test-images MODEL.WEIGHTS $MODEL_WEIGHT_PATH MODEL.ROI_HEADS.SCORE_THRESH_TEST 0.75
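The `MODEL.ROI_HEADS.SCORE_THRESH_TEST 0.75` override above drops low-confidence predictions before they are drawn. A tiny sketch of that filtering step, with hypothetical detections invented for illustration:

```python
def filter_by_score(detections, score_thresh=0.75):
    """Keep only detections whose confidence meets the threshold,
    mirroring what MODEL.ROI_HEADS.SCORE_THRESH_TEST does at test time."""
    return [d for d in detections if d["score"] >= score_thresh]

# Hypothetical detections: class label, confidence score, box coordinates.
dets = [
    {"cls": "person", "score": 0.91, "box": [10, 20, 50, 80]},
    {"cls": "two-wheels", "score": 0.42, "box": [5, 5, 30, 40]},
]
print(filter_by_score(dets))  # keeps only the 0.91 'person' detection
```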
Please run these scripts from SAPNetV2/
# or sh scripts/../....sh
..../SAPNetV2$ source scripts/../....sh
- scripts/*/train_source_only.sh trains the object detector using source domain data only; output goes under outputs/
- scripts/*/train_sapv2.sh trains the object detector with domain adaptation; output goes under outputs/
- scripts/*/test_img.sh predicts boxes on the test set; output goes under test_images/
- scripts/*/eval.sh evaluates the object detector with and without domain adaptation on the test set
- scripts/*/attention_mask.sh visualizes the spatial attention masks of the domain classifier; output goes under test_images/
- scripts/*/grad_cam_domain_calssifier.sh visualizes class activation maps of the domain classifier using Grad-CAM; output goes under test_images/
- scripts/*/grad_cam_object_detection.sh visualizes class activation maps of the object detector using Grad-CAM; output goes under test_images/
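The evaluation above reports the PASCAL VOC metric. For reference, a minimal stdlib sketch of the classic 11-point interpolated AP that this family of metrics is built on; it is illustrative only, not the evaluation code used by this repository:

```python
def voc_ap_11_point(recalls, precisions):
    """11-point interpolated AP, as in the original PASCAL VOC protocol:
    average, over recall levels 0.0, 0.1, ..., 1.0, of the maximum
    precision achieved at recall >= that level."""
    vals = []
    for t in [i / 10 for i in range(11)]:
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        vals.append(max(candidates) if candidates else 0.0)
    return sum(vals) / 11.0

# A perfect detector: precision 1.0 at every achieved recall level.
print(voc_ap_11_point([0.0, 0.5, 1.0], [1.0, 1.0, 1.0]))  # → 1.0
```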
Configuration files for the ablation studies are under configs/*/ablation/
- scripts/*/cyclegan.sh trains the baseline method using source domain images translated to the target domain style by CycleGAN
- scripts/*/medm.sh trains the baseline method with the MEDM loss, which is implemented here
- scripts/*/mscam.sh trains the baseline method with MS-CAM, which is implemented here
- SAPNetDA
- Detectron2
- Grad-CAM.pytorch
- Professor Yie Tarng Chen, lab classmates Tyson and Bo Cheng Lee