ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection

Figure: We propose a framework for efficient active learning across various 3D object detection techniques and modalities, and demonstrate that active learning reaches comparable detection performance on benchmark datasets at a fraction of the annotation cost. The datasets include roadside infrastructure sensors (top row) and onboard vehicle sensors (bottom row), covering LiDAR-only and LiDAR+camera fusion methods, the two dominant strategies in state-of-the-art 3D object detection for this safety-critical task. We use the entropy-based active learning sampling strategy to select the most informative data.

This is the official implementation of our paper:

[IV 2024] ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection [arXiv] [website]

Overview

ActiveAnno3D is the first active learning framework for multi-modal 3D object detection. With this framework, you can select the data samples for labeling that are most informative for training.

In summary:

  1. We explore various continuous training methods and integrate the one that is most efficient in terms of computational demand and detection performance.
  2. We perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets.
  3. We show that PV-RCNN with the entropy-based query strategy achieves almost the same performance using only half of the TUM Traffic Intersection training data (77.25 mAP compared to 83.50 mAP).
  4. BEVFusion achieved an mAP of 64.31 when using half of the training data and 77.25 mAP when using the complete nuScenes dataset.
  5. We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and to minimize labeling costs.

Architecture

Figure: The generalized active learning flow selects data from an unlabeled pool according to an acquisition function, which, in the case of uncertainty-driven AL, utilizes the trained model or, in the case of diversity-driven AL, may be independent of the training. The selected data is then annotated by an oracle and aggregated with the previously labeled data. The choice of training strategy determines whether all data or only the new data is used in the next training step. The variety of possible acquisition and training techniques, together with the unique domain challenges posed by autonomous driving, makes active learning an opportune environment for innovation toward safe and accurate learning.
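To make the uncertainty-driven branch concrete, here is a minimal sketch of an entropy-based query step, assuming a detector callable that returns per-box softmax class probabilities for a frame; the function names and the mean-over-boxes aggregation are illustrative, not the repository's exact implementation.

import numpy as np

def frame_entropy(class_probs: np.ndarray) -> float:
    # Mean Shannon entropy over the predicted boxes of one frame.
    # class_probs: (num_boxes, num_classes) softmax scores per detected box.
    eps = 1e-12
    box_entropy = -np.sum(class_probs * np.log(class_probs + eps), axis=1)
    return float(box_entropy.mean()) if len(box_entropy) else 0.0

def entropy_query(unlabeled_frames, detector, k):
    # Select the k frames the current model is most uncertain about.
    scored = [(frame_entropy(detector(frame)), frame) for frame in unlabeled_frames]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frame for _, frame in scored[:k]]

Frames with high mean entropy are those where the model's class predictions are closest to uniform, i.e. where labeling is expected to be most informative.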

Installation

For installation using Docker, please refer to the INSTALL.md file.

The code was tested in the following Python environment:

Our Contributions and Modifications

  1. Incorporate Continuous Training Strategies [link]
  2. Propose Temporal CRB [link]
  3. Propose Class-weighted CRB [link]
  4. Post-Process Predictions [link]
  5. Develop an Interface for the proAnno Labeling Tool [link]

Tutorial

The dataset, e.g. TUM Traffic Intersection, has to be in the KITTI format. If it is in OpenLABEL format, use our converter:

python tumtraf_converter.py --load_dir /home/user/tumtraf --save_dir /home/user/tumtraf_kitti_format --splits train val test
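After conversion, the save directory should follow the usual KITTI-style layout; the subfolders below show the conventional structure, and the exact folder names the converter produces may differ.

tumtraf_kitti_format/
├── ImageSets/          # train.txt / val.txt / test.txt split files
├── training/
│   ├── velodyne/       # point clouds (.bin)
│   ├── label_2/        # annotations in KITTI label format
│   ├── calib/          # calibration files
│   └── image_2/        # camera images
└── testing/
    └── velodyne/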

To run normal training:

In ./tools/cfgs, add a folder <dataset_models>, then add the .yaml configuration files for your models there.

python ./tools/main/train.py --cfg_file <path-to-yaml-file> --extra_tag normalTraining

If you want to use a pre-trained model, set

--pretrained_model <path-to-your-pretrained-model>
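For example, a complete invocation could look like the following; the config and checkpoint paths are placeholders, so substitute your own files.

python ./tools/main/train.py --cfg_file ./tools/cfgs/tumtraf_models/pv_rcnn.yaml --extra_tag normalTraining --pretrained_model ./checkpoints/pv_rcnn.pth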

To run active training:

In ./tools/cfgs, add a folder <active-dataset_models>, then add the .yaml configuration files for your models there.

python ./tools/main/train.py --cfg_file <path-to-yaml-file> --extra_tag <continuous_training_method>

Connection to proAnno

  1. The flask_app.py file can be located anywhere outside the Docker container.

  2. To access this Flask server from the proAnno tool running on a different machine, use the IP address of the machine where the Flask app is running, followed by the port number.

     For example, if the IP address of the workstation is 192.168.1.5 and the Flask server is running on port 5000, then the proAnno tool on another machine reaches the Flask app at http://192.168.1.5:5000.

  3. Run flask_app.py.

  4. ActiveAnno3D is then ready for the annotator to send a command (a hypothetical client request is sketched after this list).

  5. The selected point cloud frames are saved in ActiveAnno3D under ./data/proannoV2/currently_annotating; they have to be copied manually to the proAnno side for annotation.
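As a rough illustration of step 4, a client request from the proAnno side could look like the sketch below; the /query endpoint and the JSON payload are hypothetical stand-ins, so check flask_app.py for the actual routes and parameters.

import requests

# Hypothetical endpoint and payload; see flask_app.py for the real routes.
response = requests.post(
    "http://192.168.1.5:5000/query",                 # workstation IP and Flask port
    json={"strategy": "entropy", "num_frames": 100},
    timeout=600,                                     # data selection can take a while
)
response.raise_for_status()
print(response.json())                               # e.g. the list of selected frames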

Evaluation

Figure: mAP scores achieved by the BEVFusion model on the nuScenes dataset as the training set expands in the active learning setting, shown separately for random and entropy queries.

Figure: mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset as the training set expands in the active learning setting, shown separately for random and entropy queries.

Figure: mAP scores achieved by the PV-RCNN model on the TUM Traffic Intersection dataset as the training set expands in the active learning setting, compared across eight different query strategies.

Qualitative Results

Figure: Qualitative results illustrated by two pairs of images. The left pair is from the TUM Traffic Intersection dataset, and the right pair is from nuScenes. In each pair, the left image shows the labels predicted for each class, with each class drawn in a different color, and the right image shows the predictions of a model trained on the complete dataset. The results are very similar, demonstrating the efficiency of the active learning technique.

Benchmark Results

| Round | Labeled Pool (%) | LiDAR-Only (PV-RCNN) Random | LiDAR-Only (PV-RCNN) Entropy | LiDAR+Camera (BEVFusion) Random | LiDAR+Camera (BEVFusion) Entropy |
|---|---|---|---|---|---|
| 1 | 10 | 51.03 | 54.32 (+3.29) | 30.95 | 31.06 (+0.11) |
| 2 | 15 | 61.98 | 62.24 (+0.26) | 34.19 | 36.39 (+2.20) |
| 3 | 20 | 69.84 | 68.23 (-1.61) | 38.00 | 40.41 (+2.41) |
| 4 | 25 | 74.82 | 72.40 (-2.42) | 42.36 | 42.17 (-0.19) |
| 5 | 30 | 77.25 | 76.56 (-0.69) | 44.94 | 45.57 (+0.63) |
| 6 | 35 | 75.40 | 75.00 (-0.40) | 44.74 | 46.76 (+2.02) |
| 7 | 40 | 77.03 | 75.48 (-1.55) | 46.93 | 49.24 (+2.31) |
| 8 | 50 | 79.09 | 77.25 (-1.84) | 49.90 | 64.31 (+14.41) |
| SOA (no AL) | 100 | 83.50 | – | 52.88 | – |

Evaluation of the PV-RCNN (LiDAR-only) and BEVFusion (camera+LiDAR) models using the random sampling baseline and the entropy querying method on the TUM Traffic Intersection dataset and the nuScenes dataset, respectively. The results are compared to the respective 100% accuracies of the original works. Values in parentheses give the difference between the entropy and random queries.

Acknowledgements

Our baseline is the Active3D framework, an active learning framework for 3D object detection that proposes the CRB query strategy, which assesses the informativeness of point cloud data based on three characteristics: 3D box class distribution, feature representativeness, and point density distribution. This research was supported by the Federal Ministry of Education and Research in Germany within the AUTOtech.agil project (01IS22088U). The publication was partly written at Virtual Vehicle Research GmbH in Graz, Austria. The authors acknowledge the financial support within the COMET K2 Competence Centers for Excellent Technologies from the Austrian Federal Ministry for Climate Action (BMK), the Austrian Federal Ministry for Labour and Economy (BMAW), the Province of Styria (Dept. 12), and the Styrian Business Promotion Agency (SFG). The Austrian Research Promotion Agency (FFG) has been authorized for the program management. The authors thank Qualcomm for the support of the Qualcomm Innovation Fellowship.

Citation

If you find our work useful in your research, please cite our work and ⭐ our repository.

@misc{activeanno3d,
      title={ActiveAnno3D - An Active Learning Framework for Multi-Modal 3D Object Detection},
      author={Ghita, Ahmed and Antoniussen, Bjørk and Zimmer, Walter and Greer, Ross and Creß, Christian and Møgelmose, Andreas and Trivedi, Mohan M. and Knoll, Alois C.},
      year={2024},
      eprint={2402.03235},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

The ActiveAnno3D framework is licensed under CC BY-NC-SA 4.0.
