This repository contains the code release of our ICCV 2021 paper:
A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo
Wang Zhao*, Shaohui Liu*, Yi Wei, Hengkai Guo, Yong-Jin Liu
We recommend using conda to set up a dedicated environment. Run:
conda env create -f environment.yml
First, download the pretrained model from here and put it under the ./pretrain/ folder.
Prepare the sequence data with color images, camera poses (4x4 cam2world transformations), and intrinsics. The sequence folder should be structured as follows:
sequence_name
  | color
      | 00000.jpg
  | pose
      | 00000.txt
  | K.txt
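Before running inference, it can help to check that a sequence folder matches the layout above. The following is a minimal sketch, not part of this repository: the helper names `parse_pose` and `check_sequence` are hypothetical, and it assumes each pose file stores its 4x4 matrix as 16 whitespace-separated numbers. The format of K.txt is not checked because it is not specified here.

```python
from pathlib import Path


def parse_pose(text):
    """Parse a 4x4 matrix from the text of a pose file.

    Assumption: the file holds 16 whitespace-separated numbers in row order.
    """
    values = [float(tok) for tok in text.split()]
    if len(values) != 16:
        raise ValueError(f"expected 16 values for a 4x4 pose, got {len(values)}")
    return [values[r * 4:(r + 1) * 4] for r in range(4)]


def check_sequence(seq_dir):
    """Return a list of problems found in a sequence folder (empty if none)."""
    seq_dir = Path(seq_dir)
    problems = []
    if not (seq_dir / "K.txt").is_file():
        problems.append("missing K.txt")
    images = sorted((seq_dir / "color").glob("*.jpg"))
    if not images:
        problems.append("no .jpg images under color/")
    for image in images:
        # Each color image is expected to have a pose file with the same stem.
        pose_file = seq_dir / "pose" / f"{image.stem}.txt"
        if not pose_file.is_file():
            problems.append(f"missing pose for {image.name}")
            continue
        try:
            parse_pose(pose_file.read_text())
        except ValueError as err:
            problems.append(f"{pose_file.name}: {err}")
    return problems
```

Calling `check_sequence("/path/to/the/sequence/data")` returns an empty list when every image has a readable pose and K.txt is present.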
Run the following command to get the outputs:
python infer_folder.py --seq_dir /path/to/the/sequence/data --output_dir /path/to/save/outputs --config ./configs/test_folder.yaml
Tune the "reference gap" parameter so that each image pair has sufficient overlap and camera translation. For ScanNet-like sequences, we recommend a reference_gap of 20.
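One rough way to judge a candidate reference gap is to measure how far the camera moves between paired frames. The sketch below is illustrative only: `make_pairs` and `translation_between` are hypothetical helpers, and pairing frame i with frame i + reference_gap is an assumption about how the parameter is used, not a description of infer_folder.py. Because the poses are cam2world, the last column of each pose is the camera center in world coordinates.

```python
import math


def make_pairs(num_frames, reference_gap):
    """Pair frame i with frame i + reference_gap (an assumed pairing rule)."""
    return [(i, i + reference_gap) for i in range(num_frames - reference_gap)]


def translation_between(pose_a, pose_b):
    """Distance between the camera centers of two 4x4 cam2world poses.

    Each pose is a list of four rows; the first three entries of the last
    column give the camera center in world coordinates.
    """
    center_a = [row[3] for row in pose_a[:3]]
    center_b = [row[3] for row in pose_b[:3]]
    return math.dist(center_a, center_b)
```

If the translations for most pairs are close to zero, a larger gap may be needed; if the paired views barely overlap, a smaller one.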
Download the ScanNet test split data from the official site and pre-process the data using:
python ./data/preprocess.py --data_dir /path/to/scannet/test/split/ --output_dir /path/to/save/pre-processed/scannet/test/data
This step (1) resizes the color images to 480x640 resolution and (2) samples the data at an interval of 20.
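The sampling part of this step can be illustrated with a short sketch. It is not the code in ./data/preprocess.py: the helper `sample_frames` is hypothetical, and it assumes frame files have purely numeric stems (for example 0.jpg, 1.jpg, ...) and that sampling starts at the first frame. Image resizing is not shown.

```python
from pathlib import Path


def sample_frames(frame_names, interval=20):
    """Keep every `interval`-th frame, ordered by numeric frame index.

    Assumption: each file name has a numeric stem, so "100.jpg" sorts
    after "20.jpg" rather than before it.
    """
    ordered = sorted(frame_names, key=lambda name: int(Path(name).stem))
    return ordered[::interval]
```

Sorting by the integer stem rather than by the raw string avoids misordering names that are not zero-padded.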
python eval_scannet.py --data_dir /path/to/scannet/scans_test_sample --config ./configs/test_scannet.yaml
python eval_scannetnlspn.py --data_dir /path/to/scannet/scans_test_sample --config ./configs/test_scannet_nlspn.yaml
python eval_scannetprop.py --data_dir /path/to/scannet/scans_test_sample --config ./configs/test_scannetprop.yaml
We use the pre-processed ScanNet data from NAS; you can download it using this link. The data is structured as follows:
scannet
  | scannet_nas
      | train
          | scene0000_00
              | color
                  | 0000.jpg
              | pose
                  | 0000.txt
              | depth
                  | 0000.npy
              | intrinsic
              | normal
                  | 0000_normal.npy
      | val
  | scans_test_sample (preprocessed ScanNet test split)
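To confirm that a downloaded scene follows this layout, the per-frame files can be derived from a scene folder and a frame id. This is a minimal sketch under assumptions: `frame_files` and `missing_files` are hypothetical helpers, and the nesting (scene folder, then color, pose, depth, and normal subfolders) is inferred from the listing above. The contents of the intrinsic folder are not listed here, so they are not checked.

```python
from pathlib import Path


def frame_files(scene_dir, frame_id):
    """Map each data kind to the file expected for one frame of a scene."""
    scene_dir = Path(scene_dir)
    return {
        "color": scene_dir / "color" / f"{frame_id}.jpg",
        "pose": scene_dir / "pose" / f"{frame_id}.txt",
        "depth": scene_dir / "depth" / f"{frame_id}.npy",
        "normal": scene_dir / "normal" / f"{frame_id}_normal.npy",
    }


def missing_files(scene_dir, frame_id):
    """Return the data kinds whose expected file does not exist on disk."""
    expected = frame_files(scene_dir, frame_id)
    return [kind for kind, path in expected.items() if not path.is_file()]
```

For example, `missing_files("scannet/scannet_nas/train/scene0000_00", "0000")` returns an empty list when all four files for frame 0000 are present.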
Set the "dataset_path" variable in the config YAML file to your own data path.
The network is trained with a two-stage strategy. The whole training process takes about 6 days on 4 NVIDIA V100 GPUs.
python train.py ./configs/scannet_stage1.yaml
python train.py ./configs/scannet_stage2.yaml
If you find our work useful in your research, please consider citing:
@InProceedings{Zhao_2021_ICCV,
author = {Zhao, Wang and Liu, Shaohui and Wei, Yi and Guo, Hengkai and Liu, Yong-Jin},
title = {A Confidence-Based Iterative Solver of Depths and Surface Normals for Deep Multi-View Stereo},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {6168-6177}
}
This project relies heavily on code from NAS, and we thank the authors for releasing their code.
We also thank Xiaoxiao Long for kindly helping with ScanNet evaluations.