
Commit 4d112a0

unimatch release
0 parents, commit 4d112a0


83 files changed: +491940 -0 lines

DATASETS.md (+74 lines)

# Datasets

## Optical Flow

The datasets used to train and evaluate our GMFlow model are as follows:

- [FlyingChairs](https://lmb.informatik.uni-freiburg.de/resources/datasets/FlyingChairs.en.html#flyingchairs)
- [FlyingThings3D](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
- [Sintel](http://sintel.is.tue.mpg.de/)
- [Virtual KITTI 2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/)
- [KITTI](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow)
- [HD1K](http://hci-benchmark.iwr.uni-heidelberg.de/)

By default the dataloader [dataloader/flow/datasets.py](dataloader/flow/datasets.py) assumes the datasets are located in the `datasets` directory.

It is recommended to symlink your dataset root to `datasets`:

```
ln -s $YOUR_DATASET_ROOT datasets
```

Otherwise, you may need to change the corresponding paths in [dataloader/flow/datasets.py](dataloader/flow/datasets.py).

## Stereo Matching

The datasets used to train and evaluate our GMStereo model are as follows:

- [Scene Flow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
- [Virtual KITTI 2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/)
- [KITTI](https://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
- [TartanAir](https://github.com/castacks/tartanair_tools)
- [Falling Things](https://research.nvidia.com/publication/2018-06_Falling-Things)
- [HR-VS](https://drive.google.com/file/d/1SgEIrH_IQTKJOToUwR1rx4-237sThUqX/view)
- [CREStereo Dataset](https://github.com/megvii-research/CREStereo/blob/master/dataset_download.sh)
- [InStereo2K](https://github.com/YuhuaXu/StereoDataset)
- [Middlebury](https://vision.middlebury.edu/stereo/data/)
- [Sintel Stereo](http://sintel.is.tue.mpg.de/stereo)
- [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-training-data)

By default the dataloader [dataloader/stereo/datasets.py](dataloader/stereo/datasets.py) assumes the datasets are located in the `datasets` directory.

It is recommended to symlink your dataset root to `datasets`:

```
ln -s $YOUR_DATASET_ROOT datasets
```

Otherwise, you may need to change the corresponding paths in [dataloader/stereo/datasets.py](dataloader/stereo/datasets.py).

## Depth Estimation

The datasets used to train and evaluate our GMDepth model are as follows:

- [DeMoN](https://github.com/lmb-freiburg/demon)
- [ScanNet](http://www.scan-net.org/)

We provide scripts to download and extract the DeMoN dataset: [dataloader/depth/download_demon_train.sh](dataloader/depth/download_demon_train.sh), [dataloader/depth/download_demon_test.sh](dataloader/depth/download_demon_test.sh), [dataloader/depth/prepare_demon_train.sh](dataloader/depth/prepare_demon_train.sh) and [dataloader/depth/prepare_demon_test.sh](dataloader/depth/prepare_demon_test.sh).
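
For example, a minimal sketch of preparing DeMoN from the repository root (assuming the scripts need no extra arguments; see the scripts themselves for details):

```
# download the DeMoN training and test archives, then extract them into the expected layout
bash dataloader/depth/download_demon_train.sh
bash dataloader/depth/download_demon_test.sh
bash dataloader/depth/prepare_demon_train.sh
bash dataloader/depth/prepare_demon_test.sh
```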

By default the dataloader [dataloader/depth/datasets.py](dataloader/depth/datasets.py) assumes the datasets are located in the `datasets` directory.

It is recommended to symlink your dataset root to `datasets`:

```
ln -s $YOUR_DATASET_ROOT datasets
```

Otherwise, you may need to change the corresponding paths in [dataloader/depth/datasets.py](dataloader/depth/datasets.py).

MODEL_ZOO.md (+71 lines)

# Model Zoo

- The models are named as `model-dataset`.
- Model definition: `scale1` denotes the 1/8 feature resolution model, `scale2` denotes the 1/8 & 1/4 model, and `scaleX-regrefineY` denotes the `X`-scale model with `Y` additional local regression refinements.
- The inference time is averaged over 100 runs, measured with batch size 1 on a single NVIDIA A100 GPU.
- All pretrained models can be downloaded together at [pretrained.zip](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained.zip) (see the sketch below), or individually as listed below.
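
A minimal sketch for fetching all weights at once (assuming the archive unpacks into a `pretrained/` directory in the repository root, which is where the provided scripts look for checkpoints):

```
# download and unpack all pretrained models
wget https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained.zip
unzip pretrained.zip
```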

## Optical Flow

- The inference time is measured for Sintel resolution: 448x1024.
- The `*-mixdata` models are trained on several mixed public datasets and are recommended for in-the-wild use cases.

| Model | Params (M) | Time (ms) | Download |
| --------------------------------- | :--------: | :-------: | :----------------------------------------------------------: |
| GMFlow-scale1-things | 4.7 | 26 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale1-things-e9887eda.pth) |
| GMFlow-scale1-mixdata | 4.7 | 26 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale1-mixdata-train320x576-4c3a6e9a.pth) |
| GMFlow-scale2-things | 4.7 | 66 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-things-36579974.pth) |
| GMFlow-scale2-sintel | 4.7 | 66 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-sintel-3ed1cf48.pth) |
| GMFlow-scale2-mixdata | 4.7 | 66 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-mixdata-train320x576-9ff1c094.pth) |
| GMFlow-scale2-regrefine6-things | 7.4 | 122 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-things-776ed612.pth) |
| GMFlow-scale2-regrefine6-sintelft | 7.4 | 122 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-sintelft-6e39e2b9.pth) |
| GMFlow-scale2-regrefine6-kitti | 7.4 | 122 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-kitti15-25b554d7.pth) |
| GMFlow-scale2-regrefine6-mixdata | 7.4 | 122 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth) |
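
To fetch a single checkpoint instead, for example the mixdata flow model recommended above for in-the-wild use, a sketch using the URL from the table (saved into the `pretrained/` directory assumed by the scripts):

```
mkdir -p pretrained
wget -P pretrained https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth
```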

## Stereo Matching

- The inference time is measured for KITTI resolution: 384x1248.
- The `*-resumeflowthings-*` models are initialized with a GMFlow model that was trained on the Chairs and Things datasets for the optical flow task.
- The `*-mixdata` models are trained on several mixed public datasets and are recommended for in-the-wild use cases.

| Model | Params (M) | Time (ms) | Download |
| ------------------------------------------------------ | :--------: | :-------: | :--------: |
| GMStereo-scale1-sceneflow | 4.7 | 23 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale1-sceneflow-124a438f.pth) |
| GMStereo-scale1-resumeflowthings-sceneflow | 4.7 | 23 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale1-resumeflowthings-sceneflow-16e38788.pth) |
| GMStereo-scale2-sceneflow | 4.7 | 58 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-sceneflow-ab93ba6a.pth) |
| GMStereo-scale2-resumeflowthings-sceneflow | 4.7 | 58 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-resumeflowthings-sceneflow-48020649.pth) |
| GMStereo-scale2-regrefine3-sceneflow | 7.4 | 86 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-regrefine3-sceneflow-2dd12e97.pth) |
| GMStereo-scale2-regrefine3-resumeflowthings-sceneflow | 7.4 | 86 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-regrefine3-resumeflowthings-sceneflow-f724fee6.pth) |
| GMStereo-scale2-regrefine3-resumeflowthings-kitti | 7.4 | 86 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-regrefine3-resumeflowthings-kitti15-04487ebf.pth) |
| GMStereo-scale2-regrefine3-resumeflowthings-middlebury | 7.4 | 86 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-regrefine3-resumeflowthings-middleburyfthighres-a82bec03.pth) |
| GMStereo-scale2-regrefine3-resumeflowthings-eth3dft | 7.4 | 86 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-regrefine3-resumeflowthings-eth3dft-a807cb16.pth) |
| GMStereo-scale2-regrefine3-resumeflowthings-mixdata | 7.4 | 86 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmstereo-scale2-regrefine3-resumeflowthings-mixdata-train320x640-ft640x960-e4e291fd.pth) |

## Depth Estimation

- The inference time is measured for ScanNet resolution: 480x640.
- The `*-resumeflowthings-*` models are initialized with a pretrained GMFlow model that was trained on the Chairs and Things datasets for the optical flow task.

| Model | Params (M) | Time (ms) | Download |
| -------------------------------------------------- | :--------: | :-------: | :----------------------------------------------------------: |
| GMDepth-scale1-scannet | 4.7 | 17 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-scannet-d3d1efb5.pth) |
| GMDepth-scale1-resumeflowthings-scannet | 4.7 | 17 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth) |
| GMDepth-scale1-regrefine1-resumeflowthings-scannet | 4.7 | 17 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-regrefine1-resumeflowthings-scannet-90325722.pth) |
| GMDepth-scale1-demon | 7.3 | 20 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-demon-bd64786e.pth) |
| GMDepth-scale1-resumeflowthings-demon | 7.3 | 20 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-demon-a2fe127b.pth) |
| GMDepth-scale1-regrefine1-resumeflowthings-demon | 7.3 | 20 | [download](https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-regrefine1-resumeflowthings-demon-7c23f230.pth) |

README.md (+132 lines)

<p align="center">
  <h1 align="center">Unifying Flow, Stereo and Depth Estimation</h1>
  <p align="center">
    <a href="https://haofeixu.github.io/">Haofei Xu</a>
    ·
    <a href="https://scholar.google.com/citations?user=9jH5v74AAAAJ">Jing Zhang</a>
    ·
    <a href="https://jianfei-cai.github.io/">Jianfei Cai</a>
    ·
    <a href="https://scholar.google.com/citations?user=VxAuxMwAAAAJ">Hamid Rezatofighi</a>
    ·
    <a href="https://www.yf.io/">Fisher Yu</a>
    ·
    <a href="https://scholar.google.com/citations?user=RwlJNLcAAAAJ">Dacheng Tao</a>
    ·
    <a href="http://www.cvlibs.net/">Andreas Geiger</a>
  </p>
  <h3 align="center"><a href="https://arxiv.org/abs/2211.xxxxx">Paper</a> | <a href="https://haofeixu.github.io/unimatch/">Project Page</a> | <a>Colab (Coming Soon)</a></h3>
  <div align="center"></div>
</p>

<p align="center">
  <a href="">
    <img src="./assets/teaser.svg" alt="Logo" width="70%">
  </a>
</p>

<p align="center">
  A unified model for three motion and 3D perception tasks.
</p>

This project is developed based on our previous works:

- [GMFlow: Learning Optical Flow via Global Matching, CVPR 2022, Oral](https://github.com/haofeixu/gmflow)
- [High-Resolution Optical Flow from 1D Attention and Correlation, ICCV 2021, Oral](https://github.com/haofeixu/flow1d)
- [AANet: Adaptive Aggregation Network for Efficient Stereo Matching, CVPR 2020](https://github.com/haofeixu/aanet)

## Installation

Our code is developed with PyTorch 1.9.0, CUDA 10.2 and Python 3.8; newer PyTorch versions are expected to work as well.

We recommend using [conda](https://www.anaconda.com/distribution/) for installation:

```
conda env create -f conda_environment.yml
conda activate unimatch
```

Alternatively, we also support installing with pip:

```
bash pip_install.sh
```
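
After either install, a quick sanity check (a sketch, not part of the provided scripts) is to confirm that PyTorch and CUDA are visible:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```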

## Model Zoo

All pretrained models for flow, stereo and depth on different datasets are available in [MODEL_ZOO.md](MODEL_ZOO.md).

We assume the downloaded weights are located under the `pretrained` directory.

Otherwise, you may need to change the corresponding paths in the scripts.

## Demo

Given an image pair or a video sequence, our code supports generating prediction results of optical flow, disparity and depth.

Please refer to [scripts/gmflow_demo.sh](scripts/gmflow_demo.sh), [scripts/gmstereo_demo.sh](scripts/gmstereo_demo.sh) and [scripts/gmdepth_demo.sh](scripts/gmdepth_demo.sh) for example usages.
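
For instance, a minimal sketch of running the flow demo (assuming the corresponding checkpoint has been downloaded and any input or checkpoint paths inside the script have been adjusted to your setup):

```
bash scripts/gmflow_demo.sh
```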

## Datasets

The datasets used to train and evaluate our models for all three tasks are given in [DATASETS.md](DATASETS.md).

## Evaluation

The evaluation scripts used to reproduce the numbers in our paper are given in [scripts/gmflow_evaluate.sh](scripts/gmflow_evaluate.sh), [scripts/gmstereo_evaluate.sh](scripts/gmstereo_evaluate.sh) and [scripts/gmdepth_evaluate.sh](scripts/gmdepth_evaluate.sh).

For submission to the KITTI, Sintel, Middlebury and ETH3D online test sets, you can run [scripts/gmflow_submission.sh](scripts/gmflow_submission.sh) and [scripts/gmstereo_submission.sh](scripts/gmstereo_submission.sh) to generate the prediction results, which can be submitted directly.

## Training

All training scripts for different model variants on different datasets can be found in [scripts/*_train.sh](scripts).

We support using TensorBoard to monitor and visualize the training process. You can first start a TensorBoard session with

```
tensorboard --logdir checkpoints
```

and then access [http://localhost:6006](http://localhost:6006/) in your browser.

## Citation

If you find our work useful in your research, please consider citing our paper:

```
@article{xu2022unifying,
  title={Unifying Flow, Stereo and Depth Estimation},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},
  journal={arXiv preprint arXiv:2211.xxxxx},
  year={2022}
}
```

## Acknowledgements

This project would not have been possible without relying on some awesome repos: [RAFT](https://github.com/princeton-vl/RAFT), [LoFTR](https://github.com/zju3dv/LoFTR), [DETR](https://github.com/facebookresearch/detr), [Swin](https://github.com/microsoft/Swin-Transformer), [mmdetection](https://github.com/open-mmlab/mmdetection) and [Detectron2](https://github.com/facebookresearch/detectron2/blob/main/projects/TridentNet/tridentnet/trident_conv.py). We thank the original authors for their excellent work.

assets/teaser.svg (+1 line)

conda_environment.yml (+108 lines)

name: unimatch
channels:
  - pytorch
  - defaults
dependencies:
  - blas=1.0=mkl
  - brotli=1.0.9=ha925a31_2
  - ca-certificates=2022.07.19=haa95532_0
  - certifi=2022.6.15=py38haa95532_0
  - cloudpickle=2.0.0=pyhd3eb1b0_0
  - cudatoolkit=10.2.89=h74a9793_1
  - cycler=0.11.0=pyhd3eb1b0_0
  - cytoolz=0.11.0=py38he774522_0
  - dask-core=2022.7.0=py38haa95532_0
  - fonttools=4.25.0=pyhd3eb1b0_0
  - freetype=2.10.4=hd328e21_0
  - fsspec=2022.7.1=py38haa95532_0
  - icc_rt=2019.0.0=h0cc432a_1
  - icu=58.2=ha925a31_3
  - imageio=2.9.0=pyhd3eb1b0_0
  - intel-openmp=2022.0.0=haa95532_3663
  - jpeg=9b=hb83a4c4_2
  - kiwisolver=1.4.2=py38hd77b12b_0
  - libpng=1.6.37=h2a8f88b_0
  - libtiff=4.2.0=he0120a3_1
  - libuv=1.40.0=he774522_0
  - libwebp=1.2.2=h2bbff1b_0
  - locket=1.0.0=py38haa95532_0
  - lz4-c=1.9.3=h2bbff1b_1
  - matplotlib=3.5.1=py38haa95532_1
  - matplotlib-base=3.5.1=py38hd77b12b_1
  - mkl=2020.2=256
  - mkl-service=2.3.0=py38h196d8e1_0
  - mkl_fft=1.3.0=py38h46781fe_0
  - mkl_random=1.1.1=py38h47e9c7a_0
  - munkres=1.1.4=py_0
  - networkx=2.8.4=py38haa95532_0
  - ninja=1.10.2=haa95532_5
  - ninja-base=1.10.2=h6d14046_5
  - numpy=1.19.2=py38hadc3359_0
  - numpy-base=1.19.2=py38ha3acd2a_0
  - openssl=1.1.1q=h2bbff1b_0
  - packaging=21.3=pyhd3eb1b0_0
  - partd=1.2.0=pyhd3eb1b0_1
  - pillow=9.0.1=py38hdc2b20a_0
  - pip=21.2.2=py38haa95532_0
  - pyparsing=3.0.4=pyhd3eb1b0_0
  - pyqt=5.9.2=py38hd77b12b_6
  - python=3.8.13=h6244533_0
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - pytorch=1.9.0=py3.8_cuda10.2_cudnn7_0
  - pywavelets=1.3.0=py38h2bbff1b_0
  - pyyaml=6.0=py38h2bbff1b_1
  - qt=5.9.7=vc14h73c81de_0
  - scikit-image=0.19.2=py38hf11a4ad_0
  - scipy=1.6.2=py38h14eb087_0
  - sip=4.19.13=py38hd77b12b_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.38.5=h2bbff1b_0
  - tifffile=2020.10.1=py38h8c2d366_2
  - tk=8.6.12=h2bbff1b_0
  - toolz=0.11.2=pyhd3eb1b0_0
  - torchvision=0.10.0=py38_cu102
  - tornado=6.1=py38h2bbff1b_0
  - typing_extensions=4.1.1=pyh06a4308_0
  - vc=14.2=h21ff451_1
  - vs2015_runtime=14.27.29016=h5e58377_2
  - wheel=0.37.1=pyhd3eb1b0_0
  - wincertstore=0.2=py38haa95532_2
  - xz=5.2.5=h8cc25b3_1
  - yaml=0.2.5=he774522_0
  - zlib=1.2.12=h8cc25b3_2
  - zstd=1.5.2=h19a0ad4_0
  - pip:
      - absl-py==1.1.0
      - cachetools==5.2.0
      - charset-normalizer==2.1.0
      - cmapy==0.6.6
      - colorama==0.4.5
      - configargparse==1.5.3
      - google-auth==2.9.0
      - google-auth-oauthlib==0.4.6
      - grpcio==1.47.0
      - h5py==3.7.0
      - idna==3.3
      - imageio-ffmpeg==0.4.7
      - importlib-metadata==4.12.0
      - joblib==1.2.0
      - lz4==4.0.2
      - markdown==3.3.7
      - oauthlib==3.2.0
      - opencv-python==4.6.0.66
      - path==16.5.0
      - protobuf==3.19.4
      - pyasn1==0.4.8
      - pyasn1-modules==0.2.8
      - requests==2.28.1
      - requests-oauthlib==1.3.1
      - rsa==4.8
      - scikit-video==1.1.11
      - setuptools==59.5.0
      - tensorboard==2.9.1
      - tensorboard-data-server==0.6.1
      - tensorboard-plugin-wit==1.8.1
      - tqdm==4.64.1
      - urllib3==1.26.9
      - werkzeug==2.1.2
      - zipp==3.8.0

dataloader/__init__.py

Whitespace-only changes.

dataloader/depth/__init__.py

Whitespace-only changes.
