
Vision Algorithms for Mobile Robotics

🚧 Under Construction 🚧

C++ implementations of the exercises of the "Vision Algorithms for Mobile Robotics" course at ETH Zurich (The Course Website).

If you have any questions, feel free to open an issue in this GitHub repository.

Requirements

  • Developed on Ubuntu 20.04
  • Docker
  • NVIDIA Docker container runtime
  • NVIDIA GPU and the latest NVIDIA driver compatible with CUDA 11.4 (tested with driver 470)
    • I only know how to run GUI applications within Docker with NVIDIA GPUs.
    • I have also started to implement the algorithms in CUDA.
      • DISCLAIMER: I am only learning GPU programming and CUDA.
  • Python 3 and pip
  • NO MATLAB:
    • The official course does all the exercises in MATLAB, but since I do not have a MATLAB license, I am doing them in C++. A second reason: robotics jobs generally require strong C++ knowledge, so this is also C++ practice.
  • The following libraries are used and already installed in the Docker Image:
    • Eigen library for linear algebra.
    • OpenCV and VTK, only for reading and writing image/video files and for 2D/3D visualization.

Setup and Build Docker Images

cli.py is a helper command-line tool to build and run the Docker image.

# to install the cli.py's dependencies
python -m pip install -r requirements.txt
  1. Build the cudagl:11.7.0-devel-ubuntu20.04 Docker image

    Note: I am not familiar with buildx, which is used in the cuda repo. This is the only way I could build the image without pushing it to a Docker registry.

    # clone the cuda repo wherever you like, preferably not inside this repo
    git clone git@gitlab.com:nvidia/container-images/cuda.git
    cd cuda
    
    # drop the --pull flag and disable the buildx setup in the build script
    sed -i "s/\ --pull\ /\ /g" build.sh
    sed -i "s/run_cmd docker buildx create --use/\echo \"\"#/g" build.sh
    ./build.sh --image-name cudagl --cuda-version 11.7.0 --os ubuntu --os-version 20.04 --arch x86_64 --cudagl
    # to remove intermediate images
    docker rmi $(docker images --filter=reference="cudagl/build-intermediate:*" -q)
    docker rmi $(docker images --filter=reference="cudagl:*base*" -q)
    docker rmi $(docker images --filter=reference="cudagl:*runtime*" -q)
    
    
    # alternatively, build the image and push it to your own Docker Hub account:
    # YOUR_DOCKER_HUB_USER=yosoufe
    # sed -i "s/\/build-intermediate/_build-intermediate/g" build.sh
    # ./build.sh --image-name ${YOUR_DOCKER_HUB_USER}/cgl --cuda-version 11.7.0 --os ubuntu --os-version 20.04 --arch x86_64 --cudagl --push

    Now cudagl:11.7.0-devel-ubuntu20.04 should appear in the output of docker image ls.

  2. Download the cuDNN .deb file from the NVIDIA website into the docker_extra folder. This is tested with cuDNN 8.4.1.50.

  3. Download NVIDIA Nsight Systems from https://developer.nvidia.com/nsight-systems into the docker_extra folder. I have tested with Nsight Systems 2022.2.1 (Linux Host .deb Installer).

  4. Now run python cli.py build in the root of this project.

Usage

We use Docker: all the requirements are installed inside the Docker image, and everything runs inside the container.

The container includes terminator, in which you can open multiple terminals as tabs or split windows. Read the terminator documentation to learn its keyboard shortcuts.

# run the docker image to get a terminal inside the container; we compile everything in the container
# it will pull my image if you have not built it yourself
python cli.py run

Compile

cd VAMR
mkdir -p output/ex{01..09}
python cli.py run # this should start the container
# now inside the container
cd exercises
mkdir build
cd build
cmake ..
make -j`nproc`

Exercises

Directory Structure

Exercise statements can be found at exercises/statements/<Exercise directory>/statement.pdf. For example, for exercise 2 the file is at exercises/statements/Exercise 2 - PnP/statement.pdf.

The input data is not provided in this repo. You can download it from the course webpage, under the section "Course Program, Slides, and Additional Reading Material". It should be placed in data/exXX/. For example, the images directory for exercise 1 should be placed at data/ex01/images. You can also check each exercise's main file to see where it expects the input files.

The main function of each exercise is implemented in exercises/exerciseXX.cpp; for example, the main file for exercise 1 is exercises/exercise01.cpp. Usually the algorithms are implemented as libraries and used from the main file. You can check the header files included in each exerciseXX.cpp to find out the name of the library; the library is implemented in a directory with the same name as the header file.

The CUDA implementations use cuda as their namespace and are implemented in *.cu and *.cuh files.

Exercise 1 - Augmented Reality Wireframe Cube

This exercise is about camera and lens distortion models: a wireframe cube is projected onto the images using the calibrated camera model.

Output
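Under the hood this boils down to one projection function. Below is a minimal sketch of projecting a world point through a pinhole camera with two-parameter radial distortion, written with Eigen; the function and parameter names are my own for illustration, not necessarily the repo's.

#include <Eigen/Dense>

// Project a world point Pw to pixel coordinates with a pinhole model
// plus two-parameter radial distortion.
// K: 3x3 intrinsics, R and t: world-to-camera pose, k1/k2: radial coefficients.
Eigen::Vector2d projectPoint(const Eigen::Matrix3d &K,
                             const Eigen::Matrix3d &R,
                             const Eigen::Vector3d &t,
                             double k1, double k2,
                             const Eigen::Vector3d &Pw)
{
  // Transform into the camera frame and normalize by the depth.
  Eigen::Vector3d Pc = R * Pw + t;
  Eigen::Vector2d xn = Pc.head<2>() / Pc.z();

  // Radial distortion: x_d = (1 + k1*r^2 + k2*r^4) * x_n.
  double r2 = xn.squaredNorm();
  Eigen::Vector2d xd = (1.0 + k1 * r2 + k2 * r2 * r2) * xn;

  // Map the distorted normalized coordinates to pixels.
  Eigen::Vector3d uv = K * xd.homogeneous();
  return uv.head<2>();
}

Drawing the cube is then just projecting its eight corners and connecting them with lines.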

Exercise 2 - PnP Problem

This exercise is about the PnP (Perspective-n-Point) problem: estimating the position and orientation of a calibrated camera from known points in the world and their known correspondences in the image.

  • Problem statement: exercises/statements/Exercise 2 - PnP/statement.pdf.
  • Solution: exercises/exercise02.cpp.
  • Output Video:

The following video shows the calculated pose of the camera relative to the AprilTag pattern.

Output
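For reference, the standard linear approach to this problem is the DLT: stack two equations per correspondence, solve with an SVD, and project the result back onto a valid pose. The sketch below is my own illustrative Eigen version of that textbook formulation, not necessarily the repo's exact code; it assumes the image points are already normalized by K⁻¹.

#include <Eigen/Dense>
#include <vector>

// Estimate the 3x4 pose matrix M = [R|t] from n >= 6 correspondences between
// normalized image points p_i = K^-1 * pixel_i and world points P_i.
Eigen::Matrix<double, 3, 4> estimatePoseDLT(
    const std::vector<Eigen::Vector2d> &p,
    const std::vector<Eigen::Vector3d> &P)
{
  const int n = static_cast<int>(p.size());
  Eigen::MatrixXd Q(2 * n, 12);
  for (int i = 0; i < n; ++i)
  {
    Eigen::RowVector4d Ph(P[i].x(), P[i].y(), P[i].z(), 1.0);
    // Each correspondence yields two linear equations in the entries of M.
    Q.row(2 * i)     << Ph, Eigen::RowVector4d::Zero(), -p[i].x() * Ph;
    Q.row(2 * i + 1) << Eigen::RowVector4d::Zero(), Ph, -p[i].y() * Ph;
  }
  // The solution is the right singular vector of the smallest singular value.
  Eigen::JacobiSVD<Eigen::MatrixXd> svd(Q, Eigen::ComputeThinV);
  Eigen::VectorXd m = svd.matrixV().col(11);
  Eigen::Matrix<double, 3, 4> M =
      Eigen::Map<Eigen::Matrix<double, 4, 3>>(m.data()).transpose();
  if (M(2, 3) < 0) M = -M;  // points must lie in front of the camera

  // Project the left 3x3 block onto a rotation (a full implementation
  // would also guard against det(R) = -1) and rescale the translation.
  Eigen::JacobiSVD<Eigen::Matrix3d> svdR(
      M.leftCols<3>(), Eigen::ComputeFullU | Eigen::ComputeFullV);
  Eigen::Matrix3d Rot = svdR.matrixU() * svdR.matrixV().transpose();
  double alpha = Rot.norm() / M.leftCols<3>().norm();
  M.col(3) *= alpha;
  M.leftCols<3>() = Rot;
  return M;
}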

Exercise 3 - Simple Keypoint Tracker

Tracking:

Output

The following image shows the Harris and Shi-Tomasi scores, keypoints, and descriptors for the first frame of the dataset.

Output
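For context, both scores come from the same structure tensor of the image gradients: Harris uses det(M) − κ·trace(M)², Shi-Tomasi the smaller eigenvalue of M. Below is a minimal sketch of the Harris response; it leans on OpenCV filtering for brevity, whereas the repo restricts OpenCV to I/O and visualization, so treat it as illustrative rather than the repo's implementation.

#include <opencv2/imgproc.hpp>

// Compute the Harris response for every pixel of a grayscale image.
cv::Mat harrisResponse(const cv::Mat &gray, int blockSize = 3, double kappa = 0.04)
{
  // Image gradients.
  cv::Mat Ix, Iy;
  cv::Sobel(gray, Ix, CV_64F, 1, 0, 3);
  cv::Sobel(gray, Iy, CV_64F, 0, 1, 3);

  // Average the gradient products over a local window (structure tensor M).
  cv::Mat Ixx, Iyy, Ixy;
  cv::boxFilter(Ix.mul(Ix), Ixx, CV_64F, cv::Size(blockSize, blockSize));
  cv::boxFilter(Iy.mul(Iy), Iyy, CV_64F, cv::Size(blockSize, blockSize));
  cv::boxFilter(Ix.mul(Iy), Ixy, CV_64F, cv::Size(blockSize, blockSize));

  // Harris: det(M) - kappa * trace(M)^2; Shi-Tomasi would take the
  // smaller eigenvalue of M instead.
  cv::Mat det = Ixx.mul(Iyy) - Ixy.mul(Ixy);
  cv::Mat trace = Ixx + Iyy;
  return det - kappa * trace.mul(trace);
}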

Exercise 4 - Simple SIFT Keypoint Detection and Matching

  • Problem statement: exercises/statements/Exercise 4 - simple SIFT/statement.pdf.

  • Solution: exercises/exercise04.cpp.

    • ⚠️ I suspect there are still some bugs in my code ⚠️, but given the lack of time and the relatively good results, I am moving on to the next exercise for now. I also skipped the optional part of the exercise and might come back to it later. The descriptor matching could also be optimized later (a simple baseline sketch follows below).

    Output
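Since the matching is flagged above as a candidate for optimization, here is the O(n²) brute-force baseline with Lowe's ratio test for comparison; an illustrative sketch with hypothetical names, with descriptors stored as rows of Eigen matrices.

#include <Eigen/Dense>
#include <cmath>
#include <limits>
#include <vector>

// For each query descriptor, return the index of its best match in `train`,
// or -1 if Lowe's ratio test rejects the match as ambiguous.
std::vector<int> matchDescriptors(const Eigen::MatrixXd &query,
                                  const Eigen::MatrixXd &train,
                                  double maxRatio = 0.8)
{
  std::vector<int> matches(query.rows(), -1);
  for (int i = 0; i < query.rows(); ++i)
  {
    double best = std::numeric_limits<double>::max(), second = best;
    int bestIdx = -1;
    for (int j = 0; j < train.rows(); ++j)
    {
      double d = (query.row(i) - train.row(j)).squaredNorm();
      if (d < best) { second = best; best = d; bestIdx = j; }
      else if (d < second) { second = d; }
    }
    // Keep the match only if it is clearly better than the runner-up
    // (sqrt because the distances above are squared).
    if (bestIdx >= 0 && std::sqrt(best) < maxRatio * std::sqrt(second))
      matches[i] = bestIdx;
  }
  return matches;
}

A k-d tree, or computing all pairwise distances as one matrix product, would be the usual next optimization step.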

Exercise 5 - Stereo Dense Reconstruction

  • Problem statement: exercises/statements/Exercise 5 - Stereo Dense Reconstruction.

  • Solution: exercises/exercise05.cpp.

    • First image of the left camera

    • Disparity image computed from the left and right images of the first frame

    • Rough Point Cloud from Disparity

      ex05-pointcloud_from_disparity-rough-lowQ.mp4
    • Point Cloud from Disparity with sub-pixel accuracy

      ex05-pointcloud_from_disparity-subpixel.mp4
    • Complete point cloud from all of the frame pairs (a better-quality video is at exercises/statements/outputs/ex05-complete_point_cloud.mp4)

      ex05-complete_point_cloud_LQ.mp4
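The geometry behind these point clouds is the standard rectified-stereo back-projection: the depth is Z = f·b/d, and the pixel's viewing ray is scaled to that depth. A minimal sketch with illustrative names, assuming both cameras share the intrinsics K:

#include <Eigen/Dense>

// Back-project one pixel of the left image to a 3D point in the left
// camera frame. baseline is the camera separation in meters and
// d = u_left - u_right is the disparity in pixels.
Eigen::Vector3d disparityToPoint(const Eigen::Matrix3d &K, double baseline,
                                 double u, double v, double d)
{
  double f = K(0, 0);           // focal length in pixels
  double Z = f * baseline / d;  // depth from the stereo geometry
  // Scale the normalized viewing ray to the recovered depth.
  Eigen::Vector3d ray = K.inverse() * Eigen::Vector3d(u, v, 1.0);
  return Z * ray;
}

Sub-pixel accuracy, as in the second video, comes from refining d before back-projecting, e.g. by fitting a parabola to the matching cost around the best integer disparity.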

Exercise 6 - Two-view Geometry

  • Problem statement: exercises/statements/Exercise 6 - Two-view Geometry.
  • Solution: exercises/exercise06.cpp.
  • I developed unit tests using the Google Test framework, similar to the MATLAB test scripts provided with the exercise, in exercises/tests/test_two_view_geometry.cpp. To execute them after compilation, from the build directory:
    ./tests/two_view_geometry_tests --gtest_filter=Two_View_Geometry.linear_triangulation
    ./tests/two_view_geometry_tests --gtest_filter=Two_View_Geometry.eight_point
    # or the following to run all of the tests for exercise 06.
    ./tests/two_view_geometry_tests
    • 3D point cloud and camera poses calculated by the 8-point algorithm from given perfect feature matches (top view) Point Cloud
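For reference, a minimal sketch of the linear triangulation that the first test above exercises: each observation contributes the cross-product constraint [p]× M P = 0, and the stacked system is solved by SVD. Names are my own, not necessarily the repo's.

#include <Eigen/Dense>

// Triangulate a 3D point from two homogeneous observations p1, p2 and the
// corresponding 3x4 projection matrices M1, M2.
Eigen::Vector3d linearTriangulation(const Eigen::Vector3d &p1,
                                    const Eigen::Vector3d &p2,
                                    const Eigen::Matrix<double, 3, 4> &M1,
                                    const Eigen::Matrix<double, 3, 4> &M2)
{
  auto skew = [](const Eigen::Vector3d &v) {
    Eigen::Matrix3d S;
    S <<     0, -v.z(),  v.y(),
         v.z(),      0, -v.x(),
        -v.y(),  v.x(),      0;
    return S;
  };
  // Stack the cross-product constraints from both views.
  Eigen::Matrix<double, 6, 4> A;
  A.topRows<3>() = skew(p1) * M1;
  A.bottomRows<3>() = skew(p2) * M2;

  // Least-squares solution: right singular vector of the smallest
  // singular value, de-homogenized.
  Eigen::JacobiSVD<Eigen::Matrix<double, 6, 4>> svd(A, Eigen::ComputeFullV);
  Eigen::Vector4d Ph = svd.matrixV().col(3);
  return Ph.head<3>() / Ph.w();
}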

Useful Commands

# convert to gif
ffmpeg -ss 0 -t 5 -i input.mp4 -vf "fps=10,scale=320:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop 0 output.gif

# reduce the size and quality
ffmpeg -i input.mp4 -vcodec libx264 -crf 28 output.mp4

cv::Viz3d Key commands

cv::Viz3d is used for 3D visualization, including point clouds. The following keyboard shortcuts are useful for navigating the view.
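A minimal usage sketch (assuming your OpenCV build includes the viz module); the shortcuts listed below work inside the window it opens:

#include <opencv2/viz.hpp>
#include <vector>

int main()
{
  // Three dummy points one meter in front of the camera.
  std::vector<cv::Point3f> points = {{0.f, 0.f, 1.f}, {0.1f, 0.f, 1.f}, {0.f, 0.1f, 1.f}};
  cv::viz::Viz3d window("point cloud");
  window.showWidget("cloud", cv::viz::WCloud(points, cv::viz::Color::white()));
  window.showWidget("frame", cv::viz::WCoordinateSystem());
  window.spin();  // blocks; press q or e in the window to exit
  return 0;
}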

| Help:
-------
          p, P   : switch to a point-based representation
          w, W   : switch to a wireframe-based representation (where available)
          s, S   : switch to a surface-based representation (where available)

          j, J   : take a .PNG snapshot of the current window view
          k, K   : export scene to Wavefront .obj format
    ALT + k, K   : export scene to VRML format
          c, C   : display current camera/window parameters
          F5     : enable/disable fly mode (changes control style)

          e, E   : exit the interactor
          q, Q   : stop and call VTK's TerminateApp

           +/-   : increment/decrement overall point size
     +/- [+ ALT] : zoom in/out

    r, R [+ ALT] : reset camera [to viewpoint = {0, 0, 0} -> center_{x, y, z}]

    ALT + s, S   : turn stereo mode on/off
    ALT + f, F   : switch between maximized window mode and original size
