AI Research Platform for Reinforcement Learning from Real Panoramic Images.
The Matterport3D Simulator enables development of AI agents that interact with real 3D environments using visual information (RGB-D images). It is primarily intended for research in deep reinforcement learning, at the intersection of computer vision, natural language processing and robotics.
This is development code for an early release. We may make breaking changes, particularly as we look at possible integration with ParlAI and OpenAI Gym.
Visit the main website for updates and to view a demo.
- Dataset consisting of 90 different predominantly indoor environments
- All images are real, not synthetic (providing much more visual complexity)
- API for C++ and Python
- Customizable image resolution, camera parameters, etc.
- Supports GPU rendering using OpenGL, as well as off-screen CPU rendering using OSMesa
- Future releases will include depth data (RGB-D) as well as class and instance object segmentations
The Matterport3D Simulator and the Room-to-Room (R2R) navigation dataset are described in:
If you use the simulator or dataset, please cite our paper (CVPR 2018 spotlight oral):
```
@inproceedings{mattersim,
  title={{Vision-and-Language Navigation}: Interpreting visually-grounded navigation instructions in real environments},
  author={Peter Anderson and Qi Wu and Damien Teney and Jake Bruce and Mark Johnson and Niko S{\"u}nderhauf and Ian Reid and Stephen Gould and Anton van den Hengel},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}
```
Matterport3D Simulator is based on densely sampled 360-degree indoor RGB-D images from the Matterport3D dataset. The dataset consists of 90 different indoor environments, including homes, offices, churches and hotels. Each environment contains full 360-degree RGB-D scans from between 8 and 349 viewpoints, spread on average 2.25m apart throughout the entire walkable floorplan of the scene.
At each viewpoint location, the agent can pan and elevate the camera. The agent can also choose to move between viewpoints. The precise details of the agent's observations and actions are described in the paper and defined in `include/MatterSim.hpp`.
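As a rough sketch of the Python bindings, a minimal interaction loop might look like the following. Method names follow `include/MatterSim.hpp` in this early release and may change; the scan and viewpoint IDs are placeholders to be replaced with IDs from your Matterport3D download.

```python
import math


def run_episode(sim, scan_id, viewpoint_id, num_steps=3):
    """Pan the camera 30 degrees per step and report navigable viewpoints."""
    sim.newEpisode(scan_id, viewpoint_id, 0.0, 0.0)
    for _ in range(num_steps):
        state = sim.getState()
        print('heading=%.2f, %d navigable locations'
              % (state.heading, len(state.navigableLocations)))
        # An action is (navigable location index, heading change, elevation
        # change); index 0 stays at the current viewpoint.
        sim.makeAction(0, math.radians(30), 0.0)


if __name__ == '__main__':
    try:
        import MatterSim
    except ImportError:
        print('MatterSim bindings not built; see the build instructions below')
    else:
        sim = MatterSim.Simulator()
        sim.setCameraResolution(640, 480)
        sim.init()
        run_episode(sim, 'SCAN_ID', 'VIEWPOINT_ID')  # placeholders
```

Here `makeAction` moves to the navigable location at the given index while adjusting the camera by the given heading and elevation offsets.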
Currently the simulator supports one task: the Room-to-Room (R2R) navigation task. We hope this will grow. Please refer to the task-specific instructions to set it up and run it. A test server and leaderboard are available at EvalAI.
A C++ compiler with C++11 support is required. Matterport3D Simulator has several dependencies:
- OpenCV >= 2.4 including 3.x
- OpenGL
- OSMesa
- GLM
- Numpy
- pybind11 for Python bindings
- Doxygen for building documentation
E.g. installing dependencies on Ubuntu:
```
sudo apt-get install libopencv-dev python-opencv freeglut3 freeglut3-dev libglm-dev libjsoncpp-dev doxygen libosmesa6-dev libosmesa6
```
Clone the Matterport3DSimulator repository:
```
# Make sure to clone with --recursive
git clone --recursive https://github.com/peteanderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
```
If you didn't clone with the `--recursive` flag, then you'll need to manually clone the pybind11 submodule from the top-level directory:

```
git submodule update --init --recursive
```
The repository is organized as follows:

- `connectivity`: Json navigation graphs.
- `webgl_imgs`: Contains dataset views rendered with javascript (for test comparisons).
- `sim_imgs`: Will contain simulator rendered images after running tests.
- `models`: Caffe models for precomputing ResNet image features.
- `img_features`: Storage for precomputed image features.
- `data`: You create a symlink to the Matterport3D dataset.
- `tasks`: Currently just the Room-to-Room (R2R) navigation task.
- `web`: Javascript code for visualizing trajectories and collecting annotations in AMT.
Other directories are mostly self-explanatory.
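For illustration, the navigation graphs in `connectivity` can be loaded along these lines. The field names (`image_id`, `included`, `unobstructed`) are assumptions based on the JSON files shipped in that directory; verify them against your copy.

```python
import json


def load_nav_graph(json_path):
    """Build an adjacency dict from a '<scanId>_connectivity.json' file.

    Assumed schema: a JSON list of viewpoint entries, each with an
    'image_id' string, an 'included' flag, and an 'unobstructed' boolean
    list marking traversable edges to every other entry in the list.
    """
    with open(json_path) as f:
        nodes = json.load(f)
    graph = {}
    for node in nodes:
        if not node['included']:
            continue  # skip viewpoints excluded from the navigation graph
        neighbours = [nodes[j]['image_id']
                      for j, ok in enumerate(node['unobstructed'])
                      if ok and nodes[j]['included']]
        graph[node['image_id']] = neighbours
    return graph
```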
To use the simulator you must first download either the Matterport3D Dataset, or you can download the precomputed ResNet image features and use discretized viewpoints.
Download the Matterport3D dataset which is available after requesting access here. The provided download script allows for downloading of selected data types. Note that for the Matterport3D Simulator, only the following data types are required (and can be selected with the download script):
`matterport_skybox_images`
Create a symlink to the Matterport3D Dataset, which should be structured as `<Matterdata>/v1/scans/<scanId>/matterport_skybox_images/*.jpg`:

```
ln -s <Matterdata> data
```
Using symlinks will allow the same Matterport3D dataset installation to be used between multiple projects.
To speed up model training times, it is convenient to discretize heading and elevation into 30 degree increments, and to precompute image features for each view.
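For example, 30-degree increments give 12 discrete headings, and pairing them with three camera elevations (down, level, up) yields 36 discrete views per panorama. A minimal helper to snap a continuous camera pose to the nearest discrete view might look like this (illustrative only, not part of the simulator API; the 0-11 down / 12-23 level / 24-35 up index layout is an assumption):

```python
import math

HEADING_STEP = math.radians(30)                    # 12 headings per panorama
ELEVATIONS = [-HEADING_STEP, 0.0, HEADING_STEP]    # down / level / up


def discretize(heading, elevation):
    """Map a continuous camera pose to the nearest of 36 discrete views.

    Returns a view index in [0, 36): 12 headings x 3 elevations, with
    indices 0-11 looking down, 12-23 level, and 24-35 looking up.
    """
    h = int(round(heading / HEADING_STEP)) % 12
    e = min(range(3), key=lambda i: abs(ELEVATIONS[i] - elevation))
    return e * 12 + h
```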
We generate image features using Caffe. To replicate our approach, first download and save some Caffe ResNet-152 weights into the `models` directory. We experiment with weights pretrained on ImageNet, and also weights finetuned on the Places365 dataset. The script `scripts/precompute_features.py` can then be used to precompute ResNet-152 features. Features are saved in tsv format in the `img_features` directory.

Alternatively, skip the generation and just download and extract our tsv files into the `img_features` directory:
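The tsv files can be read back with a few lines of Python. This sketch assumes the column layout below and a base64-encoded float32 array of 36 discretized views x 2048 ResNet features per viewpoint; check these assumptions against the files you download.

```python
import base64
import csv
import sys

import numpy as np

# Assumed column order of the precomputed-feature tsv files.
TSV_FIELDNAMES = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov',
                  'features']


def load_img_features(tsv_path, feature_dim=2048):
    """Load precomputed image features keyed by 'scanId_viewpointId'."""
    csv.field_size_limit(sys.maxsize)  # feature columns are very long
    features = {}
    with open(tsv_path, newline='') as f:
        reader = csv.DictReader(f, delimiter='\t', fieldnames=TSV_FIELDNAMES)
        for row in reader:
            key = row['scanId'] + '_' + row['viewpointId']
            feats = np.frombuffer(base64.b64decode(row['features']),
                                  dtype=np.float32)
            features[key] = feats.reshape(36, feature_dim)
    return features
```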
Build OpenGL version using CMake:
```
mkdir build && cd build
cmake ..
make
cd ../
```
Or build headless OSMESA version using CMake:
```
mkdir build && cd build
cmake -DOSMESA_RENDERING=ON ..
make
cd ../
```
To build html docs for C++ classes in the `doxygen` directory, run this command and navigate to `doxygen/html/index.html`:

```
doxygen
```
These are very simple demos designed to illustrate use of the simulator in Python and C++. Use the arrow keys to pan and tilt the camera. In the Python demo, the top-row number keys can be used to move to another viewpoint (if any are visible).
Python demo:
```
python src/driver/driver.py
```
C++ demo:
```
build/mattersim_main
```

To run the unit tests:

```
build/tests
```
Or, if you haven't installed the Matterport3D dataset, you will need to skip the rendering tests:

```
build/tests exclude:[Rendering]
```
Refer to the Catch documentation for additional usage and configuration options.
The Matterport3D dataset, and data derived from it, is released under the Matterport3D Terms of Use. Our code is released under the MIT license.
We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. This project is supported by a Facebook ParlAI Research Award and by the Australian Centre for Robotic Vision.
We welcome contributions from the community. All submissions require review, and in most cases will require tests.