AI Research Platform for Reinforcement Learning from Real Panoramic Images.

The Matterport3D Simulator enables development of AI **agents that interact with real 3D environments using visual information** (RGB-D images). It is primarily intended for research in deep reinforcement learning, at the intersection of computer vision, natural language processing and robotics.

*This is development code for early release. We may make breaking changes, particularly as we look at possible integration with [ParlAI](https://github.com/facebookresearch/ParlAI) and [OpenAI Gym](https://github.com/openai/gym).*
## Features
- Supports GPU rendering using OpenGL, as well as off-screen CPU rendering using OSMESA.
- Future releases will include depth data (RGB-D) as well as class and instance object segmentations.
## Reference
The Matterport3D Simulator and the Room-to-Room (R2R) navigation dataset are described in:
- [Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments](https://arxiv.org/abs/1711.07280).

If you use the simulator or dataset, please cite our paper:
### Bibtex:
```
@article{mattersim,
  title={{Vision-and-Language Navigation}: Interpreting visually-grounded navigation instructions in real environments},
  author={Peter Anderson and Qi Wu and Damien Teney and Jake Bruce and Mark Johnson and Niko Sünderhauf and Ian Reid and Stephen Gould and Anton van den Hengel},
  journal={arXiv preprint arXiv:1711.07280},
  year={2017}
}
```
## Simulator Data
Matterport3D Simulator is based on densely sampled 360-degree indoor RGB-D images from the Matterport3D dataset.
### Actions
At each viewpoint location, the agent can pan and elevate the camera. The agent can also choose to move between viewpoints. The precise details of the agent's observations and actions are described in the paper and defined in `include/MatterSim.hpp`.
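
For orientation, here is a rough sketch of how an episode might be driven from the Python bindings. It is only a sketch: the class and method names used below (`MatterSim.Simulator`, `newEpisode`, `getState`, `makeAction`, `navigableLocations`) are assumptions about the binding interface rather than a confirmed API, so check `include/MatterSim.hpp` and the demo driver for the authoritative definitions.

```python
# Sketch only: the simulator method names below are assumptions, not a confirmed API.
import math
import random

# import MatterSim  # assumed Python bindings built from this repository


def random_episode(sim, scan_id, viewpoint_id, num_steps=10):
    """Randomly pan/elevate the camera and occasionally move to a visible viewpoint."""
    sim.newEpisode(scan_id, viewpoint_id, 0.0, 0.0)      # assumed args: scan, viewpoint, heading, elevation
    for _ in range(num_steps):
        state = sim.getState()                           # assumed: current observation and pose
        # Index 0 is assumed to mean "stay here"; higher indices select navigable viewpoints.
        location = random.randint(0, len(state.navigableLocations) - 1)
        heading_change = random.uniform(-math.pi / 6, math.pi / 6)
        elevation_change = random.uniform(-math.pi / 12, math.pi / 12)
        sim.makeAction(location, heading_change, elevation_change)
```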
## Tasks
### Demo
These are very simple demos designed to illustrate the use of the simulator in Python and C++. Use the arrow keys to pan and tilt the camera. In the Python demo, the top row number keys can be used to move to another viewpoint (if any are visible).

Python demo:
```
python src/driver/driver.py
```

The Matterport3D dataset, and data derived from it, is released under the Matterport3D Terms of Use.
We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. This project is supported by a Facebook ParlAI Research Award and by the [Australian Centre for Robotic Vision](https://www.roboticvision.org/).
## Contributing
We welcome contributions from the community. All submissions require review, and in most cases they will also require tests.
## Room-to-Room (R2R) Navigation Task

The R2R data consists of train / val-seen / val-unseen / test splits. There are two validation sets to better understand generalization between buildings that appear in the training set (val-seen) and buildings that do not (val-unseen). The test set consists entirely of unseen buildings.

To download, from the top level directory, run:
```
./tasks/R2R/data/download.sh
```
Data is formatted as follows:
```
{
  "distance": float,
  "scan": str,
  "path_id": int,
  "path": [str x num_steps],
  "heading": float,
  "instructions": [str x 3],
}
```
- `distance`: Length of the path in meters.
- `scan`: Matterport scan id.
- `path_id`: Unique id for this path.
- `path`: List of viewpoint ids (the first is the start location, the last is the goal location).
- `heading`: The agent's initial heading in radians (elevation is always assumed to be zero).
- `instructions`: Three unique natural language strings describing how to find the goal given the start pose.

For the test set, only the first path_id (starting location) is included. We will provide a test server for scoring uploaded trajectories according to the metrics in the [paper](https://arxiv.org/abs/1711.07280).
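
As a small illustration of consuming this format, the sketch below loads a split file and expands each item into one example per instruction. The filename `tasks/R2R/data/R2R_train.json` and the `instr_id` scheme are assumptions made for illustration; use whatever files the download script actually places in `tasks/R2R/data/`.

```python
# Illustrative only: the split filename and the instr_id scheme are assumptions.
import json


def load_r2r_split(path='tasks/R2R/data/R2R_train.json'):
    """Load an R2R split and expand each item into one example per instruction."""
    with open(path) as f:
        items = json.load(f)
    examples = []
    for item in items:
        for j, instruction in enumerate(item['instructions']):
            examples.append({
                'instr_id': '%d_%d' % (item['path_id'], j),  # hypothetical id scheme
                'scan': item['scan'],
                'heading': item['heading'],                  # initial heading in radians
                'path': item['path'],                        # list of viewpoint ids
                'instruction': instruction,
            })
    return examples
```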
## Directory Structure
- `env.py`: Wraps the simulator and adds language instructions, with several simplifications, namely discretized heading / elevation and pre-cached image features. This is not intended to be a standard component, or to preclude the use of continuous camera actions, end-to-end training etc. Use the simulator and the data as you see fit, but this can provide a starting point.
- `utils.py`: Text pre-processing, navigation graph loading etc.
- `eval.py`: Evaluation script.
- `model.py`: PyTorch seq2seq model with attention (a generic illustration of an attention step follows this list).
- `agent.py`: Various implementations of an agent.
- `train.py`: Training entrypoint, parameter settings etc.
- `plot.py`: Figures from the arXiv paper.
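
Since `model.py` is only named above, the following generic sketch shows the kind of attention step a seq2seq navigation decoder uses: the decoder state attends over the encoded instruction and the attended context feeds the next action prediction. It is a minimal illustration of the general pattern under assumed tensor shapes, not the architecture actually implemented in `model.py`.

```python
# Generic illustration of one decoder step with dot-product attention over an
# encoded instruction; this is not the code in tasks/R2R/model.py.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionDecoderStep(nn.Module):
    def __init__(self, num_actions, hidden_size):
        super(AttentionDecoderStep, self).__init__()
        self.action_embedding = nn.Embedding(num_actions, hidden_size)
        self.rnn = nn.LSTMCell(hidden_size * 2, hidden_size)
        self.out = nn.Linear(hidden_size, num_actions)

    def forward(self, prev_action, h, c, ctx):
        # prev_action: (batch,) previous action ids
        # h, c: (batch, hidden) LSTM state; ctx: (batch, seq_len, hidden) encoded instruction
        embedded = self.action_embedding(prev_action)                # (batch, hidden)
        scores = torch.bmm(ctx, h.unsqueeze(2)).squeeze(2)           # (batch, seq_len)
        alpha = F.softmax(scores, dim=1)                             # attention over instruction words
        attended = torch.bmm(alpha.unsqueeze(1), ctx).squeeze(1)     # (batch, hidden)
        h, c = self.rnn(torch.cat([embedded, attended], dim=1), (h, c))
        return self.out(h), h, c, alpha                              # action logits and new state
```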
## Prerequisites
Python 2, [PyTorch](http://pytorch.org/) and [NetworkX](https://networkx.github.io/) are required. Install the Python dependencies by running:
```
pip install -r tasks/R2R/requirements.txt
```
## Training and Evaluation
To train the seq2seq model with student-forcing:
```
python tasks/R2R/train.py
```
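
For readers unfamiliar with the term, "student-forcing" supervises the decoder with the ground-truth (shortest-path) action at every step but continues the rollout from an action sampled from the model's own output distribution, whereas teacher-forcing feeds the ground-truth action back in. The snippet below sketches that distinction for a single decoding step; it illustrates the idea only and is not the exact code in `train.py` or `agent.py`.

```python
# Sketch of teacher-forcing vs. student-forcing for one decoding step (illustrative only).
import torch.distributions as D
import torch.nn.functional as F


def decode_step(logits, target_action, feedback='student'):
    """Return (loss, action to feed into the next step).

    logits: (batch, num_actions) unnormalised action scores from the decoder.
    target_action: (batch,) ground-truth shortest-path action indices.
    """
    loss = F.cross_entropy(logits, target_action)              # always supervised by the shortest path
    if feedback == 'teacher':
        next_action = target_action                            # roll out along the ground-truth action
    else:                                                      # student-forcing
        next_action = D.Categorical(logits=logits).sample()    # roll out along the model's own sample
    return loss, next_action
```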
To run some simple baselines:
```
python tasks/R2R/eval.py
```
To generate figures from the paper:
```
python tasks/R2R/plot.py
```
The simple baselines include:
- `ShortestAgent`: An agent that always follows the shortest path to the goal (the foundation for supervised training).
- `RandomAgent`: An agent that randomly picks a direction, then tries to go straight for 5 viewpoints (sketched below).
- `StopAgent`: An agent that remains at the starting position.
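
As a rough sketch of the random baseline described above, the logic amounts to choosing a heading once and then repeatedly trying to move forward. The `env` interface used here (`reset`, `turn_to`, `forward`, `stop`) is hypothetical and only stands in for whatever `agent.py` actually calls.

```python
# Hypothetical environment interface; this only illustrates the random baseline's logic.
import math
import random


def run_random_baseline(env, max_forward_steps=5):
    """Pick a random direction once, then try to go straight for a few viewpoints."""
    env.reset()                                  # hypothetical: start a new episode
    env.turn_to(random.uniform(0, 2 * math.pi))  # hypothetical: face a random heading
    for _ in range(max_forward_steps):
        env.forward()                            # hypothetical: step to the next viewpoint ahead
    env.stop()                                   # hypothetical: end the episode
```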