Commit 9dd7586

Add R2R dataset
1 parent 8a4860c

File tree

17 files changed: +1309, -55 lines changed


LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2017 Peter Anderson, Philip Roberts, Qi Wu, Damien Teney,
+Copyright (c) 2017 Peter Anderson, Philip Roberts, Qi Wu, Damien Teney, Jake Bruce
 Mark Johnson, Niko Sunderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
 
 Permission is hereby granted, free of charge, to any person obtaining a copy

README.md

Lines changed: 23 additions & 4 deletions
@@ -3,6 +3,8 @@ AI Research Platform for Reinforcement Learning from Real Panoramic Images.
 
 The Matterport3D Simulator enables development of AI **agents that interact with real 3D environments using visual information** (RGB-D images). It is primarily intended for research in deep reinforcement learning, at the intersection of computer vision, natural language processing and robotics.
 
+![Concept](teaser.jpg)
+
 *This is development code for early release. We may make breaking changes, particularly as we look at possible integration with [ParlAI](https://github.com/facebookresearch/ParlAI) and [OpenAI Gym](https://github.com/openai/gym).*
 
 ## Features
@@ -13,13 +15,21 @@ The Matterport3D Simulator enables development of AI **agents that interact with
 - Supports GPU rendering using OpenGL, as well as off-screen CPU rendering using OSMESA,
 - Future releases will include depth data (RGB-D) as well as class and instance object segmentations.
 
-## Cite as
+## Reference
+
+The Matterport3D Simulator and the Room-to-Room (R2R) navigation dataset are described in:
+- [Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments](https://arxiv.org/abs/1711.07280).
 
-Todo
+If you use the simulator or dataset, please cite our paper:
 
 ### Bibtex:
 ```
-todo
+@article{mattersim,
+  title={{Vision-and-Language Navigation}: Interpreting visually-grounded navigation instructions in real environments},
+  author={Peter Anderson and Qi Wu and Damien Teney and Jake Bruce and Mark Johnson and Niko Sünderhauf and Ian Reid and Stephen Gould and Anton van den Hengel},
+  journal={arXiv preprint arXiv:1711.07280},
+  year={2017}
+}
 ```
 
 ## Simulator Data
@@ -28,7 +38,7 @@ Matterport3D Simulator is based on densely sampled 360-degree indoor RGB-D image
 
 ### Actions
 
-At each viewpoint location, the agent can pan and elevate the camera. The agent can also choose to move between viewpoints. The precise details of the agent's observations and actions are configurable.
+At each viewpoint location, the agent can pan and elevate the camera. The agent can also choose to move between viewpoints. The precise details of the agent's observations and actions are described in the paper and defined in `include/MatterSim.hpp`.
 
 ## Tasks
 
@@ -129,6 +139,9 @@ doxygen
 ```
 
 ### Demo
+
+These are very simple demos designed to illustrate the use of the simulator in python and C++. Use the arrow keys to pan and tilt the camera. In the python demo, the top row number keys can be used to move to another viewpoint (if any are visible).
+
 Python demo:
 ```
 python src/driver/driver.py
@@ -157,4 +170,10 @@ The Matterport3D dataset, and data derived from it, is released under the [Matte
 
 We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. This project is supported by a Facebook ParlAI Research Award and by the [Australian Centre for Robotic Vision](https://www.roboticvision.org/).
 
+## Contributing
+
+We welcome contributions from the community. All submissions require review and in most cases would require tests.
+
+
+
 
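The updated Actions description above stays at a high level, so here is a minimal, hypothetical sketch of an episode loop through the Python bindings, based only on the C++ signatures visible in this commit (`newEpisode(scanId, viewpointId, ...)` and `makeAction(index, heading, elevation)`) and on the demo entry point `src/driver/driver.py`. The module name `MatterSim`, the `getState()` accessor, the trailing `newEpisode` arguments and the placeholder ids are assumptions, not part of this commit.

```
from __future__ import print_function
import math
import MatterSim  # assumed name of the compiled Python binding

sim = MatterSim.Simulator()
sim.init()

# Start an episode at a chosen panorama; the ids are placeholders, and the
# trailing heading/elevation arguments are assumed from the paper's description.
sim.newEpisode('<scan_id>', '<viewpoint_id>', 0.0, 0.0)

for _ in range(10):
    state = sim.getState()  # assumed accessor; see src/driver/driver.py for the canonical demo
    # makeAction(index, heading, elevation) moves to navigableLocations[index];
    # index 0 is taken here to mean "stay put and just rotate the camera".
    index = 1 if len(state.navigableLocations) > 1 else 0
    sim.makeAction(index, math.radians(30), 0.0)  # move (or stay) and pan right 30 degrees
```

The `env.py` wrapper added for the R2R task in this commit builds a similar loop into a discretized, feature-cached environment.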

src/lib/MatterSim.cpp

Lines changed: 10 additions & 10 deletions
@@ -121,10 +121,10 @@ void Simulator::init() {
     ctx = OSMesaCreateContext(OSMESA_RGBA, NULL);
     buffer = malloc(width * height * 4 * sizeof(GLubyte));
     if (!buffer) {
-        throw std::runtime_error( "Malloc image buffer failed" );
+        throw std::runtime_error( "MatterSim: Malloc image buffer failed" );
     }
     if (!OSMesaMakeCurrent(ctx, buffer, GL_UNSIGNED_BYTE, width, height)) {
-        throw std::runtime_error( "OSMesaMakeCurrent failed" );
+        throw std::runtime_error( "MatterSim: OSMesaMakeCurrent failed" );
     }
 #else
     cv::namedWindow("renderwin", cv::WINDOW_OPENGL);
@@ -159,7 +159,7 @@ void Simulator::init() {
 
     // Always check that our framebuffer is ok
     if(glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
-        throw std::runtime_error( "GL_FRAMEBUFFER failure" );
+        throw std::runtime_error( "MatterSim: GL_FRAMEBUFFER failure" );
     }
 #endif
 
@@ -245,7 +245,7 @@ void Simulator::loadLocationGraph() {
     auto navGraphFile = navGraphPath + "/" + state->scanId + "_connectivity.json";
     std::ifstream ifs(navGraphFile, std::ifstream::in);
     if (ifs.fail()){
-        throw std::invalid_argument( "Could not open navigation graph file: " +
+        throw std::invalid_argument( "MatterSim: Could not open navigation graph file: " +
             navGraphFile + ", is scan id valid?" );
     }
     ifs >> root;
@@ -325,14 +325,14 @@ void Simulator::loadTexture(int locationId) {
     auto zpos = cv::imread(datafolder + viewpointId + "_skybox1_sami.jpg");
     auto zneg = cv::imread(datafolder + viewpointId + "_skybox3_sami.jpg");
     if (xpos.empty() || xneg.empty() || ypos.empty() || yneg.empty() || zpos.empty() || zneg.empty()) {
-        throw std::invalid_argument( "Could not open skybox files at: " + datafolder + viewpointId + "_skybox*_sami.jpg");
+        throw std::invalid_argument( "MatterSim: Could not open skybox files at: " + datafolder + viewpointId + "_skybox*_sami.jpg");
     }
     cpuLoadTimer.Stop();
     gpuLoadTimer.Start();
     setupCubeMap(scanLocations[state->scanId][locationId]->cubemap_texture, xpos, xneg, ypos, yneg, zpos, zneg);
     gpuLoadTimer.Stop();
     if (!glIsTexture(scanLocations[state->scanId][locationId]->cubemap_texture)){
-        throw std::runtime_error( "loadTexture failed" );
+        throw std::runtime_error( "MatterSim: loadTexture failed" );
     }
 }
 
@@ -402,23 +402,23 @@ void Simulator::newEpisode(const std::string& scanId,
             ix++;
             if (ix >= scanLocations[state->scanId].size()) ix = 0;
             if (ix == start_ix) {
-                throw std::logic_error( "ScanId: " + scanId + " has no included viewpoints!");
+                throw std::logic_error( "MatterSim: ScanId: " + scanId + " has no included viewpoints!");
             }
         }
     } else {
         // Find index of selected viewpoint
         for (int i = 0; i < scanLocations[state->scanId].size(); ++i) {
             if (scanLocations[state->scanId][i]->viewpointId == viewpointId) {
                 if (!scanLocations[state->scanId][i]->included) {
-                    throw std::invalid_argument( "ViewpointId: " +
+                    throw std::invalid_argument( "MatterSim: ViewpointId: " +
                           viewpointId + ", is excluded from the connectivity graph." );
                 }
                 ix = i;
                 break;
             }
         }
         if (ix < 0) {
-            throw std::invalid_argument( "Could not find viewpointId: " +
+            throw std::invalid_argument( "MatterSim: Could not find viewpointId: " +
                   viewpointId + ", is viewpoint id valid?" );
         }
     }
@@ -472,7 +472,7 @@ void Simulator::makeAction(int index, double heading, double elevation) {
     // move
     if (!initialized || index < 0 || index >= state->navigableLocations.size() ){
         std::stringstream msg;
-        msg << "Invalid action index: " << index;
+        msg << "MatterSim: Invalid action index: " << index;
         throw std::domain_error( msg.str() );
     }
     state->location = state->navigableLocations[index];

tasks/R2R/README.md

Lines changed: 61 additions & 1 deletion
@@ -1,7 +1,6 @@
 # Room-to-Room (R2R) Navigation Task
 
 
-
 ## Download Data
 
 Data consists of train/val-seen/val-unseen/test splits. There are two validation sets to better understand generalization performance between buildings that are in the training set (val-seen) and unseen buildings. The test set consists entirely of unseen buildings.
@@ -10,3 +9,64 @@ To download, from the top level directory, run:
 ```
 ./tasks/R2R/data/download.sh
 ```
+
+Data is formatted as follows:
+```
+{
+  "distance": float,
+  "scan": str,
+  "path_id": int,
+  "path": [str x num_steps],
+  "heading": float,
+  "instructions": [str x 3],
+}
+```
+- `distance`: Length of the path in meters.
+- `scan`: Matterport scan id.
+- `path_id`: Unique id for this path.
+- `path`: List of viewpoint ids (the first is the start location, the last is the goal location).
+- `heading`: The agent's initial heading in radians (elevation is always assumed to be zero).
+- `instructions`: Three unique natural language strings describing how to find the goal given the start pose.
+
+For the test set, only the first path_id (starting location) is included. We will provide a test server for scoring uploaded trajectories according to the metrics in the [paper](https://arxiv.org/abs/1711.07280).
+
+## Directory Structure
+
+- `env.py`: Wraps the simulator and adds language instructions, with several simplifications -- namely discretized heading / elevation and pre-cached image features. This is not intended to be a standard component, or to preclude the use of continuous camera actions, end-to-end training etc. Use the simulator and the data as you see fit, but this can provide a starting point.
+- `utils.py`: Text pre-processing, navigation graph loading etc.
+- `eval.py`: Evaluation script.
+- `model.py`: PyTorch seq2seq model with attention.
+- `agent.py`: Various implementations of an agent.
+- `train.py`: Training entrypoint, parameter settings etc.
+- `plot.py`: Figures from the arXiv paper.
+
+## Prerequisites
+
+Python 2, [PyTorch](http://pytorch.org/), [NetworkX](https://networkx.github.io/). Install python dependencies by running:
+```
+pip install -r tasks/R2R/requirements.txt
+```
+
+## Training and Evaluation
+
+To train the seq2seq model with student-forcing:
+```
+python tasks/R2R/train.py
+```
+
+To run some simple baselines:
+```
+python tasks/R2R/eval.py
+```
+
+Generate figures from the paper:
+```
+python tasks/R2R/plot.py
+```
+
+The simple baselines include:
+- `ShortestAgent`: Agent that always follows the shortest path to the goal (foundation for supervised training).
+- `RandomAgent`: Agent that randomly picks a direction, then tries to go straight for 5 viewpoints.
+- `StopAgent`: Agent that remains at the starting position.
+
+![Navigation Error](plots/error.png)
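Since the data format documented in this hunk is plain JSON, a few lines are enough to inspect a split. The following is a minimal sketch: the filename `R2R_train.json` and the assumption that each split is stored as a single JSON array are placeholders, not guaranteed by this commit, so adjust them to whatever `./tasks/R2R/data/download.sh` actually fetches.

```
from __future__ import print_function
import json

# Assumed filename under tasks/R2R/data/ -- adjust to the downloaded files.
with open('tasks/R2R/data/R2R_train.json') as f:
    data = json.load(f)  # assuming one JSON array of entries per split

item = data[0]
print(item['scan'], item['path_id'], '{:.1f}m'.format(item['distance']))
print('start viewpoint:', item['path'][0], 'initial heading (rad):', item['heading'])
print('goal viewpoint:', item['path'][-1])
for instruction in item['instructions']:
    print(instruction)
```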
