Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- add state and action space descriptions
- add benchmark details
1 parent 16480c0, commit 9c8a992
Showing 14 changed files with 146 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
--- | ||
layout: "contents" | ||
title: Action Space | ||
firstpage: | ||
--- | ||
|
||
# Action Space | ||
|
||
The action space of the Sawyer robot is a `Box(-1.0, 1.0, (4,), float32)`. | ||
Each action consists of the Cartesian displacements dx, dy, and dz of the end effector, plus one component for gripper control. | ||
|
||
| Num | Action | Control Min | Control Max | Name (in XML file) | Joint | Unit | | ||
|-----|--------|-------------|-------------|---------------------|-------|------| | ||
| 0 | Displacement of the end effector in x direction (dx) | -1 | 1 | mocap | N/A | position (m) | | ||
| 1 | Displacement of the end effector in y direction (dy) | -1 | 1 | mocap | N/A | position (m) | | ||
| 2 | Displacement of the end effector in z direction (dz) | -1 | 1 | mocap | N/A | position (m) | | ||
| 3 | Gripper adjustment (closing/opening) | -1 | 1 | rightclaw, leftclaw | r_close, l_close | position (normalized) | |
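
The action layout in the table above can be illustrated with a minimal sketch in plain Python. The helper names `make_action` and `is_valid_action` are hypothetical and not part of the library. The sketch only assumes what the table states: four components, each bounded to [-1, 1]. The table does not say which sign of the gripper component opens or closes the gripper, so the sketch makes no assumption about that.

```python
from typing import List, Sequence

# Bounds and size taken from Box(-1.0, 1.0, (4,), float32) in the text above.
ACTION_LOW = -1.0
ACTION_HIGH = 1.0
ACTION_DIM = 4  # dx, dy, dz, gripper


def make_action(dx: float, dy: float, dz: float, gripper: float) -> List[float]:
    """Assemble a 4-element action and clip each component to [-1, 1].

    Hypothetical helper for illustration; not part of the library API.
    """
    raw = [dx, dy, dz, gripper]
    return [min(ACTION_HIGH, max(ACTION_LOW, float(v))) for v in raw]


def is_valid_action(action: Sequence[float]) -> bool:
    """Check that an action has 4 components, all within the bounds."""
    return len(action) == ACTION_DIM and all(
        ACTION_LOW <= v <= ACTION_HIGH for v in action
    )


action = make_action(0.5, -2.0, 0.0, 1.0)
print(action)  # [0.5, -1.0, 0.0, 1.0]
```

Clipping before stepping the environment keeps out-of-range policy outputs inside the declared bounds; whether the environment also clips internally is not stated here.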
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
--- | ||
layout: "contents" | ||
title: Benchmark Descriptions | ||
firstpage: | ||
--- | ||
|
||
# Benchmark Descriptions | ||
|
||
The benchmark provides a selection of tasks used to study generalization in reinforcement learning (RL). | ||
Different combinations of tasks provide benchmark scenarios suitable for multi-task RL and meta-RL. | ||
Unlike typical RL benchmarks, the agent's evaluation is strictly split into a training phase and a testing phase. | ||
|
||
## Multi-Task Problems | ||
|
||
The multi-task setting challenges the agent to learn a predefined set of skills simultaneously. | ||
Below, different levels of difficulty are described. | ||
|
||
### Multi-Task (MT1) | ||
|
||
In the easiest setting, **MT1**, the agent learns a single task in which it must *reach*, *push*, or *pick and place* a goal object. | ||
There is no testing of generalization involved in this setting. | ||
|
||
```{figure} _static/mt1.gif | ||
:alt: Multi-Task 1 | ||
:width: 500 | ||
``` | ||
|
||
### Multi-Task (MT10) | ||
|
||
The **MT10** setting involves learning to solve a diverse set of 10 tasks, as depicted below. | ||
There is no testing of generalization involved in this setting. | ||
|
||
|
||
|
||
```{figure} _static/mt10.gif | ||
:alt: Multi-Task 10 | ||
:width: 500 | ||
``` | ||
|
||
### Multi-Task (MT50) | ||
|
||
In the **MT50** setting, the agent is challenged to solve the full suite of 50 tasks contained in Meta-World. | ||
This is the most challenging multi-task setting and involves no evaluation on test tasks. | ||
|
||
|
||
## Meta-Learning Problems | ||
|
||
Meta-RL evaluates the [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning) capabilities of agents that learn skills from a predefined set of training tasks, by measuring generalization on a hold-out set of test tasks. | ||
In other words, this setting allows for benchmarking an algorithm's ability to adapt to or learn new tasks. | ||
|
||
### Meta-RL (ML1) | ||
|
||
The simplest meta-RL setting, **ML1**, involves a single manipulation task, such as *pick and place* of an object with a changing goal location. | ||
For the test evaluation, unseen goal locations are used to measure generalization capabilities. | ||
|
||
|
||
|
||
```{figure} _static/ml1.gif | ||
:alt: Meta-RL 1 | ||
:width: 500 | ||
``` | ||
|
||
|
||
### Meta-RL (ML10) | ||
|
||
The meta-learning setting with 10 tasks, **ML10**, involves training on 10 manipulation tasks and evaluating on 5 unseen tasks during the test phase. | ||
|
||
```{figure} _static/ml10.gif | ||
:alt: Meta-RL 10 | ||
:width: 500 | ||
``` | ||
|
||
### Meta-RL (ML45) | ||
|
||
The most difficult setting in Meta-World, **ML45**, trains the agent on 45 distinct manipulation tasks and evaluates it on 5 held-out test tasks. | ||
|
||
|
||
```{figure} _static/ml45.gif | ||
:alt: Meta-RL 45 | ||
:width: 500 | ||
``` |
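
The six benchmark settings described above can be summarized as a small lookup table. This is a sketch for orientation only: `BenchmarkSpec`, `BENCHMARKS`, and `is_meta_learning` are hypothetical names introduced here, not library API. The task counts are taken directly from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical summary record; not part of the library."""
    name: str
    train_tasks: int          # number of distinct training tasks
    test_tasks: int           # number of held-out test tasks (0 = none)
    tests_unseen_goals: bool  # True if testing uses unseen goal locations


# Counts as stated in the benchmark descriptions above.
BENCHMARKS = {
    "MT1": BenchmarkSpec("MT1", 1, 0, False),
    "MT10": BenchmarkSpec("MT10", 10, 0, False),
    "MT50": BenchmarkSpec("MT50", 50, 0, False),
    "ML1": BenchmarkSpec("ML1", 1, 0, True),
    "ML10": BenchmarkSpec("ML10", 10, 5, False),
    "ML45": BenchmarkSpec("ML45", 45, 5, False),
}


def is_meta_learning(spec: BenchmarkSpec) -> bool:
    """A setting tests generalization if it holds out tasks or goals."""
    return spec.test_tasks > 0 or spec.tests_unseen_goals


for spec in BENCHMARKS.values():
    kind = "meta-RL" if is_meta_learning(spec) else "multi-task"
    print(f"{spec.name}: {kind}, {spec.train_tasks} train / {spec.test_tasks} test tasks")
```

Note that ML45 accounts for all 50 tasks (45 for training plus 5 for testing), while ML1 holds out goal locations rather than whole tasks.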
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
layout: "contents" | ||
title: State Space | ||
firstpage: | ||
--- | ||
|
||
# State Space | ||
|
||
The observation array consists of the gripper's (end effector's) position and state, alongside the position and orientation of the object of interest. The table below details each component: | ||
|
||
| Num | Observation Description | Min | Max | Site Name (XML) | Joint Name (XML) | Joint Type | Unit | | ||
|-----|-----------------------------------------------|---------|---------|------------------------|-------------------|------------|-------------| | ||
| 0 | End effector x position in global coordinates | -Inf | Inf | hand | - | - | position (m)| | ||
| 1 | End effector y position in global coordinates | -Inf | Inf | hand | - | - | position (m)| | ||
| 2 | End effector z position in global coordinates | -Inf | Inf | hand | - | - | position (m)| | ||
| 3 | Gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless| | ||
| 4 | Object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)| | ||
| 5 | Object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)| | ||
| 6 | Object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)| | ||
| 7 | Object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 8 | Object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 9 | Object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 10 | Object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 11 | Previous end effector x position | -Inf | Inf | hand | - | - | position (m)| | ||
| 12 | Previous end effector y position | -Inf | Inf | hand | - | - | position (m)| | ||
| 13 | Previous end effector z position | -Inf | Inf | hand | - | - | position (m)| | ||
| 14 | Previous gripper distance apart | 0.0 | 1.0 | - | - | - | dimensionless| | ||
| 15 | Previous object x position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)| | ||
| 16 | Previous object y position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)| | ||
| 17 | Previous object z position in global coordinates | -Inf | Inf | objGeom (derived) | - | - | position (m)| | ||
| 18 | Previous object x quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 19 | Previous object y quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 20 | Previous object z quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 21 | Previous object w quaternion component in global coordinates | -Inf | Inf | objGeom (derived) | - | - | quaternion | | ||
| 22 | Goal x position | -Inf | Inf | goal (derived) | - | - | position (m)| | ||
| 23 | Goal y position | -Inf | Inf | goal (derived) | - | - | position (m)| | ||
| 24 | Goal z position | -Inf | Inf | goal (derived) | - | - | position (m)| |
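
As a minimal sketch, the index ranges in the table above can be turned into named slices. `OBS_LAYOUT` and `split_observation` are hypothetical names introduced for illustration. The sketch assumes the observation has exactly the 25 entries listed in the table; an actual environment may return an array of a different length, in which case the layout would need adjusting.

```python
from typing import Dict, List, Sequence

# Index ranges copied from the table above (end index is exclusive).
OBS_LAYOUT: Dict[str, slice] = {
    "hand_pos": slice(0, 3),
    "gripper_opening": slice(3, 4),
    "object_pos": slice(4, 7),
    "object_quat": slice(7, 11),          # listed as x, y, z, w in the table
    "prev_hand_pos": slice(11, 14),
    "prev_gripper_opening": slice(14, 15),
    "prev_object_pos": slice(15, 18),
    "prev_object_quat": slice(18, 22),
    "goal_pos": slice(22, 25),
}
OBS_DIM = 25  # assumption: only the entries listed in the table


def split_observation(obs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat observation into named components (hypothetical helper)."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} entries, got {len(obs)}")
    return {name: list(obs[s]) for name, s in OBS_LAYOUT.items()}


# Dummy observation whose values equal their own indices, for easy checking.
obs = [float(i) for i in range(OBS_DIM)]
parts = split_observation(obs)
print(parts["goal_pos"])  # [22.0, 23.0, 24.0]
```

Keeping the layout in one dictionary means a change in the observation format only needs to be reflected in a single place.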