This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Commit 7ad7e05

Initial commit
0 parents  commit 7ad7e05

37 files changed, with 28980 additions and 0 deletions

.gitignore

+12
@@ -0,0 +1,12 @@
+__pycache__/
+.ipynb_checkpoints/
+exp_local
+exp
+exp_fixed
+exp_hard
+nbs
+code_snapshots
+exp_drqv2_*.py
+dmc_benchmarks.py
+check_sweep.py
+cancel_sweep.py

CODE_OF_CONDUCT.md

+80
@@ -0,0 +1,80 @@
+# Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to make participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies within all project spaces, and it also applies when
+an individual is representing the project or its community in public spaces.
+Examples of representing a project or community include using an official
+project e-mail address, posting via an official social media account, or acting
+as an appointed representative at an online or offline event. Representation of
+a project may be further defined and clarified by project maintainers.
+
+This Code of Conduct also applies outside the project spaces when there is a
+reasonable belief that an individual's behavior may have a negative impact on
+the project or its community.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at <[email protected]>. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq

CONTRIBUTING.md

+32
@@ -0,0 +1,32 @@
+# Contributing to DrQ-v2
+We want to make contributing to this project as easy and transparent as
+possible.
+
+## Pull Requests
+We actively welcome your pull requests.
+
+1. Fork the repo and create your branch from `main`.
+2. If you've added code that should be tested, add tests.
+3. If you've changed APIs, update the documentation.
+4. Ensure the test suite passes.
+5. Make sure your code lints.
+6. If you haven't already, complete the Contributor License Agreement ("CLA").
+
+## Contributor License Agreement ("CLA")
+In order to accept your pull request, we need you to submit a CLA. You only need
+to do this once to work on any of Facebook's open source projects.
+
+Complete your CLA here: <https://code.facebook.com/cla>
+
+## Issues
+We use GitHub issues to track public bugs. Please ensure your description is
+clear and has sufficient instructions to be able to reproduce the issue.
+
+Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
+disclosure of security bugs. In those cases, please go through the process
+outlined on that page and do not file a public issue.
+
+
+## License
+By contributing to DrQ-v2, you agree that your contributions will be licensed
+under the LICENSE file in the root directory of this source tree.

LICENSE

+21
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) Facebook, Inc. and its affiliates.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

+68
@@ -0,0 +1,68 @@
+
+
+# DrQ-v2: Improved Data-Augmented RL Agent
+
+<p align="center">
+  <img width="19.5%" src="https://i.imgur.com/NzY7Pyv.gif">
+  <img width="19.5%" src="https://imgur.com/O5Va3NY.gif">
+  <img width="19.5%" src="https://imgur.com/PCOR9Mm.gif">
+  <img width="19.5%" src="https://imgur.com/H0ab6tz.gif">
+  <img width="19.5%" src="https://imgur.com/sDGgRos.gif">
+  <img width="19.5%" src="https://imgur.com/gj3qo1X.gif">
+  <img width="19.5%" src="https://imgur.com/FFzRwFt.gif">
+  <img width="19.5%" src="https://imgur.com/W5BKyRL.gif">
+  <img width="19.5%" src="https://imgur.com/qwOGfRQ.gif">
+  <img width="19.5%" src="https://imgur.com/Uubf00R.gif">
+</p>
+
+## Method
+DrQ-v2 is a model-free off-policy algorithm for image-based continuous control. DrQ-v2 builds on [DrQ](https://github.com/denisyarats/drq), an actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements including:
+- Switch the base RL learner from SAC to DDPG.
+- Incorporate n-step returns to estimate TD error.
+- Introduce a decaying schedule for exploration noise.
+- Make implementation 3.5 times faster.
+- Find better hyper-parameters.
+
+<p align="center">
+  <img src="https://i.imgur.com/SemY10G.png" width="100%"/>
+</p>
+
+These changes allow us to significantly improve sample efficiency and wall-clock training time on a set of challenging tasks from the [DeepMind Control Suite](https://github.com/deepmind/dm_control) compared to prior methods. Furthermore, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, previously unattained by model-free RL.
+
+<p align="center">
+  <img width="100%" src="https://imgur.com/mrS4fFA.png">
+  <img width="100%" src="https://imgur.com/pPd1ks6.png">
+</p>
+
+## Citation
+
+If you use this repo in your research, please consider citing the paper as follows:
+```
+@article{yarats2021drqv2,
+  title={Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning},
+  author={Denis Yarats and Rob Fergus and Alessandro Lazaric and Lerrel Pinto},
+  journal={arXiv preprint arXiv:},
+  year={2021}
+}
+```
+
+## Instructions
+
+Install dependencies:
+```sh
+conda env create -f conda_env.yml
+conda activate drqv2
+```
+
+Train the agent:
+```sh
+python train.py task=quadruped_walk
+```
+
+Monitor results:
+```sh
+tensorboard --logdir exp_local
+```
+
+## License
+The majority of DrQ-v2 is licensed under the MIT license; however, portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.
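
The README's Method list mentions n-step returns for estimating the TD error, and config.yaml in this commit sets `nstep: 3` and `discount: 0.99`. As a rough illustration of what such a target looks like, here is a minimal sketch that assumes the textbook n-step form (discounted sum of n rewards plus a discounted bootstrap value). The function name `n_step_target` and the bootstrap value of 10.0 are hypothetical and are not taken from the repository.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float,
                  discount: float = 0.99) -> float:
    """Return sum_i discount**i * rewards[i] + discount**n * bootstrap_value.

    Assumption: this is the generic n-step TD target, not code from the repo.
    """
    target = 0.0
    scale = 1.0  # running value of discount**i
    for reward in rewards:
        target += scale * reward
        scale *= discount
    # After the loop, scale == discount**len(rewards).
    return target + scale * bootstrap_value


# Three rewards (matching nstep: 3) and a made-up target-critic estimate.
print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0))
```

With n = 1 this reduces to the usual one-step target; larger n propagates reward information faster at the cost of relying more on off-policy rewards.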

conda_env.yml

+30
@@ -0,0 +1,30 @@
+name: drqv2
+channels:
+  - defaults
+dependencies:
+  - python=3.8
+  - pip=21.1.3
+  - numpy=1.19.2
+  - absl-py=0.13.0
+  - pyparsing=2.4.7
+  - jupyterlab=3.0.14
+  - scikit-image=0.18.1
+  - nvidia::cudatoolkit=11.1
+  - pytorch::pytorch
+  - pytorch::torchvision
+  - pytorch::torchaudio
+  - pip:
+    - termcolor==1.1.0
+    - git+https://github.com/deepmind/dm_control.git
+    - tb-nightly
+    - imageio==2.9.0
+    - imageio-ffmpeg==0.4.4
+    - hydra-core==1.1.0
+    - hydra-submitit-launcher==1.1.5
+    - pandas==1.3.0
+    - ipdb==0.13.9
+    - yapf==0.31.0
+    - mujoco_py==2.0.2.13
+    - sklearn==0.0
+    - matplotlib==3.4.2
+    - opencv-python==4.5.3.56

config.yaml

+59
@@ -0,0 +1,59 @@
+defaults:
+  - override hydra/launcher: submitit_local
+
+# task settings
+task: quadruped_walk
+frame_stack: 3
+action_repeat: 2
+discount: 0.99
+# train settings
+num_train_frames: 1000000
+num_seed_frames: 4000
+# eval
+eval_every_frames: 10000
+num_eval_episodes: 10
+# snapshot
+save_snapshot: false
+# replay buffer
+replay_buffer_size: 1000000
+replay_buffer_num_workers: 4
+nstep: 3
+batch_size: 256
+# misc
+seed: 1
+device: cuda
+save_video: true
+save_train_video: false
+use_tb: false
+# experiment
+experiment: exp
+
+agent:
+  _target_: drqv2.DrQV2Agent
+  obs_shape: ??? # to be specified later
+  action_shape: ??? # to be specified later
+  device: ${device}
+  lr: 1e-4
+  critic_target_tau: 0.01
+  update_every_steps: 2
+  use_tb: ${use_tb}
+  num_expl_steps: 2000
+  hidden_dim: 1024
+  feature_dim: 50
+  stddev_schedule: 'linear(1.0,0.1,500000)'
+  stddev_clip: 0.3
+
+hydra:
+  run:
+    dir: ./exp_local/${now:%Y.%m.%d}/${now:%H%M%S}_${hydra.job.override_dirname}
+  sweep:
+    dir: ./exp/${now:%Y.%m.%d}/${now:%H%M}_${experiment}
+    subdir: ${hydra.job.num}
+  launcher:
+    timeout_min: 4300
+    cpus_per_task: 10
+    gpus_per_node: 1
+    tasks_per_node: 1
+    mem_gb: 160
+    nodes: 1
+    submitit_folder: ./exp/${now:%Y.%m.%d}/${now:%H%M%S}_${experiment}/.slurm
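
The agent block in config.yaml sets `stddev_schedule: 'linear(1.0,0.1,500000)'`, which corresponds to the README's "decaying schedule for exploration noise". The parsing code is not part of the files shown here, so the sketch below rests on an assumption: the three arguments are read as initial value, final value, and decay duration, with linear interpolation that holds the final value once the duration has elapsed. The helper name `linear_schedule` is hypothetical, and the unit of `step` (frames or agent steps) is not specified by the config.

```python
import re


def linear_schedule(spec: str, step: int) -> float:
    """Evaluate a 'linear(init,final,duration)' spec at the given step.

    Assumption: this interpretation of the spec string is a guess based on
    the config value, not code copied from the repository.
    """
    match = re.fullmatch(r"linear\((.+),(.+),(.+)\)", spec.strip())
    if match is None:
        raise ValueError(f"unsupported schedule spec: {spec!r}")
    init, final, duration = (float(group) for group in match.groups())
    # Fraction of the decay completed so far, clamped to [0, 1].
    mix = min(max(step / duration, 0.0), 1.0)
    return (1.0 - mix) * init + mix * final


# Print the noise scale at a few points along the schedule.
for step in (0, 250_000, 500_000, 1_000_000):
    print(step, linear_schedule("linear(1.0,0.1,500000)", step))
```

Under this reading, the exploration noise starts at a standard deviation of 1.0, decays linearly, and stays at 0.1 for the rest of training.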
