This repository contains a reproduction of the results in the ICLR submission DORA. The paper tackles the deep exploration problem in reinforcement learning by training two Q-networks: one estimates the value of a given state-action pair, while the other quantifies uncertainty (initialized to be uncertain, acting roughly like a soft visit counter). See the report for details of our replication. Our main conclusion is that the DORA baselines are too weak, but we do think the authors propose a technically interesting solution to the exploration vs. exploitation problem.
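For intuition, here is a minimal tabular sketch of this two-table idea as we understand it. The constant values, names, and bonus form are our own illustrative choices, not the paper's or the repository's exact code; see the report and the paper for the precise formulation.

import numpy as np

# Illustrative sketch: a second table, initialized to 1, is trained with
# zero reward, so it decays toward 0 as (state, action) pairs are visited
# and acts like a soft visit counter.
n_states, n_actions = 10, 2
gamma_e = 0.9      # discount for the uncertainty values (assumed value)
alpha = 0.1        # learning rate (assumed value)

Q = np.zeros((n_states, n_actions))   # ordinary value estimates
E = np.ones((n_states, n_actions))    # uncertainty values, start at 1

def update_e(s, a, s_next, a_next):
    # SARSA-style update with zero reward: E shrinks where we have been.
    E[s, a] += alpha * (0.0 + gamma_e * E[s_next, a_next] - E[s, a])

def exploration_bonus(s, a):
    # E close to 1 -> barely visited -> large bonus; E near 0 -> well known.
    # log_gamma(E) behaves like a generalized visit counter.
    counter = np.log(E[s, a]) / np.log(gamma_e)
    return 1.0 / np.sqrt(max(counter, 1e-8))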
To reproduce the results in our reproduction workshop paper, please follow the setup below.
The code accepts several command-line arguments (e.g., to choose DORA or DQN, to use epsilon-greedy or softmax action selection, to render the environment or not, and to choose which environment to run). You can read about the options by running
python main.py -h
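For reference, the flags used in the examples below roughly take the following form. This is a hypothetical sketch only; the option names, choices, and defaults reported by the -h output are authoritative.

import argparse

# Hypothetical sketch of the command-line interface; consult
# `python main.py -h` for the real choices and defaults.
parser = argparse.ArgumentParser(description="Run DORA / DQN experiments")
parser.add_argument("-m", "--model", choices=["dora", "dqn"],
                    help="which agent to train")
parser.add_argument("-a", "--action-selection", choices=["softmax", "epsilon_greedy"],
                    help="action-selection rule")
parser.add_argument("-g", "--game", default="mountain_car",
                    help="environment to run")
parser.add_argument("-l", "--log-dir", default="logs",
                    help="directory for cached rewards")
parser.add_argument("--render", action="store_true",
                    help="render the environment")
args = parser.parse_args()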
An example run using mountain car environment with the paper’s setting is
python main.py -m dora -a softmax -g mountain_car
To repeat runs in parallel using the same setting, run
python run_parallel.py -m dora -a softmax -g mountain_car
By default, this repeats the same experiment 10 times.
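As a rough idea of what repeating a run in parallel looks like, here is a minimal sketch; the command and the number of runs are taken from the examples above, but run_parallel.py in the repository is the authoritative version.

import subprocess

# Launch N independent copies of the same experiment and wait for all of them.
N_RUNS = 10
cmd = ["python", "main.py", "-m", "dora", "-a", "softmax", "-g", "mountain_car"]

procs = [subprocess.Popen(cmd) for _ in range(N_RUNS)]
for p in procs:
    p.wait()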
To set up the bridge environments, please refer to env/readme.txt.
First, find your gym installation location (call it gym/):
import gym
import os
print(os.path.dirname(gym.__file__))
Then add the following lines to gym/envs/__init__.py
register(
id="BridgeEnv-v0",
entry_point="gym.envs.bridge.bridge:BridgeEnv",
)
register(
id="BridgeLargeEnv-v0",
entry_point="gym.envs.bridge.bridge:BridgeLargeEnv",
)
Then create the directory gym/envs/bridge and copy env/bridge.py into it.
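Alternatively, if you prefer not to edit the installed gym package, the same registration can usually be done from your own code. This is a sketch under the assumption that env/bridge.py is on the Python path and importable as a module named bridge.

from gym.envs.registration import register

# Register the bridge environments without modifying gym's source tree.
# Assumes env/bridge.py is importable as `bridge`.
register(
    id="BridgeEnv-v0",
    entry_point="bridge:BridgeEnv",
)
register(
    id="BridgeLargeEnv-v0",
    entry_point="bridge:BridgeLargeEnv",
)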
To run the code
import gym
env_small = gym.make("BridgeEnv-v0")
env_large = gym.make("BridgeLargeEnv-v0")
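As a quick sanity check, you can step through a few random actions. This assumes the classic gym API of that era, where step returns a 4-tuple.

import gym

env = gym.make("BridgeEnv-v0")
obs = env.reset()
for _ in range(10):
    # Take a random action; `done` signals the end of the episode.
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()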
We verified that DORA works in the tabular setting. However, DORA's experiments with function approximation put DQN at a disadvantage (not a fair comparison). We were able to adjust the setting and obtain much better results with DQN.
To replicate our setting, switch to the openai branch (named after the OpenAI setting) and execute
python main.py -m dqn -g mountain_car -l logs
The rewards of this run are then cached in logs/dqn_default.pkl. You can verify that it worked by executing the code in plot.ipynb.
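If you prefer not to open the notebook, something along these lines should show the reward curve. This is a sketch that assumes the pickle simply stores a sequence of per-episode rewards; see plot.ipynb for the exact format.

import pickle
import matplotlib.pyplot as plt

# Assumption: logs/dqn_default.pkl holds a sequence of per-episode rewards.
with open("logs/dqn_default.pkl", "rb") as f:
    rewards = pickle.load(f)

plt.plot(rewards)
plt.xlabel("episode")
plt.ylabel("reward")
plt.title("DQN on MountainCar (openai branch)")
plt.show()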