ViZDoom controlled reinforcement learning tasks and asynchronous Qmc, n-step Q, and A3C implementations

Code for the paper TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning, Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun and Thomas Brox, ICLR 2018.

Dependencies:

  • Python 3
  • TensorFlow (tested with version 1.0)
  • ViZDoom and its dependencies
  • Pillow

We used ViZDoom version 1.1.1 in our experiments. ViZDoom version 1.1.5 uses different textures for the health packs, which may slightly affect performance.
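
If you are unsure which ViZDoom version is installed, a quick check such as the following should work (a minimal sketch; it assumes the vizdoom package exposes a __version__ attribute):

import vizdoom

# Experiments in the paper used ViZDoom 1.1.1.
print(vizdoom.__version__)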

Installation:

Enter the td-or-not-td directory and run:

pip3 install -e .

Algorithms:

The three available algorithms are (a rough sketch of the corresponding return targets follows the list):

  • Qmc: on-policy asynchronous dueling Monte Carlo Q-learning with a finite horizon
  • Q: on-policy asynchronous dueling n-step DDQN
  • a3c: on-policy asynchronous advantage actor-critic
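
The main difference between Qmc and the bootstrapping variants lies in how the value target is computed: Qmc regresses toward a finite-horizon Monte Carlo return, while n-step Q bootstraps from a learned value estimate after n steps. The following is a minimal illustrative sketch of the two targets, not code from this repository (mc_target and n_step_td_target are hypothetical helpers):

def mc_target(rewards, gamma, horizon):
    # Finite-horizon Monte Carlo return: discounted sum of observed rewards only.
    return sum(gamma ** t * r for t, r in enumerate(rewards[:horizon]))

def n_step_td_target(rewards, bootstrap_value, gamma, n):
    # n-step return: discounted sum of n rewards plus a bootstrapped value estimate.
    g = sum(gamma ** t * r for t, r in enumerate(rewards[:n]))
    return g + gamma ** n * bootstrap_value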

Running:

Training

python3 td_or_not_td/alg/main.py -main_path PATH_TO_DATA_OUTPUT -algorithm Qmc -doom_lvl hg_normal

Evaluation

Evaluate all network snapshots created by the training:

python3 td_or_not_td/alg/main.py -eval -main_path PATH_TO_DATA_OUTPUT -algorithm Qmc -doom_lvl hg_normal

As long as main_path exists, the evaluation can be started in parallel with, or even before, the training; in that case the script waits until network snapshots appear. The evaluation results are stored in the evaluation_log.txt file at main_path. The first column contains the snapshot id, the second the training reward, and the third the evaluation reward.
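
The log can be inspected with a few lines of Python, for example to find the snapshot with the highest evaluation reward (a minimal sketch; it assumes whitespace-separated columns in the order described above, with PATH_TO_DATA_OUTPUT being the placeholder from the commands above):

# Read evaluation_log.txt and report the best snapshot by evaluation reward.
with open("PATH_TO_DATA_OUTPUT/evaluation_log.txt") as f:
    rows = [line.split() for line in f if line.strip()]

# Column 0: snapshot id, column 1: training reward, column 2: evaluation reward.
best = max(rows, key=lambda row: float(row[2]))
print("best snapshot:", best[0], "evaluation reward:", best[2])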


To see all available arguments run:

python3 td_or_not_td/alg/main.py -h

Tasks:

The task is selected with the doom_lvl argument. A task can be tested by running:

python3 td_or_not_td/env/env_doom.py [<doom_lvl>]

List of all doom_lvl names:

  • Basic health gathering:

    • hg_normal
  • Basic health gathering with health-based reward:

    • hg_normal_health_reward
  • Reward sparsity:

    • hg_normal
    • hg_sparse
    • hg_very_sparse
  • Reward delay:

    • hg_normal
    • hg_delay_2
    • hg_delay_4
    • hg_delay_8
    • hg_delay_16
    • hg_delay_32
  • Terminal health packs:

    • hg_terminal_health_m_1
    • hg_terminal_health_m_2
    • hg_terminal_health_m_3
  • Basic health gathering with multiple textures:

    • hg_normal_many_textures
  • Additional tasks:

    • navigation
    • battle
    • battle_2

The Battle tasks are based on the environments from Learning to Act by Predicting the Future by Alexey Dosovitskiy and Vladlen Koltun, ICLR 2017.

If you use the new ViZDoom tasks or the algorithm implementations in your research, please cite the following paper:

@InProceedings{adkb2018td,
    author    = {Amiranashvili, Artemij and Dosovitskiy, Alexey and Koltun, Vladlen and Brox, Thomas},
    title     = {TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning},
    booktitle = "International Conference on Learning Representations (ICLR)",
    year      = {2018},
    url       = "https://lmb.informatik.uni-freiburg.de/projects/tdornottd/"
}
