
Releases: kengz/SLM-Lab

add SIL, fix PG loss bug, add dueling networks

28 Jun 07:46
fbf482e

This release adds new implementations and fixes bugs found during the first benchmark runs.

Implementations

#127 Self-Imitation Learning
#128 Checkpointing for saving models
#129 Dueling Networks (see the sketch below)
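
For context, a dueling network splits the Q-value head into a state-value stream and an advantage stream, then recombines them as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The sketch below is illustrative only; the class name and layer sizes are hypothetical, and it is not the code from PR #129.

```python
import torch
import torch.nn as nn

# Minimal dueling head sketch (hypothetical class, not the PR's code):
# Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
class DuelingHead(nn.Module):
    def __init__(self, hid_dim, num_actions):
        super().__init__()
        self.v = nn.Linear(hid_dim, 1)               # state-value stream
        self.adv = nn.Linear(hid_dim, num_actions)   # advantage stream

    def forward(self, h):
        v = self.v(h)
        adv = self.adv(h)
        # subtracting the mean advantage keeps V and A identifiable
        return v + adv - adv.mean(dim=-1, keepdim=True)

# e.g. q_values = DuelingHead(64, 4)(torch.zeros(1, 64))
```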

Bug Fixes

#132 GPU test-run fixes
#133 Fix the ActorCritic-family loss computation getting detached, fix Linux plotting issues, and add the SHA to generated specs

v1.1.0 full roadmap algorithms and features

19 Jun 03:24

Canonical Algorithms and Components

This release is research-ready.

This release finishes the implementation of all canonical algorithms and components. The design is fully refactored, and components are reusable across algorithms where suitable. Read the updated doc.

SLM Lab implements most of the recent canonical algorithms and various extensions. These serve as the base for research.

Algorithm

code: slm_lab/agent/algorithm

Many algorithms are in fact extensions of simpler ones and are implemented as such, which keeps the code concise; a minimal sketch of this pattern follows the algorithm lists below.

Policy Gradient:

  • REINFORCE
  • AC (Vanilla Actor-Critic)
    • shared or separate actor-critic networks
    • plain TD
    • entropy term control
  • A2C (Advantage Actor-Critic)
    • extension of AC with the advantage function
    • N-step returns as advantage
    • GAE (Generalized Advantage Estimation) as advantage
  • PPO (Proximal Policy Optimization)
    • extension of A2C with the PPO loss function

Value-based:

  • SARSA
  • DQN (Deep Q Learning)
    • Boltzmann or epsilon-greedy policy
  • DRQN (Recurrent DQN)
  • Double DQN
  • Double DRQN
  • Multitask DQN (multi-environment DQN)
  • Hydra DQN (multi-environment DQN)
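
As a rough illustration of the extension pattern mentioned above, the sketch below shows an A2C class that subclasses a vanilla actor-critic and only overrides the advantage computation. The class names and method signatures are hypothetical, not SLM-Lab's actual API.

```python
import torch

# Illustrative sketch of the "extend a simpler algorithm" pattern.
class ActorCritic:
    '''Vanilla AC: advantage is the plain TD error.'''
    def __init__(self, gamma=0.99):
        self.gamma = gamma

    def calc_advantage(self, rewards, v_preds, next_v_pred, dones):
        # plain TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        next_v_preds = torch.cat([v_preds[1:], next_v_pred.view(1)])
        return rewards + self.gamma * next_v_preds * (1 - dones) - v_preds


class A2C(ActorCritic):
    '''A2C only swaps the advantage for bootstrapped n-step returns.'''
    def calc_advantage(self, rewards, v_preds, next_v_pred, dones):
        # discounted returns over the rollout, bootstrapped from the last next-state value
        rets = torch.zeros_like(rewards)
        future_ret = next_v_pred
        for t in reversed(range(len(rewards))):
            future_ret = rewards[t] + self.gamma * future_ret * (1 - dones[t])
            rets[t] = future_ret
        return rets - v_preds
```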

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.

Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

  • OnPolicyReplay
  • OnPolicySeqReplay
  • OnPolicyBatchReplay
  • OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

  • Replay
  • SeqReplay
  • StackReplay
  • AtariReplay
  • PrioritizedReplay
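
A minimal sketch of the practical difference between the two memory families (hypothetical classes, not the lab's actual memory API): on-policy memory holds only the current rollout and is flushed after each training step, while off-policy replay keeps a large circular buffer and samples random minibatches from it.

```python
import random
from collections import deque

class OnPolicyMemory:
    '''Stores the current rollout only; flushed after each training update.'''
    def __init__(self):
        self.buffer = []

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self):
        batch = list(self.buffer)
        self.buffer.clear()  # on-policy data is used once, then discarded
        return batch


class ReplayMemory:
    '''Circular buffer; samples random minibatches for off-policy training.'''
    def __init__(self, max_size=100000):
        self.buffer = deque(maxlen=max_size)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=32):
        # convert to list before sampling; fine for a sketch, not optimized
        return random.sample(list(self.buffer), batch_size)
```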

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

  • MLPNet (Multi Layer Perceptron)
  • MLPHeterogenousTails (multi-tails)
  • HydraMLPNet (multi-heads, multi-tails)
  • RecurrentNet
  • ConvNet
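
As an illustration of the multi-tail idea, here is a hypothetical shared-body MLP with separate output tails (e.g. action logits and a state value). It is a sketch only, not the lab's MLPHeterogenousTails or HydraMLPNet implementation.

```python
import torch
import torch.nn as nn

class MultiTailMLP(nn.Module):
    '''Shared body with separate output tails (hypothetical example class).'''
    def __init__(self, in_dim, hid_dim, tail_dims):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
        )
        self.tails = nn.ModuleList([nn.Linear(hid_dim, d) for d in tail_dims])

    def forward(self, x):
        h = self.body(x)
        return [tail(h) for tail in self.tails]

# Usage: one tail for action logits, one for the state value
net = MultiTailMLP(in_dim=4, hid_dim=64, tail_dims=[2, 1])
logits, v = net(torch.zeros(1, 4))
```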

Policy

code: slm_lab/agent/algorithm/policy_util.py

  • different probability distributions for sampling actions
  • default policy
  • Boltzmann policy
  • Epsilon-greedy policy
  • numerous rate decay methods
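
For illustration, minimal epsilon-greedy and Boltzmann action-selection functions are sketched below; the function names are hypothetical, and this is not the code in policy_util.py.

```python
import torch
from torch.distributions import Categorical

def epsilon_greedy(q_values, epsilon):
    '''With probability epsilon pick a random action, else the argmax of Q.'''
    if torch.rand(1).item() < epsilon:
        return torch.randint(len(q_values), (1,)).item()
    return int(torch.argmax(q_values).item())

def boltzmann(q_values, tau):
    '''Sample an action from a softmax over Q scaled by temperature tau.'''
    probs = torch.softmax(q_values / tau, dim=-1)
    return int(Categorical(probs=probs).sample().item())

# e.g. action = epsilon_greedy(torch.tensor([1.0, 2.0, 0.5]), epsilon=0.1)
```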

Atari, Dockerfile, PPO

16 May 15:37
99f54b4

New features and improvements

  • some code cleanup to prepare for the next version
  • DQN Atari working, not optimized yet
  • Dockerfile finished, ready to run the lab at scale on a server
  • implemented PPO in TensorFlow from OpenAI, along with the utils

v1.0.2 Evolutionary Search

04 Mar 17:16
6f03300

New features and improvements

  • add EvolutionarySearch for hyperparameter search
  • rewrite and simplify the underlying Ray logic
  • fix categorical error in A2C
  • improve experiment graph: wider, add opacity

v1.0.1: fitness, analysis, tune A2C and Reinforce

17 Feb 02:19
04e8048

New features and improvements

  • improve fitness computation based on usage
  • add retro analysis script, via yarn analyze <dir>
  • improve plotly renderings
  • improve CNN and RNN architectures and bring them to REINFORCE
  • fine-tune A2C and REINFORCE specs

v1.0.0: First stable release with full lab features

04 Feb 23:09
c4538fc

This is the first stable release of the lab, with the core API and features finalized.

Refer to the docs:
GitHub Repo | Lab Documentation | Experiment Log Book

Features

All the crucial features of the lab are stable and tested:

  • baseline algorithms
  • OpenAI gym, Unity environments
  • modular reusable components
  • multi-agents, multi-environments
  • scalable hyperparameter search with Ray
  • useful graphs and analytics
  • fitness vector for universal benchmarking of agents and environments

Baselines

The first release includes the following algorithms, with more to come later.

  • DQN
  • Double DQN
  • REINFORCE
    • Option to add entropy to encourage exploration
  • Actor-Critic
    • Batch or episodic training
    • Shared or separate actor and critic params
    • Advantage calculated using n-step returns or generalized advantage estimation
    • Option to add entropy to encourage exploration