Releases · kengz/SLM-Lab

v1.1.0 full roadmap algorithms and features

This release finishes the implementation of all canonical algorithms and components. The design is fully refactored, and components are reusable across algorithms where suitable. This release is ready for research. Read the updated doc.

SLM Lab implements most of the recent canonical algorithms and various extensions. These are used as the base of research.

Algorithm

code: slm_lab/agent/algorithm

Various algorithms are in fact extensions of some simpler ones, and they are implemented as such. This makes the code very concise.

Policy Gradient:

REINFORCE
AC (Vanilla Actor-Critic)
- shared or separate actor critic networks
- plain TD
- entropy term control
A2C (Advantage Actor-Critic)
- extension of AC with advantage function
- N-step returns as advantage
- GAE (Generalized Advantage Estimate) as advantage
PPO (Proximal Policy Optimization)
- extension of A2C with PPO loss function
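
As a rough illustration of the advantage options listed for A2C, the sketch below computes GAE for a single finished episode in plain Python. The function name `gae_advantages` and the parameters `gamma` and `lam` are assumptions made for this example and are not SLM Lab's API. With `lam=1` the estimate reduces to the discounted return minus the value baseline, and with `lam=0` it reduces to the one-step TD error.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimate for one episode ending in a terminal state.

    rewards[t] is the reward received after step t.
    values[t] is the critic's estimate V(s_t).
    The value after the final step is taken to be 0 (terminal state).
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0  # V(s_T) = 0 at the terminal state
    running = 0.0     # discounted sum of future TD errors
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t
        delta = rewards[t] + gamma * next_value - values[t]
        # Accumulate TD errors, discounted by gamma * lam
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

An N-step return sits between these two extremes: it sums N discounted rewards and then bootstraps from the critic's value.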

Value-based:

SARSA
DQN (Deep Q Learning)
- boltzmann or epsilon-greedy policy
DRQN (Recurrent DQN)
Double DQN
Double DRQN
Multitask DQN (multi-environment DQN)
Hydra DQN (multi-environment DQN)
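
To make the difference between DQN and Double DQN concrete, here is a minimal sketch of the two bootstrap targets for a single transition. The function names and the plain-list Q-values are assumptions for illustration only; the lab's own code operates on network outputs.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN target: bootstrap from the max of the target network's Q-values."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    and the target network supplies the value of that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from action evaluation is what reduces the overestimation that the plain max introduces.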

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.

Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

OnPolicyReplay
OnPolicySeqReplay
OnPolicyBatchReplay
OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

Replay
SeqReplay
StackReplay
AtariReplay
PrioritizedReplay
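
For readers unfamiliar with replay memory, the following is a minimal sketch of a uniform replay buffer. The class name `SimpleReplay` and its methods are hypothetical and do not mirror the classes listed above; it only shows the general idea of storing transitions and sampling them at random for off-policy training.

```python
import random
from collections import deque


class SimpleReplay:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, max_size, seed=None):
        # A bounded deque drops the oldest transition once it is full
        self.buffer = deque(maxlen=max_size)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample a batch of transitions without replacement."""
        batch_size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A prioritized variant would replace the uniform sampling with sampling weighted by each transition's TD error.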

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

MLPNet (Multi Layer Perceptron)
MLPHeterogenousTails (multi-tails)
HydraMLPNet (multi-heads, multi-tails)
RecurrentNet
ConvNet

Policy

code: slm_lab/agent/algorithm/policy_util.py

different probability distributions for sampling actions
default policy
Boltzmann policy
Epsilon-greedy policy
numerous rate decay methods
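
The policies named above can be sketched in a few lines of standard-library Python. All function names and parameters here (`epsilon_greedy`, `boltzmann`, `linear_decay`, `tau`) are assumptions for illustration and are not taken from `policy_util.py`.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over q / tau; a larger tau gives a flatter distribution."""
    scaled = [q / tau for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(q_values, tau=1.0, rng=random):
    """Sample an action from the Boltzmann distribution over Q-values."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def linear_decay(start, end, step, decay_steps):
    """Move linearly from start to end over decay_steps, then hold at end."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return start + (end - start) * frac
```

A decay function like `linear_decay` would typically feed `epsilon` or `tau`, so that exploration shrinks as training progresses.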

16 May 15:37

kengz

v1.0.3

99f54b4

Atari, Dockerfile, PPO

New features and improvements

some code cleanup to prepare for the next version
DQN Atari working, not optimized yet
Dockerfile finished, ready to run lab at scale on server
implemented PPO in TensorFlow from OpenAI, along with the utils

04 Mar 17:16

kengz

v1.0.2

6f03300

v1.0.2 Evolutionary Search

New features and improvements

add EvolutionarySearch for hyperparameter search
rewrite and simplify the underlying Ray logic
fix categorical error in a2c
improve experiment graph: wider, add opacity

17 Feb 02:19

kengz

v1.0.1

04e8048

v1.0.1: fitness, analysis, tune A2C and Reinforce

New features and improvements

improve fitness computation after usage
add retro analysis script, via yarn analyze <dir>
improve plotly renderings
improve CNN and RNN architectures, bring to Reinforce
fine tune A2C and Reinforce specs

04 Feb 23:09

kengz

v1.0.0

c4538fc

v1.0.0: First stable release with full lab features

This is the first stable release of the lab, with the core API and features finalized.

Refer to the docs:
GitHub Repo | Lab Documentation | Experiment Log Book

Features

All the crucial features of the lab are stable and tested:

baseline algorithms
OpenAI gym, Unity environments
modular reusable components
multi-agents, multi-environments
scalable hyperparameter search with ray
useful graphs and analytics
fitness vector for universal benchmarking of agents, environments

Baselines

The first release includes the following algorithms, with more to come later.

DQN
Double DQN
REINFORCE
- Option to add entropy to encourage exploration
Actor-Critic
- Batch or episodic training
- Shared or separate actor and critic params
- Advantage calculated using n-step returns or generalized advantage estimation
- Option to add entropy to encourage exploration
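
The entropy option mentioned for REINFORCE and Actor-Critic can be illustrated with a small sketch. The names `entropy`, `pg_loss_with_entropy`, and `entropy_coef` are hypothetical and chosen for this example; the idea is only that subtracting a scaled entropy term from the loss rewards policies that keep their action distribution spread out.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pg_loss_with_entropy(log_prob, advantage, probs, entropy_coef=0.01):
    """Single-step policy-gradient loss with an entropy bonus.

    log_prob is the log-probability of the action that was taken.
    probs is the full action distribution at that state.
    """
    policy_loss = -log_prob * advantage
    # Higher entropy lowers the loss, which encourages exploration
    return policy_loss - entropy_coef * entropy(probs)
```

Setting `entropy_coef` to 0 recovers the plain policy-gradient loss.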
