Update SB3 version
araffin committed Jan 12, 2024
1 parent 1e5be54 commit c51d537
Showing 6 changed files with 33 additions and 32 deletions.

5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,9 @@
-## Release 2.3.0a0 (WIP)
+## Release 2.3.0a1 (WIP)
 
 ### Breaking Changes
-- Updated defaults hyperparameters for TD3/DDPG to match SAC ones
+- Updated defaults hyperparameters for TD3/DDPG to be more consistent with SAC
 - Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
+- Upgraded to SB3 >= 2.3.0
 
 ### New Features
 

22 changes: 11 additions & 11 deletions hyperparams/ddpg.yml
@@ -6,9 +6,9 @@ MountainCarContinuous-v0:
   noise_std: 0.5
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
+  learning_rate: !!float 1e-3
   batch_size: 256
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 Pendulum-v1:
   n_timesteps: 20000
@@ -20,8 +20,8 @@ Pendulum-v1:
   noise_std: 0.1
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 LunarLanderContinuous-v2:
   n_timesteps: !!float 3e5
@@ -33,8 +33,8 @@ LunarLanderContinuous-v2:
   noise_std: 0.1
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 BipedalWalker-v3:
   n_timesteps: !!float 1e6
@@ -46,8 +46,8 @@ BipedalWalker-v3:
   noise_std: 0.1
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 # To be tuned
 BipedalWalkerHardcore-v3:
@@ -61,7 +61,7 @@ BipedalWalkerHardcore-v3:
   batch_size: 256
   train_freq: 1
   learning_rate: lin_7e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 # Tuned
 HalfCheetahBulletEnv-v0: &pybullet-defaults
@@ -129,9 +129,9 @@ HalfCheetah-v4: &mujoco-defaults
   noise_std: 0.1
   train_freq: 1
   gradient_steps: 1
-  learning_rate: !!float 3e-4
+  learning_rate: !!float 1e-3
   batch_size: 256
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 Ant-v4:
   <<: *mujoco-defaults
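
The substantive change in both hyperparameter files is the network size: `net_arch` goes from `[256, 256]` to `[400, 300]` (together with a learning rate of `1e-3` instead of `3e-4`). As we understand it, the zoo evaluates the `policy_kwargs` string into a Python dict before building the model, and SB3's TD3/DDPG policies apply a plain `net_arch` list to both the actor and the critic. The sketch below is illustrative only (the helper names are hypothetical, not rl_zoo3 functions): it evaluates the string with nothing but `dict` exposed and counts the weights and biases of the resulting fully connected layers for Pendulum-v1, which has a 3-dimensional observation and a 1-dimensional action.

```python
# Hypothetical helpers: show what "dict(net_arch=[400, 300])" means in terms of layer sizes.


def parse_policy_kwargs(text: str) -> dict:
    """Evaluate a string such as "dict(net_arch=[400, 300])" with no builtins except dict."""
    return eval(text, {"__builtins__": {}, "dict": dict})


def mlp_param_count(in_dim: int, hidden: list[int], out_dim: int) -> int:
    """Weights + biases of a fully connected net: in_dim -> hidden[0] -> ... -> out_dim."""
    sizes = [in_dim, *hidden, out_dim]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    old = parse_policy_kwargs("dict(net_arch=[256, 256])")["net_arch"]
    new = parse_policy_kwargs("dict(net_arch=[400, 300])")["net_arch"]
    obs_dim, act_dim = 3, 1  # Pendulum-v1
    # Actor: observation -> action.
    print("actor  old/new:", mlp_param_count(obs_dim, old, act_dim), mlp_param_count(obs_dim, new, act_dim))
    # Q-network: concatenated (observation, action) -> scalar value.
    print("critic old/new:", mlp_param_count(obs_dim + act_dim, old, 1), mlp_param_count(obs_dim + act_dim, new, 1))
```

With these dimensions the actor grows from 67,073 to 122,201 parameters and each Q-network from 67,329 to 122,601. SB3's TD3 uses two Q-networks by default and DDPG one, so the critic side roughly doubles in size for both algorithms.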

30 changes: 15 additions & 15 deletions hyperparams/td3.yml
@@ -6,9 +6,9 @@ MountainCarContinuous-v0:
   noise_std: 0.5
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
+  learning_rate: !!float 1e-3
   batch_size: 256
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 Pendulum-v1:
   n_timesteps: 20000
@@ -20,8 +20,8 @@ Pendulum-v1:
   noise_std: 0.1
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 LunarLanderContinuous-v2:
   n_timesteps: !!float 3e5
@@ -33,8 +33,8 @@ LunarLanderContinuous-v2:
   noise_std: 0.1
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 BipedalWalker-v3:
   n_timesteps: !!float 1e6
@@ -46,8 +46,8 @@ BipedalWalker-v3:
   noise_std: 0.1
   gradient_steps: 1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 # To be tuned
 BipedalWalkerHardcore-v3:
@@ -61,7 +61,7 @@ BipedalWalkerHardcore-v3:
   batch_size: 256
   train_freq: 1
   learning_rate: lin_7e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 # Tuned
 HalfCheetahBulletEnv-v0: &pybullet-defaults
@@ -98,8 +98,8 @@ HumanoidBulletEnv-v0:
   noise_type: 'normal'
   noise_std: 0.1
   train_freq: 1
-  learning_rate: !!float 3e-4
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  learning_rate: !!float 1e-3
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 # Tuned
 ReacherBulletEnv-v0:
@@ -125,10 +125,10 @@ MinitaurBulletEnv-v0:
   noise_std: 0.1
   learning_starts: 10000
   batch_size: 256
-  learning_rate: !!float 3e-4
+  learning_rate: !!float 1e-3
   train_freq: 1
   gradient_steps: 1
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 # === Mujoco Envs ===
 HalfCheetah-v4: &mujoco-defaults
@@ -139,9 +139,9 @@ HalfCheetah-v4: &mujoco-defaults
   noise_std: 0.1
   train_freq: 1
   gradient_steps: 1
-  learning_rate: !!float 3e-4
+  learning_rate: !!float 1e-3
   batch_size: 256
-  policy_kwargs: "dict(net_arch=[256, 256])"
+  policy_kwargs: "dict(net_arch=[400, 300])"
 
 Ant-v4:
   <<: *mujoco-defaults
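
In both files `HalfCheetah-v4` carries the anchor `&mujoco-defaults`, and `Ant-v4` pulls it in with the YAML merge key `<<: *mujoco-defaults`. Editing the anchored block therefore changes every MuJoCo entry that merges it, except where an entry sets the same key itself. The Python sketch below mimics that merge with dict unpacking; it models only the keys visible in this diff (with `!!float 1e-3` already converted to a float), and the overriding entry is hypothetical.

```python
# Keys of the anchored HalfCheetah-v4 block that appear in this diff (new values).
MUJOCO_DEFAULTS = {
    "noise_std": 0.1,
    "train_freq": 1,
    "gradient_steps": 1,
    "learning_rate": 1e-3,
    "batch_size": 256,
    "policy_kwargs": "dict(net_arch=[400, 300])",
}


def merge_anchor(anchor: dict, local: dict) -> dict:
    """Emulate YAML `<<: *anchor`: start from the anchor, keys set locally win."""
    return {**anchor, **local}


# An entry that only merges the anchor inherits the new learning rate and net_arch.
inherits = merge_anchor(MUJOCO_DEFAULTS, {})
# A hypothetical entry that sets learning_rate itself keeps its own value.
overrides = merge_anchor(MUJOCO_DEFAULTS, {"learning_rate": 3e-4})

print(inherits["learning_rate"], overrides["learning_rate"])  # 0.001 0.0003
```

The anchor dict itself is never mutated, which matches YAML semantics: a merge copies keys into the new mapping rather than linking the two entries.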

4 changes: 2 additions & 2 deletions requirements.txt
@@ -1,6 +1,6 @@
 gym==0.26.2
-stable-baselines3[extra_no_roms,tests,docs]>=2.2.1,<3.0
-sb3-contrib>=2.2.1,<3.0
+stable-baselines3[extra_no_roms,tests,docs]>=2.3.0a1,<3.0
+sb3-contrib>=2.3.0a1,<3.0
 box2d-py==2.3.8
 pybullet
 pybullet_envs_gymnasium>=0.4.0
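
The new pins use a pre-release, `2.3.0a1`, as the lower bound. Under PEP 440 ordering an alpha sorts before its final release (`2.3.0a1 < 2.3.0`), so `>=2.3.0a1,<3.0` accepts the alpha and every later 2.x release; as we understand it, because the specifier itself names a pre-release, pip-style tooling does not filter pre-releases out for this requirement. The stdlib-only comparator below is a deliberately simplified, hypothetical sketch: it handles only release numbers with optional `a`/`b`/`rc` suffixes and ignores epochs, post/dev releases and PEP 440's special rules for pre-releases of the upper bound. Real code should use the `packaging` library.

```python
import re

_PHASES = {"a": 0, "b": 1, "rc": 2}  # pre-release phases in PEP 440 order
_FINAL = 3  # a final release sorts after all of its pre-releases
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")


def version_key(text: str) -> tuple:
    """Sortable key for versions like '2.3.0', '2.3.0a1' or '3.0' (simplified)."""
    m = _VERSION_RE.match(text)
    if m is None:
        raise ValueError(f"unsupported version: {text!r}")
    release = [int(part) for part in m.group(1).split(".")]
    # '3.0' and '3.0.0' compare equal, so drop trailing zero components.
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    pre = (_PHASES[m.group(2)], int(m.group(3))) if m.group(2) else (_FINAL, 0)
    return (tuple(release), pre)


def satisfies(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper, i.e. a spec of the form '>=lower,<upper'."""
    return version_key(lower) <= version_key(version) < version_key(upper)


if __name__ == "__main__":
    for candidate in ["2.2.1", "2.3.0a1", "2.3.0", "2.4.1", "3.0.0"]:
        print(candidate, satisfies(candidate, "2.3.0a1", "3.0"))
```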

2 changes: 1 addition & 1 deletion rl_zoo3/version.txt
@@ -1 +1 @@
-2.3.0a0
+2.3.0a1
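
This commit bumps the same version string in two places: the first heading of `CHANGELOG.md` and `rl_zoo3/version.txt`. A small consistency check like the following could catch a bump that misses one of them. It is a hypothetical helper, not part of the repository, and it assumes the first `## Release` heading in the changelog names the current version.

```python
import re


def changelog_version(changelog_text: str) -> str:
    """Return the version from the first '## Release X (...)' heading."""
    m = re.search(r"^## Release (\S+)", changelog_text, flags=re.MULTILINE)
    if m is None:
        raise ValueError("no '## Release' heading found")
    return m.group(1)


def versions_agree(changelog_text: str, version_txt: str) -> bool:
    """True if the changelog heading and version.txt name the same version."""
    return changelog_version(changelog_text) == version_txt.strip()


if __name__ == "__main__":
    changelog = "## Release 2.3.0a1 (WIP)\n\n### Breaking Changes\n"
    print(versions_agree(changelog, "2.3.0a1\n"))  # True
    print(versions_agree(changelog, "2.3.0a0\n"))  # False
```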

2 changes: 1 addition & 1 deletion setup.py
@@ -27,7 +27,7 @@
     },
     entry_points={"console_scripts": ["rl_zoo3=rl_zoo3.cli:main"]},
     install_requires=[
-        "sb3_contrib>=2.2.1,<3.0",
+        "sb3_contrib>=2.3.0a1,<3.0",
         "gymnasium~=0.29.1",
         "huggingface_sb3>=3.0,<4.0",
         "tqdm",
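
`setup.py` spells the dependency `sb3_contrib` while `requirements.txt` spells it `sb3-contrib`; under PEP 503 name normalization these refer to the same project, and this commit raises both lower bounds together. The sketch below uses hypothetical helpers that handle only simple `name[extras]specifier` lines, not full PEP 508. It normalizes names and reports packages whose specifiers differ between the two lists.

```python
import re

_REQ_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)$")


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' or '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def split_requirement(line: str) -> tuple[str, str]:
    """Split 'name[extras]>=x,<y' into (normalized name, specifier); extras are dropped."""
    m = _REQ_RE.match(line)
    if m is None:
        raise ValueError(f"unsupported requirement: {line!r}")
    return normalize(m.group(1)), m.group(2).strip()


def find_mismatches(a_lines: list[str], b_lines: list[str]) -> list[str]:
    """Names present in both lists whose version specifiers differ."""
    a = dict(split_requirement(line) for line in a_lines)
    b = dict(split_requirement(line) for line in b_lines)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


if __name__ == "__main__":
    setup_reqs = ["sb3_contrib>=2.3.0a1,<3.0", "gymnasium~=0.29.1"]
    requirements = [
        "stable-baselines3[extra_no_roms,tests,docs]>=2.3.0a1,<3.0",
        "sb3-contrib>=2.3.0a1,<3.0",
    ]
    print(find_mismatches(setup_reqs, requirements))  # []
```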
