|
378 | 378 | "source": [
|
379 | 379 | "## Second example: normalize actions\n",
|
380 | 380 | "\n",
|
381 |
| - "It is usually a good idea to normalize observations and actions before giving it to the agent, this prevent [hard to debug issue](https://github.com/hill-a/stable-baselines/issues/473).\n", |
| 381 | + "It is usually a good idea to normalize observations and actions before giving them to the agent; this prevents this [hard to debug issue](https://github.com/hill-a/stable-baselines/issues/473).\n", |
382 | 382 | "\n",
|
383 | 383 | "In this example, we are going to normalize the action space of *Pendulum-v1* so it lies in [-1, 1] instead of [-2, 2].\n",
|
384 | 384 | "\n",
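The mapping from the agent-facing [-1, 1] range back to Pendulum's native [-2, 2] torque range is a simple linear rescale. A minimal sketch of just that math, outside any gym wrapper (`rescale_action` is an illustrative helper, not part of the notebook; inside a `gym.ActionWrapper` this logic would live in the `action()` method):

```python
import numpy as np

def rescale_action(scaled_action, low, high):
    """Map an action from [-1, 1] back to the env's native [low, high] range."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)

# Pendulum-v1 torque bounds are [-2, 2]
print(rescale_action(np.array([1.0]), -2.0, 2.0))   # maps 1.0 -> 2.0
print(rescale_action(np.array([-1.0]), -2.0, 2.0))  # maps -1.0 -> -2.0
print(rescale_action(np.array([0.0]), -2.0, 2.0))   # maps 0.0 -> 0.0
```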
|
|
425 | 425 | " \"\"\"\n",
|
426 | 426 | " Reset the environment \n",
|
427 | 427 | " \"\"\"\n",
|
428 |
| - " # Reset the counter\n", |
429 | 428 | " return self.env.reset()\n",
|
430 | 429 | "\n",
|
431 | 430 | " def step(self, action):\n",
|
|
505 | 504 | "source": [
|
506 | 505 | "#### Test with a RL algorithm\n",
|
507 | 506 | "\n",
|
508 |
| - "We are going to use the Monitor wrapper of stable baselines, wich allow to monitor training stats (mean episode reward, mean episode length)" |
| 507 | + "We are going to use the Monitor wrapper of stable baselines, which allows us to monitor training stats (mean episode reward, mean episode length)" |
509 | 508 | ]
|
510 | 509 | },
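Under the hood, the Monitor wrapper's core bookkeeping is just accumulating per-episode reward and length as transitions come in. A minimal standalone sketch of that idea (`EpisodeStats` is a hypothetical name for illustration, not the stable-baselines class):

```python
class EpisodeStats:
    """Sketch of Monitor-style bookkeeping: per-episode reward and length."""

    def __init__(self):
        self.episode_rewards, self.episode_lengths = [], []
        self._reward, self._length = 0.0, 0

    def record(self, reward, done):
        # Accumulate within the current episode
        self._reward += reward
        self._length += 1
        if done:
            # Episode finished: store its stats and reset the accumulators
            self.episode_rewards.append(self._reward)
            self.episode_lengths.append(self._length)
            self._reward, self._length = 0.0, 0

stats = EpisodeStats()
for reward, done in [(1.0, False), (2.0, True), (0.5, True)]:
    stats.record(reward, done)
print(stats.episode_rewards)  # -> [3.0, 0.5]
print(stats.episode_lengths)  # -> [2, 1]
```

Mean episode reward and length are then just averages over these lists.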
|
511 | 510 | {
|
|
610 | 609 | "source": [
|
611 | 610 | "## Additional wrappers: VecEnvWrappers\n",
|
612 | 611 | "\n",
|
613 |
| - "In the same vein as gym wrappers, stable baselines provide wrappers for `VecEnv`. Among the different that exist (and you can create your own), you should know: \n", |
| 612 | + "In the same vein as gym wrappers, stable baselines provides wrappers for `VecEnv`. Among the different wrappers that exist (and you can create your own), you should know: \n", |
614 | 613 | "\n",
|
615 | 614 | "- VecNormalize: it computes a running mean and standard deviation to normalize observations and returns\n",
|
616 | 615 | "- VecFrameStack: it stacks several consecutive observations (useful to integrate time in the observation, e.g. successive frames of an Atari game)\n",
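The running mean and standard deviation that VecNormalize maintains can be sketched with a batched (parallel-variance) update. This is a minimal illustrative version, not the library class (`RunningNormalizer` is an assumed name; stable-baselines keeps similar statistics in its `RunningMeanStd` helper):

```python
import numpy as np

class RunningNormalizer:
    """Sketch of VecNormalize's core idea: normalize with running mean/std."""

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps  # avoid division by zero before the first update
        self.eps = eps

    def update(self, batch):
        # Merge batch statistics into the running ones (parallel-variance formula)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, obs):
        return (obs - self.mean) / np.sqrt(self.var + self.eps)

normalizer = RunningNormalizer(shape=(3,))
observations = np.arange(30, dtype=np.float64).reshape(10, 3)
normalizer.update(observations)
# After updating on this batch, normalized values are roughly zero-mean
print(normalizer.normalize(observations).mean(axis=0))
```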
|
|
760 | 759 | "\n",
|
761 | 760 | "# Reset the environment\n",
|
762 | 761 | "\n",
|
763 |
| - "# Take random actions in the enviromnent and check\n", |
| 762 | + "# Take random actions in the environment and check\n", |
764 | 763 | "# that it returns the correct values after the end of each episode\n",
|
765 | 764 | "\n",
|
766 | 765 | "# ====================== #"
|
|
851 | 850 | " time_feature = 1 - (self._current_step / self._max_steps)\n",
|
852 | 851 | " if self._test_mode:\n",
|
853 | 852 | " time_feature = 1.0\n",
|
854 |
| - " # Optionnaly: concatenate [time_feature, time_feature ** 2]\n", |
| 853 | + " # Optionally: concatenate [time_feature, time_feature ** 2]\n", |
855 | 854 | " return np.concatenate((obs, [time_feature]))"
|
856 | 855 | ],
|
857 | 856 | "execution_count": 0,
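The time-feature computation above can be exercised on its own: the feature counts down from 1 to 0 as the episode progresses, and is pinned to 1.0 in test mode. A minimal sketch, with `add_time_feature` as an illustrative free function standing in for the wrapper's observation method:

```python
import numpy as np

def add_time_feature(obs, current_step, max_steps, test_mode=False):
    """Append a remaining-time feature in [0, 1] to the observation."""
    time_feature = 1.0 if test_mode else 1.0 - current_step / max_steps
    return np.concatenate((obs, [time_feature]))

obs = np.array([0.5, -0.2])
# A quarter of the way through the episode: time feature is 0.75
print(add_time_feature(obs, current_step=25, max_steps=100))
# In test mode the feature is fixed at 1.0 regardless of the step
print(add_time_feature(obs, current_step=25, max_steps=100, test_mode=True))
```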
|
|