diff --git a/Practical_4_Reinforcement_Learning.ipynb b/Practical_4_Reinforcement_Learning.ipynb
index 8e76493..88a7188 100644
--- a/Practical_4_Reinforcement_Learning.ipynb
+++ b/Practical_4_Reinforcement_Learning.ipynb
@@ -795,7 +795,7 @@
 " \n",
 "    def end_episode(self, final_reward): \n",
 "        \"\"\"At the end of an episode, we compute the loss for the episode and take a \n",
-"        step in parameter speace in the direction of the gradients.\"\"\"\n",
+"        step in parameter space in the direction of the gradients.\"\"\"\n",
 " \n",
 "        # Compute the return (cumulative discounted reward) for the episode\n",
 "        episode_return = sum(self._rewards) + final_reward # Assuming \\gamma = 1\n",
@@ -831,7 +831,7 @@
 },
 "cell_type": "markdown",
 "source": [
-"Notice that during the episode we run only the forward-pass of the policy network (inference). At the end of the episode, we replay the states that occured during the episode and run both the forward and backward pass of the policy network (notice the gradient tape!) because we can only compute the loss once we have the episode return at the end of the episode. If the policy network is very complex, this could be inefficient. In that case you could run both the forward an backward pass during the episode and store intermediate gradients/partial derivatives to use in the update at the end of the episode."
+"Notice that during the episode we run only the forward-pass of the policy network (inference). At the end of the episode, we replay the states that occurred during the episode and run both the forward and backward pass of the policy network (notice the gradient tape!) because we can only compute the loss once we have the episode return at the end of the episode. If the policy network is very complex, this could be inefficient. In that case you could run both the forward and backward pass during the episode and store intermediate gradients/partial derivatives to use in the update at the end of the episode."
 ]
 },
 {
@@ -934,4 +934,4 @@
 "outputs": []
 }
 ]
-}
\ No newline at end of file
+}
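
For context on the cell the second hunk edits, here is a minimal, self-contained sketch of that end-of-episode update, assuming a small Keras policy network and a plain REINFORCE-style loss with gamma = 1. The class name PolicyAgent, the step/record_reward helpers, the network architecture, and the loss expression are illustrative stand-ins and are not taken from the notebook; only end_episode, self._rewards, and the episode_return line mirror the snippet shown in the diff above.

import tensorflow as tf


class PolicyAgent:
    """Illustrative agent: inference during the episode, one gradient step at the end."""

    def __init__(self, num_actions, learning_rate=1e-2):
        # Small policy network mapping a state vector to action logits.
        self._policy = tf.keras.Sequential([
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(num_actions),
        ])
        self._optimizer = tf.keras.optimizers.Adam(learning_rate)
        self._states, self._actions, self._rewards = [], [], []

    def step(self, state):
        """During the episode: forward pass only (no gradient tape)."""
        state = tf.convert_to_tensor(state, dtype=tf.float32)
        logits = self._policy(state[None, :])
        action = tf.random.categorical(logits, num_samples=1)[0, 0]
        self._states.append(state)
        self._actions.append(action)
        return int(action)

    def record_reward(self, reward):
        self._rewards.append(reward)

    def end_episode(self, final_reward):
        """At the end of the episode: replay the stored states under a gradient
        tape, because the loss can only be computed once the return is known."""
        episode_return = sum(self._rewards) + final_reward  # assuming gamma = 1
        states = tf.stack(self._states)    # shape [T, state_dim]
        actions = tf.stack(self._actions)  # shape [T]

        with tf.GradientTape() as tape:
            logits = self._policy(states)  # forward pass, recorded by the tape
            log_probs = tf.nn.log_softmax(logits)
            chosen = tf.gather(log_probs, actions, batch_dims=1)
            # Generic REINFORCE objective: maximise return * sum_t log pi(a_t | s_t),
            # i.e. minimise its negative.
            loss = -episode_return * tf.reduce_sum(chosen)

        grads = tape.gradient(loss, self._policy.trainable_variables)
        self._optimizer.apply_gradients(zip(grads, self._policy.trainable_variables))

        # Reset the per-episode buffers.
        self._states, self._actions, self._rewards = [], [], []

In a driver loop this would be used by calling step(state) and record_reward(reward) at each time step, then end_episode(final_reward) once when the environment terminates, which is exactly the forward-only-then-replay pattern the markdown cell describes.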