Update mid-way-recap.mdx
Removed "define the policy by hand", as it can be misleading, and restructured the sentence to establish the focus on the value function and its inherent nature to lead us to the optimal policy.
imfoobar42 authored Mar 9, 2025
1 parent be21bbf commit bdb2c6e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion units/en/unit2/mid-way-recap.mdx
@@ -6,7 +6,7 @@ We have two types of value-based functions:
 
 - State-value function: outputs the expected return if **the agent starts at a given state and acts according to the policy forever after.**
 - Action-value function: outputs the expected return if **the agent starts in a given state, takes a given action at that state** and then acts accordingly to the policy forever after.
-- In value-based methods, rather than learning the policy, **we define the policy by hand** and we learn a value function. If we have an optimal value function, we **will have an optimal policy.**
+- In value-based methods, rather than learning the policy, **we focus on learning a value function**. An optimal value function, **will lead us to an optimal policy.**
 
 There are two types of methods to update the value function:
 
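The line added in this commit says that an optimal value function leads to an optimal policy. Here is a minimal Python sketch of why. It assumes a small hypothetical table `Q` of optimal action values. The states `s0` and `s1`, the actions `left` and `right`, and the numbers are all made up for illustration. If `Q` is the optimal action-value function Q*, then acting greedily with respect to it is an optimal policy, and the optimal state value is V*(s) = max_a Q*(s, a).

```python
# Hypothetical table of optimal action values: Q[state][action] = expected return.
# States, actions and numbers are invented purely for illustration.
Q = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.1},
}

def greedy_policy(q):
    """In every state, pick the action with the highest action value (argmax_a Q(s, a))."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}

def state_values(q):
    """If q is the optimal action-value function Q*, then V*(s) = max_a Q*(s, a)."""
    return {s: max(actions.values()) for s, actions in q.items()}

print(greedy_policy(Q))  # {'s0': 'right', 's1': 'left'}
print(state_values(Q))   # {'s0': 0.8, 's1': 0.5}
```

Nothing about the policy is hand-coded here. It is read off the value table, which is the point of the reworded line.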