I am referring to the gradient derivation here.
In the paragraph where the instructor says "we can approximate the likelihood ratio policy gradient with sample-based estimate", the term P(τ;θ) (the probability of trajectory τ given the parameters θ) disappears from the subsequent summation. Why?
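For reference, here is the step in question. It is written in the standard form of the likelihood ratio gradient, which I am assuming matches the linked notes (the notation there may differ slightly). The derivation goes from the exact gradient ∇θ U(θ) = Σ_τ P(τ;θ) ∇θ log P(τ;θ) R(τ) to the estimate ĝ = (1/m) Σ_{i=1..m} ∇θ log P(τ^(i);θ) R(τ^(i)), where each τ^(i) is sampled from P(τ;θ). The toy sketch below compares that exact weighted sum with the sample average. The one-step softmax "policy" over three trajectories, the θ values, and the rewards are all hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(theta):
    # Numerically stable softmax: P(tau; theta) for each discrete "trajectory" tau
    top = max(theta)
    exps = [math.exp(t - top) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_prob(tau, probs):
    # For a softmax, d/dtheta_k log P(tau; theta) = 1[k == tau] - P(k; theta)
    return [(1.0 if k == tau else 0.0) - p for k, p in enumerate(probs)]

def exact_gradient(theta, rewards):
    # Sum over ALL trajectories, each term weighted explicitly by P(tau; theta)
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for tau, p in enumerate(probs):
        g = grad_log_prob(tau, probs)
        for k in range(len(theta)):
            grad[k] += p * g[k] * rewards[tau]
    return grad

def sampled_gradient(theta, rewards, m, rng):
    # Draw m trajectories FROM P(tau; theta). The weighting now comes from how
    # often each tau is drawn, so P(tau; theta) is not multiplied in again.
    probs = softmax(theta)
    taus = rng.choices(range(len(theta)), weights=probs, k=m)
    grad = [0.0] * len(theta)
    for tau in taus:
        g = grad_log_prob(tau, probs)
        for k in range(len(theta)):
            grad[k] += g[k] * rewards[tau] / m
    return grad

# Hypothetical parameters and rewards, for illustration only
theta = [0.5, -0.2, 0.1]
rewards = [1.0, 3.0, -2.0]
rng = random.Random(0)

exact = exact_gradient(theta, rewards)
approx = sampled_gradient(theta, rewards, 100_000, rng)
print("exact: ", [round(x, 3) for x in exact])
print("approx:", [round(x, 3) for x in approx])
```

With enough samples, the two gradients should agree up to Monte Carlo noise, even though `sampled_gradient` never multiplies by P(τ;θ).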
I asked the same question in the Discord study group (here) but got no response.