[QUESTION] Why did P(τ;θ) disappear when estimating the gradient from trajectory samples? #495

I am referring to the gradient derivation [here](https://huggingface.co/learn/deep-rl-course/unit4/pg-theorem#optional-the-policy-gradient-theorem).

In the paragraph where the instructor states that "we can approximate the likelihood ratio policy gradient with sample-based estimate", the term P(τ;θ) (the probability of trajectory τ given the parameters θ) disappears from the subsequent summation. Why?
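
For reference, this is the step as I read it on the linked page. It is my own transcription, so the notation may differ slightly from the course; I am assuming $m$ is the number of sampled trajectories, $\tau^{(i)}$ the $i$-th sample, and $R(\tau)$ the return of a trajectory:

```latex
% Before: a sum over all trajectories, each weighted by P(\tau;\theta)
\nabla_\theta J(\theta) = \sum_{\tau} P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau)

% After: an average over m sampled trajectories, with no explicit P(\tau;\theta) weight
\nabla_\theta J(\theta) \approx \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log P(\tau^{(i)};\theta)\, R(\tau^{(i)})
```

My question is about the move from the first line to the second.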

I asked the same question in the Discord study group ([here](https://discord.com/channels/879548962464493619/971379033642266654/1209056045222334504)) but got no response.

