Open
Description
ratio = tf.exp(pi.log_prob(action) - old_pi.log_prob(action))
surr = ratio * adv
...
loss = -tf.reduce_mean( tf.minimum(surr, tf.clip_by_value(ratio, 1. - self.epsilon, 1. + self.epsilon) * adv) )
The call to tf.minimum should use ratio rather than surr. Because surr = ratio * adv and adv can be negative, the result of tf.minimum can contain a very large negative value such as -1e10, which causes the actor's loss to fail.
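The behaviour described above can be reproduced with plain numbers. The following is a minimal sketch in pure Python, not the repository's TensorFlow code: the helper name `clipped_surrogate`, the default `epsilon=0.2`, and the sample values are assumptions chosen only for illustration. It evaluates the per-sample term that the snippet passes to tf.reduce_mean.

```python
def clipped_surrogate(ratio, adv, epsilon=0.2):
    """Per-sample term: min(ratio * adv, clip(ratio, 1-eps, 1+eps) * adv).

    Hypothetical helper mirroring the expression inside tf.reduce_mean.
    """
    surr = ratio * adv
    # Scalar equivalent of tf.clip_by_value(ratio, 1 - epsilon, 1 + epsilon).
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(surr, clipped_ratio * adv)


# Negative advantage with a very large ratio: the unclipped branch wins the min.
negative_case = clipped_surrogate(ratio=1e10, adv=-1.0)

# Positive advantage with the same ratio: the clipped branch wins the min.
positive_case = clipped_surrogate(ratio=1e10, adv=1.0)

print("adv < 0:", negative_case, "loss contribution:", -negative_case)
print("adv > 0:", positive_case, "loss contribution:", -positive_case)
```

With a negative advantage the minimum selects the unclipped `ratio * adv`, so the term is not bounded below and the loss contribution grows with the ratio. With a positive advantage the clipped branch caps the term at `(1 + epsilon) * adv`.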