Open
Shouldn't the actor update use idx[0] and idx[1] for Q1 and Q2? Currently both Q1 and Q2 are computed from the same critic, self.critics[idx[0]], so they are identical and torch.min(Q1, Q2) has no effect.
# ---------------------------- update actor ----------------------------
if step == self.G-1:
    actions_pred, log_prob, _ = self.actor_local.sample(states)
    # TODO: make this variable for possible more than two critics
    Q1 = self.critics[idx[0]](states, actions_pred.squeeze(0)).cpu()
    Q2 = self.critics[idx[0]](states, actions_pred.squeeze(0)).cpu()  # same index as Q1
    Q = torch.min(Q1,Q2)
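For illustration, here is a minimal, standalone sketch of what I would expect instead. It is not the repository's code: `TinyCritic`, `min_q_over_subset`, the tensor shapes and the example `idx` values are all hypothetical stand-ins. It assumes `idx` holds distinct critic indices, and it also shows one possible way to address the TODO by taking the minimum over however many critics `idx` selects.

```python
import torch
import torch.nn as nn


class TinyCritic(nn.Module):
    """Hypothetical stand-in for one Q-network: (state, action) -> scalar Q."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Linear(state_dim + action_dim, 1)

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1))


def min_q_over_subset(critics, idx, states, actions):
    """Element-wise minimum of Q over the critics selected by idx (any length >= 1)."""
    qs = torch.stack([critics[i](states, actions) for i in idx], dim=0)
    return qs.min(dim=0).values


torch.manual_seed(0)
critics = [TinyCritic(3, 2) for _ in range(5)]  # assumed ensemble of 5 critics
states, actions = torch.randn(4, 3), torch.randn(4, 2)
idx = [1, 3]  # hypothetical: two distinct critic indices

# Two-critic form: note idx[0] for Q1 and idx[1] for Q2.
Q1 = critics[idx[0]](states, actions)
Q2 = critics[idx[1]](states, actions)
Q = torch.min(Q1, Q2)

# Generalised form that works for more than two selected critics.
Q_general = min_q_over_subset(critics, idx, states, actions)
```

With two indices the generalised form gives the same result as `torch.min(Q1, Q2)`, so it could replace the hard-coded pair if more critics are sampled later.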