For the SAC-discrete version, is it possible to update the model with state and action as input, just like the SAC-continuous version? #62

dbsxdbsx · 2020-10-13T15:02:30Z

I am currently trying to merge the models for the discrete and continuous versions of SAC into a single model.
The SAC-discrete critic_model takes only the state as input and produces an output over all actions. To make it consistent with the continuous one, I modified it to take both state and action as input and to output only the Q-value q(s,a) for that pair, just as in the continuous version. The training part can then use the same code, without considering action distributions when updating parameters. BUT the modified SAC for discrete actions just doesn't converge!
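To make the two critic shapes described above concrete, here is a minimal sketch. The class names, layer sizes, and the one-hot encoding of the discrete action are assumptions for illustration only; they are not taken from the repository's code.

```python
import torch
import torch.nn as nn


class StateOnlyCritic(nn.Module):
    """Discrete-style critic (hypothetical): state in, one value per action out."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, n_actions)


class StateActionCritic(nn.Module):
    """Continuous-style critic (hypothetical): (state, action) in, single q(s,a) out."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Assumption: the integer action index is one-hot encoded before concatenation.
        idx = action.reshape(-1).long()
        one_hot = nn.functional.one_hot(idx, self.n_actions).float()
        return self.net(torch.cat([state, one_hot], dim=1))  # shape: (batch, 1)


# Made-up sizes: batch of 4 states with 3 features, 2 discrete actions.
state = torch.randn(4, 3)
action = torch.randint(0, 2, (4, 1))
q_all = StateOnlyCritic(3, 2)(state)
q_sa = StateActionCritic(3, 2)(state, action)
```

In this sketch, `q_all` holds a value for every action, while `q_sa` holds a single value for the action that was passed in.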

The code below shows part of what I changed to make the discrete version behave like the continuous version.
Since it doesn't converge, I wonder whether something is wrong with log_prob?

_dist = self.distribution(action_dist)  # torch.distributions.Categorical
actions = _dist.sample()
# modified version
actions = actions.unsqueeze(1)
self.log_prob = torch.log(actions + (actions == 0.0).float() * 1e-8)
# original version
# z = (action_dist == 0.0).float() * 1e-8
# self.log_prob = torch.log(action_dist + z)
# actions = actions.unsqueeze(1)  # add batch dim

Is it possible to make the SAC-discrete version update the same way as the SAC-continuous one?
If so, I would be happy to use almost the same code for both the discrete and continuous versions, which is what I want.
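For reference, a minimal sketch of the quantities involved in the log_prob question, assuming `action_dist` holds per-action probabilities (the values below are made up). It shows the log-probability of the sampled action as returned by `torch.distributions.Categorical.log_prob`, and contrasts it with the log of the sampled action indices, which is what the modified line computes. This is only a check of the quantities, not a claim about what makes training converge.

```python
import torch
from torch.distributions import Categorical

# Hypothetical policy output: per-state action probabilities, shape (batch, n_actions).
action_dist = torch.tensor([[0.7, 0.3], [0.1, 0.9]])

_dist = Categorical(probs=action_dist)
actions = _dist.sample()  # integer action indices, shape (batch,)

# Log-probability of each sampled action, shape (batch,).
log_prob_sampled = _dist.log_prob(actions)

# The same values read out of the full log-probability table, as a cross-check.
log_prob_all = torch.log(action_dist)
gathered = log_prob_all.gather(1, actions.unsqueeze(1)).squeeze(1)

# By contrast, this is the log of the sampled indices themselves (0, 1, ...),
# which is a different quantity from the log-probability of those actions.
log_of_indices = torch.log(actions.float() + (actions == 0).float() * 1e-8)
```

Here `log_prob_sampled` and `gathered` agree, while `log_of_indices` depends only on which index was drawn and not on the policy's probabilities.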
