Hi, thank you for releasing the code.

I have a question about the computation of the reward. In the `compute_reward` function in `videogpt_reward_model.py`, for each transition $(s_t, a_t, s_{t+1})$, the variables `image_batch`, `encodings`, and `embeddings` appear to correspond to $s_t$. The reward $r_t(s_t, a_t)$ then seems to be computed as $\log p(s_t \mid s_{1:t-1})$ when `reward_model_compute_joint` is set to `False`, and as the sum $\log p(s_t \mid s_{1:t-1}) + \dots + \log p(s_{t-\mathrm{seqlen}+1} \mid s_{1:t-\mathrm{seqlen}})$ when `reward_model_compute_joint` is set to `True`, instead of the $\log p(s_{t+1} \mid s_{1:t})$ stated in the paper (a sketch of my reading is below). Am I missing any detail that resolves this, or is this exactly the empirical implementation of VIPER? Thank you!
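For concreteness, here is a minimal sketch of the indexing I am describing. It is purely illustrative and assumes the per-frame conditionals are already available in a `log_probs` array; `reward_as_implemented` and `reward_as_in_paper` are hypothetical helpers, not functions from the repo:

```python
import numpy as np

def reward_as_implemented(log_probs, t, seqlen, compute_joint):
    """log_probs[k] stands in for log p(s_k | s_{1:k-1}) from the video model."""
    if compute_joint:
        # Sum of the conditionals over the window ending at s_t:
        # log p(s_t | s_{1:t-1}) + ... + log p(s_{t-seqlen+1} | s_{1:t-seqlen})
        return float(np.sum(log_probs[t - seqlen + 1 : t + 1]))
    # Only the conditional for the current frame: log p(s_t | s_{1:t-1})
    return float(log_probs[t])

def reward_as_in_paper(log_probs, t):
    # The paper's r_t(s_t, a_t) = log p(s_{t+1} | s_{1:t}) indexes the *next* frame.
    return float(log_probs[t + 1])

# Toy check with made-up conditionals for a 32-frame trajectory.
rng = np.random.default_rng(0)
log_probs = np.log(rng.uniform(0.1, 1.0, size=32))
print(reward_as_implemented(log_probs, t=10, seqlen=4, compute_joint=False))
print(reward_as_implemented(log_probs, t=10, seqlen=4, compute_joint=True))
print(reward_as_in_paper(log_probs, t=10))
```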