Open
Description
Hi,
I came across your paper and have a few questions. My goal is to use your results and analysis to train discrete SAC on parallel minigrid environments.
In train_pql.py, you have variables like critic_unit_time, critic_update_times, sim_unit_time, and counter[0]['critic']. How do these variables relate to the beta_a_v and beta_p_v ratios?
Suppose you have 128 envs, a replay buffer of size 1e6, beta_a_v = 8, and beta_p_v = 2. Do you perform beta_a_v critic updates and beta_p_v policy updates in every iteration step (i.e., a single step in which all 128 envs are executed)?
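To make the question concrete, here is a minimal sketch of the schedule I am describing. It is only my reading of the ratios, not code from train_pql.py: the function name `sequential_schedule` and the counter keys are hypothetical, and the loop is sequential purely for illustration.

```python
def sequential_schedule(num_iters, num_envs=128, beta_a_v=8, beta_p_v=2):
    """Count the work done per iteration under the interpretation in my question.

    Assumption (mine, not confirmed by the paper or repo): each iteration
    steps all envs once, then runs beta_a_v critic updates and beta_p_v
    policy updates.
    """
    counts = {"env_steps": 0, "critic_updates": 0, "policy_updates": 0}
    for _ in range(num_iters):
        # One iteration step: every parallel env advances by one transition.
        counts["env_steps"] += num_envs
        # Hypothetical reading: beta_a_v critic updates per iteration step.
        counts["critic_updates"] += beta_a_v
        # Hypothetical reading: beta_p_v policy updates per iteration step.
        counts["policy_updates"] += beta_p_v
    return counts


if __name__ == "__main__":
    print(sequential_schedule(num_iters=10))
```

If this is not how the ratios are applied, it would help to know which of these three counters the ratios actually constrain.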
Thanks,
kb