When do critic and actor updates take place?

Hi,

I came across your paper and have a few questions. My goal is to use your results and analysis to train discrete SAC on parallel minigrid environments.

In `train_pql.py`, you have variables like `critic_unit_time`, `critic_update_times`, `sim_unit_time`, and `counter[0]['critic']`. How do these variables relate to the `beta_a_v` and `beta_p_v` ratios?

Suppose you have 128 envs with a replay buffer size of 1e6, `beta_a_v = 8`, and `beta_p_v = 2`. In each iteration step (i.e. one step in which all 128 envs are executed), do you perform `beta_a_v` updates to the critic and `beta_p_v` updates to the policy?
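To make the question concrete, here is a minimal sketch of the schedule I have in mind. It is only my assumed reading of the ratios, not something taken from `train_pql.py`; the function name `updates_per_iteration` and the `counts` dictionary are hypothetical and exist purely for illustration.

```python
# Hypothetical synchronous schedule; NOT taken from train_pql.py.
# It encodes my assumed reading: each iteration steps every env once,
# then runs beta_a_v critic updates and beta_p_v policy updates.
def updates_per_iteration(num_envs, beta_a_v, beta_p_v, iterations):
    """Count collected transitions and updates under the assumed schedule."""
    counts = {"transitions": 0, "critic": 0, "policy": 0}
    for _ in range(iterations):
        counts["transitions"] += num_envs  # one step in every parallel env
        counts["critic"] += beta_a_v       # assumed: beta_a_v critic updates
        counts["policy"] += beta_p_v       # assumed: beta_p_v policy updates
    return counts


counts = updates_per_iteration(num_envs=128, beta_a_v=8, beta_p_v=2, iterations=10)
print(counts)
```

Under this reading, 10 iterations would give 1280 transitions, 80 critic updates, and 20 policy updates. Is that how the ratios are meant to be interpreted, or are the updates scheduled differently?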

Thanks,
kb
