Description
In `ppo.py` and `ppo_atari.py`, episodic information is logged as follows:
    if "final_info" in infos:
        for info in infos["final_info"]:
            if info and "episode" in info:
                print(f"global_step={global_step}, episodic_return={info['episode']['r']}")
                writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step)
                writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)
If more than one episode truncates or terminates at the same global step, this code ends up logging only one of them, because they are all assigned to the same `global_step`. Is this the intended behavior?
To log them all, we could do something like
    if "final_info" in infos:
        for i, info in enumerate(infos["final_info"]):
            if info and "episode" in info:
                # Offset by the environment index so that episodes finishing
                # on the same global_step are logged at distinct steps.
                logging_step = global_step - args.num_envs + i
                print(f"logging_step={logging_step}, episodic_return={info['episode']['r']}")
                writer.add_scalar("charts/episodic_return", info["episode"]["r"], logging_step)
                writer.add_scalar("charts/episodic_length", info["episode"]["l"], logging_step)
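As a quick sanity check on the offset, here is a small standalone sketch of the steps it produces. It assumes that `global_step` is advanced by `num_envs` on every vectorized step (an assumption about the training loop, which is not shown in the snippet above); the helper name `per_env_logging_steps` is hypothetical and not part of the repository.

```python
def per_env_logging_steps(global_step: int, num_envs: int) -> list[int]:
    """Return one logging step per environment index.

    Assumes global_step was already incremented by num_envs for the
    current vectorized step, so the offsets fill the range just consumed.
    """
    return [global_step - num_envs + i for i in range(num_envs)]


steps = per_env_logging_steps(global_step=8, num_envs=4)
print(steps)  # [4, 5, 6, 7]
```

Under that assumption, each environment index maps to its own step, and the steps from consecutive vectorized steps do not overlap.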
Alternatively, if we insist on not defining `logging_step`, we should log the mean return, the standard deviation of the return, and the number of episodes that terminated or were truncated at each `global_step`, so as not to bias logging in favor of one of the many parallel environments.
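A minimal sketch of that aggregation, assuming each entry of `infos["final_info"]` is either empty/`None` or a dict whose `"episode"` entry holds `"r"` and `"l"` as a scalar or a size-1 array. The helper name `summarize_final_infos` and the keys of the returned dict are hypothetical.

```python
import numpy as np


def summarize_final_infos(final_infos):
    """Aggregate all episodes that finished on the same vectorized step.

    Returns None if no episode finished, otherwise a dict with the mean and
    (population) standard deviation of the returns, the mean episode length,
    and the number of finished episodes.
    """
    finished = [info["episode"] for info in final_infos if info and "episode" in info]
    if not finished:
        return None
    # .item() accepts both plain scalars and size-1 arrays once wrapped.
    returns = np.array([np.asarray(ep["r"]).item() for ep in finished], dtype=float)
    lengths = np.array([np.asarray(ep["l"]).item() for ep in finished], dtype=float)
    return {
        "return_mean": float(returns.mean()),
        "return_std": float(returns.std()),
        "length_mean": float(lengths.mean()),
        "num_finished": len(finished),
    }


# Example: two of four environments finish on the same step.
example = [
    None,
    {"episode": {"r": np.array([1.0]), "l": np.array([10])}},
    {"episode": {"r": 3.0, "l": 20}},
    {},
]
stats = summarize_final_infos(example)
print(stats)
```

In the training loop, each value in `stats` could then be passed to `writer.add_scalar` at the shared `global_step` under a tag of your choosing, so that every finished episode contributes to the logged point.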
Similar issues may also be present in other (PPO) files.