
Logging of episodic returns in ppo implementations #508

@shehper

In ppo.py and ppo_atari.py, episodic information is logged as follows:

            if "final_info" in infos:
                for info in infos["final_info"]:
                    if info and "episode" in info:
                        print(f"global_step={global_step}, episodic_return={info['episode']['r']}")
                        writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step)
                        writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)

If more than one episode terminates or is truncated at the same global step, this code ends up logging only one of them, because they are all assigned the same global_step. Is this the intended behavior?
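
To make the collision concrete, here is a minimal sketch that replays the logging loop quoted above against a hypothetical `final_info` in which two of four environments finish on the same step. `RecordingWriter` is a hypothetical stand-in for TensorBoard's `SummaryWriter` that only records each `(tag, value, step)` call, so the sketch shows which step values are passed, independent of how TensorBoard handles duplicate steps.

```python
from collections import Counter


class RecordingWriter:
    """Hypothetical stand-in for SummaryWriter: records every add_scalar call."""

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, value, step):
        self.calls.append((tag, value, step))


# Hypothetical infos["final_info"] after one vectorized env.step() with 4 envs:
# envs 0 and 2 finished an episode, envs 1 and 3 did not (None).
final_info = [
    {"episode": {"r": 21.0, "l": 110}},
    None,
    {"episode": {"r": 35.0, "l": 140}},
    None,
]

writer = RecordingWriter()
global_step = 512

# Mirrors the quoted loop: every finished episode is written at global_step.
for info in final_info:
    if info and "episode" in info:
        writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step)
        writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)

return_steps = Counter(step for tag, _, step in writer.calls if tag == "charts/episodic_return")
print(return_steps)  # Counter({512: 2})
```

Both returns are written with step 512, so any consumer that keeps one value per (tag, step) can only show one of them.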

To log all of them, we could do something like:

            if "final_info" in infos:
                for i, info in enumerate(infos["final_info"]):
                    if info and "episode" in info:
                        logging_step = global_step - args.num_envs + i
                        print(f"logging_step={logging_step}, episodic_return={info['episode']['r']}")
                        writer.add_scalar("charts/episodic_return", info["episode"]["r"], logging_step)
                        writer.add_scalar("charts/episodic_length", info["episode"]["l"], logging_step)
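
A minimal sketch of the proposed `logging_step` mapping, checked over two consecutive vectorized steps. It assumes, as in the cleanrl training loop, that `global_step` has already been incremented by `num_envs` before `envs.step()` returns `infos`; under that assumption environment `i` maps to `global_step - num_envs + i`, and two consecutive steps never produce the same value. The helper `episodes_with_logging_steps` and the sample `final_info` lists are hypothetical.

```python
def episodes_with_logging_steps(final_info, global_step, num_envs):
    """Give each finished episode its own logging_step.

    Assumes global_step was already advanced by num_envs for this env.step(),
    so env i maps into the range [global_step - num_envs, global_step - 1].
    """
    out = []
    for i, info in enumerate(final_info):
        if info and "episode" in info:
            logging_step = global_step - num_envs + i
            out.append((logging_step, info["episode"]["r"], info["episode"]["l"]))
    return out


num_envs = 4
# Hypothetical final_info for two consecutive vectorized steps.
batch_a = [{"episode": {"r": 21.0, "l": 110}}, None, {"episode": {"r": 35.0, "l": 140}}, None]
batch_b = [None, {"episode": {"r": 18.0, "l": 90}}, {"episode": {"r": 40.0, "l": 150}}, None]

logged = episodes_with_logging_steps(batch_a, global_step=512, num_envs=num_envs)
logged += episodes_with_logging_steps(batch_b, global_step=516, num_envs=num_envs)
print([step for step, _, _ in logged])  # [508, 510, 513, 514]
```

One side effect of this choice: an episode is plotted between 1 and num_envs steps before the global_step at which it actually finished, a small shift when num_envs is small relative to the total number of timesteps.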

Alternatively, if we would rather not define logging_step, we should log the mean and standard deviation of the returns, along with the number of episodes that terminated or were truncated, at each global_step, so as not to bias the logs in favor of one of the many parallel environments.
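
A minimal sketch of this aggregated alternative, using only the standard library. The helper `summarize_finished_episodes` and tag names such as `charts/episodic_return_mean` are hypothetical, not existing cleanrl tags; in the training loop, each returned entry would be passed to `writer.add_scalar(tag, value, global_step)`.

```python
import statistics


def summarize_finished_episodes(final_info):
    """Aggregate every episode that finished in one vectorized env.step().

    Returns None when no episode finished, so nothing is logged for that step.
    """
    finished = [info["episode"] for info in final_info if info and "episode" in info]
    if not finished:
        return None
    # float() in case the recorded values are NumPy scalars rather than Python floats.
    returns = [float(ep["r"]) for ep in finished]
    lengths = [float(ep["l"]) for ep in finished]
    return {
        "charts/episodic_return_mean": statistics.fmean(returns),
        # pstdev is 0.0 for a single episode; stdev would raise StatisticsError.
        "charts/episodic_return_std": statistics.pstdev(returns),
        "charts/episodic_length_mean": statistics.fmean(lengths),
        "charts/num_finished_episodes": len(finished),
    }


# Hypothetical final_info: envs 0 and 2 finished on this step.
final_info = [
    {"episode": {"r": 20.0, "l": 100}},
    None,
    {"episode": {"r": 30.0, "l": 140}},
    None,
]
stats = summarize_finished_episodes(final_info)
print(stats["charts/episodic_return_mean"], stats["charts/episodic_return_std"])  # 25.0 5.0
```

`statistics.pstdev` is used instead of `statistics.stdev` because `stdev` raises `StatisticsError` with fewer than two data points, and often only one episode finishes at a given step.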

Similar issues may also be present in other files, particularly the other PPO variants.
