
Mismatch between trajectory and reward_extra_infos_dict when using _log_rollout_data #4232

@mirrorboat

System Info

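In RayPPOTrainer, _balance_batch reorders the batch, which appears to leave reward_extra_infos_dict in a different order from the batch when _log_rollout_data runs.
Note: this is a potential issue I found while reading the code; I have not tried to reproduce it yet.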
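
To make the suspected failure mode concrete, here is a minimal sketch in plain Python. It does not use verl's code: `reorder`, the `batch` list, the `acc` key, and the permutation are all hypothetical stand-ins. It assumes the reward extras were collected in the original order and that only the batch is permuted afterwards.

```python
# Hypothetical stand-ins for illustration; none of these names are verl APIs.
def reorder(items, perm):
    """Return items rearranged so that position i holds items[perm[i]]."""
    return [items[i] for i in perm]

# Trajectories and per-sample reward extras, initially aligned by index.
batch = ["traj_a", "traj_b", "traj_c", "traj_d"]
reward_extra_infos = {"acc": [1.0, 0.0, 1.0, 0.0]}

# A balancing step permutes the batch but leaves the dict untouched.
perm = [2, 0, 3, 1]
balanced_batch = reorder(batch, perm)

# Pairing by index now attaches some scores to the wrong trajectories.
logged = list(zip(balanced_batch, reward_extra_infos["acc"]))
mismatched = [
    (traj, acc)
    for traj, acc in logged
    if acc != reward_extra_infos["acc"][batch.index(traj)]
]
```

If the real code path behaves like this sketch, any logging step that zips the reordered batch with the untouched dict would silently record wrong values for the moved samples.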

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Enable both balance_batch and rollout_data_dir.

Expected behavior

The entries in reward_extra_infos_dict should be in the same order as the trajectories in the batch.
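
One way to get this behavior, sketched under assumptions, is to apply the same permutation to every per-sample list in the extras dict whenever the batch is reordered. The helper names `reorder` and `reorder_extras` and the sample data are hypothetical; this is not how verl is known to implement it, only an illustration of the invariant.

```python
# Hypothetical helpers for illustration; not part of verl.
def reorder(items, perm):
    """Return items rearranged so that position i holds items[perm[i]]."""
    return [items[i] for i in perm]

def reorder_extras(extras, perm):
    """Apply the batch permutation to every per-sample list in extras."""
    return {key: reorder(values, perm) for key, values in extras.items()}

batch = ["traj_a", "traj_b", "traj_c", "traj_d"]
extras = {"acc": [1.0, 0.0, 1.0, 0.0]}
perm = [2, 0, 3, 1]

# Reorder the batch and the extras with the same permutation.
balanced_batch = reorder(batch, perm)
balanced_extras = reorder_extras(extras, perm)

# Index-wise pairing is consistent again after the shared reorder.
aligned = list(zip(balanced_batch, balanced_extras["acc"]))
```

An alternative with the same effect would be to store the extras inside the batch object itself, so that any reorder moves them together.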

Labels: bug
