Mismatch between trajectory and reward_extra_infos_dict when using _log_rollout_data

### System Info

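In `RayPPOTrainer`, `_balance_batch` reorders the batch, which appears to leave `reward_extra_infos_dict` in a different order from the batch by the time `_log_rollout_data` runs.
Note: this is a potential issue found while reading the code; I have not tried to reproduce it yet.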
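
The suspected failure mode can be sketched in plain Python. This is only an illustration under assumptions: `trajectories`, `extra_infos`, `perm`, and `reorder` are hypothetical stand-ins, not verl APIs, and whether `RayPPOTrainer` actually reaches this state is still unverified.

```python
# Hypothetical stand-ins for the batch and the reward extras (not verl APIs).
trajectories = ["traj_0", "traj_1", "traj_2", "traj_3"]
extra_infos = {"score": [0.0, 1.0, 2.0, 3.0]}  # score i was computed for traj_i


def reorder(items, perm):
    """Return items rearranged so that position k holds items[perm[k]]."""
    return [items[i] for i in perm]


# Assumed permutation, standing in for whatever order a balancing step picks.
perm = [2, 0, 3, 1]
balanced = reorder(trajectories, perm)

# The extras were not reordered, so pairing by position is now wrong.
pairs = list(zip(balanced, extra_infos["score"]))
mismatched = [(t, s) for t, s in pairs if t != f"traj_{int(s)}"]
print(mismatched)
```

If the extras are computed only after the batch has been reordered, positional pairing stays correct, so the order of the two steps in the trainer is what needs checking.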

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Enable both `balance_batch` and `rollout_data_dir` at the same time.

### Expected behavior

The entries in `reward_extra_infos_dict` should be in the same order as the trajectories in the batch.
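
One way to keep the two aligned, sketched here under assumptions, is to apply the same permutation to every list in the extras dict whenever the batch is reordered. `reorder_extra_infos` and `perm` are hypothetical names for illustration; this is not the fix adopted by verl, and it assumes the permutation used for balancing is available to the caller.

```python
def reorder_extra_infos(extra_infos, perm):
    """Apply one permutation to every per-sample list in the dict.

    Position k of each output list holds the value that was at perm[k].
    """
    return {key: [values[i] for i in perm] for key, values in extra_infos.items()}


# Hypothetical data: score i belongs to traj_i before reordering.
trajectories = ["traj_0", "traj_1", "traj_2", "traj_3"]
extra_infos = {"score": [0.0, 1.0, 2.0, 3.0]}
perm = [2, 0, 3, 1]  # assumed permutation chosen by the balancing step

balanced = [trajectories[i] for i in perm]
aligned = reorder_extra_infos(extra_infos, perm)

# Each trajectory is paired with its own score again.
for traj, score in zip(balanced, aligned["score"]):
    assert traj == f"traj_{int(score)}"
```

An alternative design is to store a per-sample identifier next to each extra value and join on it when logging, which avoids depending on positional order at all.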

Mismatch between trajectory and reward_extra_infos_dict when using _log_rollout_data #4232

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Mismatch between trajectory and reward_extra_infos_dict when using _log_rollout_data #4232

Description

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions