Adding Token Counter for Online RL #110
Conversation
Thanks @rithwik-db, it looks good to me, but I would like to have reviews from experts. Additionally, I found it pretty useful to log:
per iteration or per minibatch in each iteration. If there are no objections from other reviewers, I would prefer adding this to the log as you have already done.
Also, comparing your log with one of my logs on main (grpo-t2s-lr-1e-6-clip-5e-3-kl-1e-3-v7-9JJUW2), there is an order of magnitude difference.
I also benchmarked the dataset via this run: grpo-t2s-lr-1e-6-clip-5e-3-kl-1e-3-v7-xRNGNv and got this, which partly explains the MFU issue. The mean prompt + generation length is likely in the range of 3,000 tokens, while max_seq is close to 14K, so the skew between max and mean can result in significant padding and wasted compute. For example, if every batch is padded to nearly 14K, we would lose nearly 80% MFU; if we can recover this, we will be back in the practical 40% MFU regime.
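A rough back-of-the-envelope check of that padding estimate (the 3,000 and 14K figures come from the benchmark run above; the arithmetic itself is only an illustration):

```python
# Hypothetical padding-overhead estimate; numbers are approximate values from the
# benchmark run mentioned above, not exact measurements.
mean_len = 3_000       # approx. mean prompt + generation length (tokens)
max_seq_len = 14_000   # approx. max sequence length in the dataset (tokens)

wasted = 1 - mean_len / max_seq_len
print(f"~{wasted:.0%} of compute goes to padding if every batch pads to max_seq_len")
# -> ~79%, consistent with the ~80% MFU loss mentioned above
```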
@bowenyang008, added the logs you mentioned here. To answer:
Not exactly, the maximum number of generated tokens is 2000 (which is what
some comments
lgtm!
Force-pushed from 78e8057 to eb71a3d
Force-pushed from eb71a3d to f325d3e
We are updating the token counter to be defined either by `action_mask` + `prompt_len` or by `len(sequences)` once the pad tokens are removed. I tested it out on compose-rl-grpo-test-5w6ZMR and both return the same output.

The bug we encounter comes from the fact that we use a StreamingDataLoader without a bespoke `get_num_tokens_in_batch` fn (like the one llmfoundry uses here). Therefore, we fall back to the default token counting function in Composer here. Since we don't have `input_ids` in the batch, we end up just using `max_seq_len * num_samples_in_batch` to count the total number of tokens, which leads to the incorrect value that we're seeing in our tests.
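For reference, a minimal sketch of what such a bespoke counter could look like. The batch keys (`sequences`, `action_mask`, `prompt_len`) and the pad-token handling are assumptions based on the description above, not the exact implementation in this PR:

```python
import torch

def get_num_tokens_in_batch(batch: dict[str, torch.Tensor], pad_token_id: int) -> int:
    """Count real (non-pad) tokens instead of max_seq_len * num_samples_in_batch.

    Assumes the batch carries padded `sequences`, an `action_mask` over
    generated tokens, and per-sample `prompt_len`; both branches should agree,
    mirroring the check described above.
    """
    if 'action_mask' in batch and 'prompt_len' in batch:
        # Generated tokens flagged by the action mask, plus the prompt lengths.
        return int(batch['action_mask'].sum().item() + batch['prompt_len'].sum().item())
    # Fallback: count every token in `sequences` that is not a pad token.
    return int((batch['sequences'] != pad_token_id).sum().item())
```

Wiring a counter like this into the dataloader spec (similar to what llmfoundry does) should keep Composer from falling back to the `max_seq_len * num_samples_in_batch` default.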