[Tutorial] LLM integration #2832

Open: vmoens wants to merge 18 commits into base: gh/vmoens/105/base

vmoens (Contributor) commented Mar 5, 2025

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

pytorch-bot bot commented Mar 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2832

Note: Links to docs will display an error until the docs builds have been completed.

❌ 19 New Failures, 2 Cancelled Jobs, 1 Unrelated Failure

As of commit e019e25 with merge base 27d3680:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Mar 5, 2025
ghstack-source-id: b6090530ad979d1965e2fdd52b5803b11606b8cb
Pull Request resolved: #2832
facebook-github-bot added the CLA Signed label Mar 5, 2025
vmoens added a commit that referenced this pull request Mar 6, 2025
ghstack-source-id: b6090530ad979d1965e2fdd52b5803b11606b8cb
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 6, 2025
ghstack-source-id: c354a96aa1cbc0e09525665a0460f49ce0f33cb7
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 6, 2025
ghstack-source-id: c354a96aa1cbc0e09525665a0460f49ce0f33cb7
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 6, 2025
ghstack-source-id: b1b4ac58e4f9869b9a8aa63346be93ee841bd98b
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 10, 2025
ghstack-source-id: efe044843940cd7dffa4a685eb4cd2b53e07462a
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: 6afe56e46143c7935781fe8d86ae02539181fdbc
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: 51aa49667d5e9fa745660b28e569903a28f0c47e
Pull Request resolved: #2832

github-actions bot commented Mar 11, 2025

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: 7. Worsened: 22.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_simple | 0.6256s | 0.5288s | 1.8912 Ops/s | 1.9392 Ops/s | -2.48% |
| test_transformed | 1.1186s | 1.0258s | 0.9749 Ops/s | 0.9726 Ops/s | +0.24% |
| test_serial | 1.5132s | 1.5093s | 0.6626 Ops/s | 0.6583 Ops/s | +0.64% |
| test_parallel | 1.3018s | 1.2974s | 0.7708 Ops/s | 0.7568 Ops/s | +1.85% |
| test_step_mdp_speed[True-True-True-True-True] | 0.4243ms | 29.5971μs | 33.7871 KOps/s | 33.3754 KOps/s | +1.23% |
| test_step_mdp_speed[True-True-True-True-False] | 80.4100μs | 17.6656μs | 56.6070 KOps/s | 56.7894 KOps/s | -0.32% |
| test_step_mdp_speed[True-True-True-False-True] | 45.3250μs | 16.7446μs | 59.7208 KOps/s | 59.1619 KOps/s | +0.94% |
| test_step_mdp_speed[True-True-True-False-False] | 72.4450μs | 9.8994μs | 101.0158 KOps/s | 99.6502 KOps/s | +1.37% |
| test_step_mdp_speed[True-True-False-True-True] | 61.3150μs | 31.5505μs | 31.6952 KOps/s | 30.7474 KOps/s | +3.08% |
| test_step_mdp_speed[True-True-False-True-False] | 69.8100μs | 19.3225μs | 51.7532 KOps/s | 50.4070 KOps/s | +2.67% |
| test_step_mdp_speed[True-True-False-False-True] | 49.7130μs | 18.5442μs | 53.9254 KOps/s | 52.6756 KOps/s | +2.37% |
| test_step_mdp_speed[True-True-False-False-False] | 63.7900μs | 11.6311μs | 85.9767 KOps/s | 83.0377 KOps/s | +3.54% |
| test_step_mdp_speed[True-False-True-True-True] | 0.1064ms | 33.3027μs | 30.0276 KOps/s | 29.3757 KOps/s | +2.22% |
| test_step_mdp_speed[True-False-True-True-False] | 82.4330μs | 21.0409μs | 47.5265 KOps/s | 45.8245 KOps/s | +3.71% |
| test_step_mdp_speed[True-False-True-False-True] | 64.8410μs | 18.5898μs | 53.7929 KOps/s | 52.8864 KOps/s | +1.71% |
| test_step_mdp_speed[True-False-True-False-False] | 36.2770μs | 11.6483μs | 85.8496 KOps/s | 83.7136 KOps/s | +2.55% |
| test_step_mdp_speed[True-False-False-True-True] | 94.0660μs | 35.0178μs | 28.5569 KOps/s | 28.0199 KOps/s | +1.92% |
| test_step_mdp_speed[True-False-False-True-False] | 64.9510μs | 22.9332μs | 43.6050 KOps/s | 42.5882 KOps/s | +2.39% |
| test_step_mdp_speed[True-False-False-False-True] | 71.5940μs | 20.1078μs | 49.7319 KOps/s | 48.4301 KOps/s | +2.69% |
| test_step_mdp_speed[True-False-False-False-False] | 0.5963ms | 13.2502μs | 75.4707 KOps/s | 72.8325 KOps/s | +3.62% |
| test_step_mdp_speed[False-True-True-True-True] | 91.2830μs | 33.5847μs | 29.7755 KOps/s | 29.2575 KOps/s | +1.77% |
| test_step_mdp_speed[False-True-True-True-False] | 91.0410μs | 21.2786μs | 46.9955 KOps/s | 46.5995 KOps/s | +0.85% |
| test_step_mdp_speed[False-True-True-False-True] | 2.2468ms | 21.5351μs | 46.4359 KOps/s | 46.3320 KOps/s | +0.22% |
| test_step_mdp_speed[False-True-True-False-False] | 40.0150μs | 13.0458μs | 76.6532 KOps/s | 74.9808 KOps/s | +2.23% |
| test_step_mdp_speed[False-True-False-True-True] | 85.1600μs | 35.3040μs | 28.3254 KOps/s | 27.8805 KOps/s | +1.60% |
| test_step_mdp_speed[False-True-False-True-False] | 0.1328ms | 23.8401μs | 41.9462 KOps/s | 42.2803 KOps/s | -0.79% |
| test_step_mdp_speed[False-True-False-False-True] | 47.7290μs | 23.0797μs | 43.3282 KOps/s | 42.5809 KOps/s | +1.75% |
| test_step_mdp_speed[False-True-False-False-False] | 38.9730μs | 14.7663μs | 67.7216 KOps/s | 65.7720 KOps/s | +2.96% |
| test_step_mdp_speed[False-False-True-True-True] | 93.7250μs | 37.0172μs | 27.0145 KOps/s | 26.4105 KOps/s | +2.29% |
| test_step_mdp_speed[False-False-True-True-False] | 54.3320μs | 24.7651μs | 40.3793 KOps/s | 38.8427 KOps/s | +3.96% |
| test_step_mdp_speed[False-False-True-False-True] | 78.5070μs | 23.0187μs | 43.4429 KOps/s | 41.5411 KOps/s | +4.58% |
| test_step_mdp_speed[False-False-True-False-False] | 65.1920μs | 14.7687μs | 67.7110 KOps/s | 66.5423 KOps/s | +1.76% |
| test_step_mdp_speed[False-False-False-True-True] | 0.1208ms | 38.3495μs | 26.0759 KOps/s | 24.4769 KOps/s | **+6.53%** |
| test_step_mdp_speed[False-False-False-True-False] | 79.6090μs | 26.1853μs | 38.1894 KOps/s | 36.7870 KOps/s | +3.81% |
| test_step_mdp_speed[False-False-False-False-True] | 0.1448ms | 24.5789μs | 40.6853 KOps/s | 39.5246 KOps/s | +2.94% |
| test_step_mdp_speed[False-False-False-False-False] | 63.3980μs | 16.2059μs | 61.7058 KOps/s | 58.9687 KOps/s | +4.64% |
| test_values[generalized_advantage_estimate-True-True] | 14.3967ms | 10.0834ms | 99.1732 Ops/s | 101.0991 Ops/s | -1.91% |
| test_values[vec_generalized_advantage_estimate-True-True] | 28.5023ms | 24.9208ms | 40.1271 Ops/s | 40.8182 Ops/s | -1.69% |
| test_values[td0_return_estimate-False-False] | 0.2248ms | 0.1755ms | 5.6989 KOps/s | 5.5841 KOps/s | +2.06% |
| test_values[td1_return_estimate-False-False] | 27.7100ms | 25.0131ms | 39.9791 Ops/s | 39.6510 Ops/s | +0.83% |
| test_values[vec_td1_return_estimate-False-False] | 28.8841ms | 24.8880ms | 40.1800 Ops/s | 41.0845 Ops/s | -2.20% |
| test_values[td_lambda_return_estimate-True-False] | 36.0261ms | 35.3752ms | 28.2684 Ops/s | 28.4839 Ops/s | -0.76% |
| test_values[vec_td_lambda_return_estimate-True-False] | 26.8356ms | 24.7192ms | 40.4544 Ops/s | 40.9387 Ops/s | -1.18% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.7554ms | 8.5508ms | 116.9477 Ops/s | 117.3231 Ops/s | -0.32% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.4434ms | 1.9437ms | 514.4846 Ops/s | 503.9644 Ops/s | +2.09% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 1.5616ms | 0.3731ms | 2.6801 KOps/s | 2.6538 KOps/s | +0.99% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 47.3556ms | 43.0265ms | 23.2415 Ops/s | 23.0826 Ops/s | +0.69% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 4.2555ms | 3.4569ms | 289.2732 Ops/s | 283.8572 Ops/s | +1.91% |
| test_dqn_speed[False-None] | 6.4537ms | 1.4056ms | 711.4607 Ops/s | 706.9900 Ops/s | +0.63% |
| test_dqn_speed[False-backward] | 2.0781ms | 1.9193ms | 521.0313 Ops/s | 530.2117 Ops/s | -1.73% |
| test_dqn_speed[True-None] | 0.7246ms | 0.5586ms | 1.7901 KOps/s | 1.7505 KOps/s | +2.26% |
| test_dqn_speed[True-backward] | 1.0150ms | 0.9713ms | 1.0296 KOps/s | 709.5874 Ops/s | **+45.09%** |
| test_dqn_speed[reduce-overhead-None] | 0.6545ms | 0.5609ms | 1.7829 KOps/s | 1.7282 KOps/s | +3.17% |
| test_dqn_speed[reduce-overhead-backward] | 1.1113ms | 0.9921ms | 1.0080 KOps/s | 1.0095 KOps/s | -0.15% |
| test_ddpg_speed[False-None] | 3.7417ms | 2.8768ms | 347.6127 Ops/s | 338.6158 Ops/s | +2.66% |
| test_ddpg_speed[False-backward] | 4.1652ms | 4.0028ms | 249.8258 Ops/s | 246.1975 Ops/s | +1.47% |
| test_ddpg_speed[True-None] | 1.9213ms | 1.4374ms | 695.7127 Ops/s | 678.5120 Ops/s | +2.54% |
| test_ddpg_speed[True-backward] | 2.3571ms | 2.3141ms | 432.1418 Ops/s | 403.7110 Ops/s | **+7.04%** |
| test_ddpg_speed[reduce-overhead-None] | 1.6313ms | 1.4263ms | 701.0990 Ops/s | 688.0582 Ops/s | +1.90% |
| test_ddpg_speed[reduce-overhead-backward] | 2.5603ms | 2.3655ms | 422.7422 Ops/s | 426.8446 Ops/s | -0.96% |
| test_sac_speed[False-None] | 9.2601ms | 7.9819ms | 125.2832 Ops/s | 123.8634 Ops/s | +1.15% |
| test_sac_speed[False-backward] | 12.3673ms | 10.6832ms | 93.6053 Ops/s | 91.1743 Ops/s | +2.67% |
| test_sac_speed[True-None] | 3.7209ms | 2.5868ms | 386.5719 Ops/s | 370.1315 Ops/s | +4.44% |
| test_sac_speed[True-backward] | 4.3859ms | 4.2386ms | 235.9259 Ops/s | 219.7123 Ops/s | **+7.38%** |
| test_sac_speed[reduce-overhead-None] | 3.3893ms | 2.5873ms | 386.5088 Ops/s | 381.4955 Ops/s | +1.31% |
| test_sac_speed[reduce-overhead-backward] | 5.5734ms | 4.6868ms | 213.3655 Ops/s | 230.6139 Ops/s | **-7.48%** |
| test_redq_speed[False-None] | 15.8373ms | 13.6951ms | 73.0190 Ops/s | 75.6552 Ops/s | -3.48% |
| test_redq_speed[False-backward] | 27.0987ms | 23.4141ms | 42.7093 Ops/s | 43.8288 Ops/s | -2.55% |
| test_redq_speed[True-None] | 8.2042ms | 7.3564ms | 135.9366 Ops/s | 131.4643 Ops/s | +3.40% |
| test_redq_speed[True-backward] | 16.0858ms | 14.9068ms | 67.0837 Ops/s | 66.9438 Ops/s | +0.21% |
| test_redq_speed[reduce-overhead-None] | 8.9077ms | 7.2488ms | 137.9530 Ops/s | 142.9467 Ops/s | -3.49% |
| test_redq_speed[reduce-overhead-backward] | 16.5203ms | 15.0965ms | 66.2406 Ops/s | 68.7592 Ops/s | -3.66% |
| test_redq_deprec_speed[False-None] | 14.4449ms | 13.2478ms | 75.4842 Ops/s | 74.8820 Ops/s | +0.80% |
| test_redq_deprec_speed[False-backward] | 20.7554ms | 19.3890ms | 51.5756 Ops/s | 51.6205 Ops/s | -0.09% |
| test_redq_deprec_speed[True-None] | 5.9491ms | 5.2078ms | 192.0181 Ops/s | 186.8758 Ops/s | +2.75% |
| test_redq_deprec_speed[True-backward] | 12.2764ms | 10.4243ms | 95.9297 Ops/s | 95.7382 Ops/s | +0.20% |
| test_redq_deprec_speed[reduce-overhead-None] | 6.3975ms | 5.3908ms | 185.5028 Ops/s | 181.1453 Ops/s | +2.41% |
| test_redq_deprec_speed[reduce-overhead-backward] | 11.0397ms | 10.4272ms | 95.9034 Ops/s | 89.8444 Ops/s | **+6.74%** |
| test_td3_speed[False-None] | 8.6642ms | 8.1596ms | 122.5557 Ops/s | 115.3317 Ops/s | **+6.26%** |
| test_td3_speed[False-backward] | 11.5850ms | 10.8264ms | 92.3669 Ops/s | 88.1611 Ops/s | +4.77% |
| test_td3_speed[True-None] | 2.9089ms | 2.3782ms | 420.4807 Ops/s | 417.3540 Ops/s | +0.75% |
| test_td3_speed[True-backward] | 4.8379ms | 4.5355ms | 220.4834 Ops/s | 227.6169 Ops/s | -3.13% |
| test_td3_speed[reduce-overhead-None] | 2.7348ms | 2.3113ms | 432.6481 Ops/s | 420.6859 Ops/s | +2.84% |
| test_td3_speed[reduce-overhead-backward] | 4.6315ms | 4.3546ms | 229.6447 Ops/s | 241.7456 Ops/s | **-5.01%** |
| test_cql_speed[False-None] | 39.9378ms | 37.8022ms | 26.4535 Ops/s | 27.1053 Ops/s | -2.40% |
| test_cql_speed[False-backward] | 52.2866ms | 48.7959ms | 20.4935 Ops/s | 21.1288 Ops/s | -3.01% |
| test_cql_speed[True-None] | 23.9700ms | 22.8902ms | 43.6869 Ops/s | 44.1481 Ops/s | -1.04% |
| test_cql_speed[True-backward] | 31.2198ms | 30.1064ms | 33.2155 Ops/s | 33.1293 Ops/s | +0.26% |
| test_cql_speed[reduce-overhead-None] | 24.4098ms | 22.7202ms | 44.0137 Ops/s | 43.4849 Ops/s | +1.22% |
| test_cql_speed[reduce-overhead-backward] | 31.6363ms | 30.1299ms | 33.1897 Ops/s | 32.9449 Ops/s | +0.74% |
| test_a2c_speed[False-None] | 9.6886ms | 7.5630ms | 132.2220 Ops/s | 131.3522 Ops/s | +0.66% |
| test_a2c_speed[False-backward] | 18.5042ms | 15.4144ms | 64.8743 Ops/s | 65.9245 Ops/s | -1.59% |
| test_a2c_speed[True-None] | 5.3224ms | 4.8463ms | 206.3436 Ops/s | 205.4463 Ops/s | +0.44% |
| test_a2c_speed[True-backward] | 12.8536ms | 11.5529ms | 86.5585 Ops/s | 88.2862 Ops/s | -1.96% |
| test_a2c_speed[reduce-overhead-None] | 6.1863ms | 5.1611ms | 193.7560 Ops/s | 207.0729 Ops/s | **-6.43%** |
| test_a2c_speed[reduce-overhead-backward] | 12.9676ms | 12.2361ms | 81.7251 Ops/s | 86.5253 Ops/s | **-5.55%** |
| test_ppo_speed[False-None] | 9.1545ms | 8.1449ms | 122.7762 Ops/s | 129.7990 Ops/s | **-5.41%** |
| test_ppo_speed[False-backward] | 18.1662ms | 16.0463ms | 62.3197 Ops/s | 66.4411 Ops/s | **-6.20%** |
| test_ppo_speed[True-None] | 6.5429ms | 5.6113ms | 178.2105 Ops/s | 196.1468 Ops/s | **-9.14%** |
| test_ppo_speed[True-backward] | 12.2576ms | 12.0009ms | 83.3268 Ops/s | 88.0905 Ops/s | **-5.41%** |
| test_ppo_speed[reduce-overhead-None] | 6.5646ms | 5.2056ms | 192.1007 Ops/s | 194.6062 Ops/s | -1.29% |
| test_ppo_speed[reduce-overhead-backward] | 12.1914ms | 11.1862ms | 89.3956 Ops/s | 87.3514 Ops/s | +2.34% |
| test_reinforce_speed[False-None] | 7.7718ms | 6.6411ms | 150.5773 Ops/s | 148.8619 Ops/s | +1.15% |
| test_reinforce_speed[False-backward] | 10.3162ms | 9.9108ms | 100.8998 Ops/s | 97.9612 Ops/s | +3.00% |
| test_reinforce_speed[True-None] | 4.9204ms | 4.2607ms | 234.7008 Ops/s | 233.1382 Ops/s | +0.67% |
| test_reinforce_speed[True-backward] | 11.1528ms | 10.2603ms | 97.4626 Ops/s | 94.8035 Ops/s | +2.80% |
| test_reinforce_speed[reduce-overhead-None] | 5.7003ms | 4.1606ms | 240.3479 Ops/s | 235.7231 Ops/s | +1.96% |
| test_reinforce_speed[reduce-overhead-backward] | 11.1556ms | 10.7840ms | 92.7303 Ops/s | 96.4990 Ops/s | -3.91% |
| test_iql_speed[False-None] | 40.3053ms | 34.3370ms | 29.1231 Ops/s | 29.9407 Ops/s | -2.73% |
| test_iql_speed[False-backward] | 48.4938ms | 46.7109ms | 21.4083 Ops/s | 21.7381 Ops/s | -1.52% |
| test_iql_speed[True-None] | 17.4683ms | 16.4006ms | 60.9735 Ops/s | 61.9703 Ops/s | -1.61% |
| test_iql_speed[True-backward] | 29.3814ms | 28.3805ms | 35.2354 Ops/s | 36.3213 Ops/s | -2.99% |
| test_iql_speed[reduce-overhead-None] | 17.6696ms | 16.2808ms | 61.4220 Ops/s | 62.0210 Ops/s | -0.97% |
| test_iql_speed[reduce-overhead-backward] | 29.5209ms | 28.4341ms | 35.1690 Ops/s | 36.6029 Ops/s | -3.92% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 7.8792ms | 5.4008ms | 185.1591 Ops/s | 202.8195 Ops/s | **-8.71%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8581ms | 0.5264ms | 1.8996 KOps/s | 1.9088 KOps/s | -0.48% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.8436ms | 0.5093ms | 1.9633 KOps/s | 1.9920 KOps/s | -1.44% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 7.1965ms | 4.7304ms | 211.3975 Ops/s | 214.8862 Ops/s | -1.62% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.1423ms | 0.5193ms | 1.9255 KOps/s | 1.9482 KOps/s | -1.17% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8435ms | 0.4929ms | 2.0288 KOps/s | 2.0346 KOps/s | -0.28% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.3492ms | 1.6836ms | 593.9538 Ops/s | 597.9386 Ops/s | -0.67% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 3.4839ms | 1.6226ms | 616.3129 Ops/s | 629.2653 Ops/s | -2.06% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 7.8811ms | 5.5790ms | 179.2433 Ops/s | 204.7301 Ops/s | **-12.45%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.9425ms | 0.6860ms | 1.4577 KOps/s | 1.5241 KOps/s | -4.35% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 1.1318ms | 0.6649ms | 1.5040 KOps/s | 1.5653 KOps/s | -3.91% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.6724ms | 5.2128ms | 191.8365 Ops/s | 210.6736 Ops/s | **-8.94%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.9142ms | 0.5491ms | 1.8213 KOps/s | 1.9356 KOps/s | **-5.90%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7926ms | 0.5225ms | 1.9140 KOps/s | 1.9641 KOps/s | -2.55% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.5111ms | 5.0286ms | 198.8633 Ops/s | 217.5886 Ops/s | **-8.61%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.6593ms | 0.5504ms | 1.8169 KOps/s | 1.9694 KOps/s | **-7.74%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7752ms | 0.5197ms | 1.9241 KOps/s | 2.0377 KOps/s | **-5.58%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 7.6678ms | 5.4132ms | 184.7340 Ops/s | 209.1359 Ops/s | **-11.67%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.7013ms | 0.6911ms | 1.4470 KOps/s | 1.4851 KOps/s | -2.56% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 1.2177ms | 0.6675ms | 1.4981 KOps/s | 1.5662 KOps/s | -4.35% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 6.8306ms | 4.6940ms | 213.0401 Ops/s | 23.9887 Ops/s | **+788.08%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 10.4090ms | 2.6027ms | 384.2146 Ops/s | 425.1800 Ops/s | **-9.63%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 6.5146ms | 1.4322ms | 698.2139 Ops/s | 809.6516 Ops/s | **-13.76%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 6.1910ms | 4.5713ms | 218.7559 Ops/s | 219.3957 Ops/s | -0.29% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.8640s | 19.6506ms | 50.8890 Ops/s | 430.6518 Ops/s | **-88.18%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 5.2330ms | 1.3625ms | 733.9550 Ops/s | 702.8890 Ops/s | +4.42% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 6.9578ms | 5.0041ms | 199.8356 Ops/s | 221.1958 Ops/s | **-9.66%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 9.0728ms | 2.6523ms | 377.0374 Ops/s | 399.0503 Ops/s | **-5.52%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.2677ms | 1.5882ms | 629.6592 Ops/s | 646.6836 Ops/s | -2.63% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 13.8849ms | 12.5097ms | 79.9381 Ops/s | 78.7479 Ops/s | +1.51% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 16.4625ms | 15.0539ms | 66.4282 Ops/s | 69.9334 Ops/s | **-5.01%** |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 22.2286ms | 21.4296ms | 46.6644 Ops/s | 46.5489 Ops/s | +0.25% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 18.4366ms | 15.3269ms | 65.2446 Ops/s | 67.8967 Ops/s | -3.91% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 22.9283ms | 21.1055ms | 47.3810 Ops/s | 46.9948 Ops/s | +0.82% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 22.4239ms | 16.1647ms | 61.8631 Ops/s | 62.3030 Ops/s | -0.71% |
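
The Change column above appears to be the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces that arithmetic. It is not the benchmark bot's actual code: the helper names (`parse_ops`, `relative_change`, `classify`) are hypothetical, and the 5% threshold is an assumption inferred from the table, where the emphasized entries all exceed 5% in magnitude and their counts match the Improved and Worsened totals.

```python
import re

# Scale factors for the throughput units seen in the table.
# "MOps/s" does not appear in the table and is included only as an assumption.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '1.0296 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized throughput: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is inferred, not documented."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# Three rows copied from the CPU table: (name, Ops, Ops on Repo HEAD).
rows = [
    ("test_simple", "1.8912 Ops/s", "1.9392 Ops/s"),
    ("test_dqn_speed[True-backward]", "1.0296 KOps/s", "709.5874 Ops/s"),
    ("test_rb_populate[...LazyMemmapStorage-SamplerWithoutReplacement-400]",
     "50.8890 Ops/s", "430.6518 Ops/s"),
]

for name, ops, head in rows:
    pct = relative_change(parse_ops(ops), parse_ops(head))
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because the table values are already rounded, recomputed percentages can differ from the reported ones in the last digit.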

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: c3bbd90d50438847249977b2d478a886ff05a2a9
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: 5a38f71ba05463ef208b481e39c441fd28857fba
Pull Request resolved: #2832
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: d4ea96f47359929f87b7e38fc5e3e1c2cbd10d65
Pull Request resolved: #2832

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: 11. Worsened: 17.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_simple | 0.9200s | 0.8288s | 1.2065 Ops/s | 1.2007 Ops/s | +0.48% |
| test_transformed | 1.5427s | 1.4451s | 0.6920 Ops/s | 0.6509 Ops/s | **+6.31%** |
| test_serial | 2.4513s | 2.3533s | 0.4249 Ops/s | 0.4139 Ops/s | +2.66% |
| test_parallel | 1.8757s | 1.8533s | 0.5396 Ops/s | 0.5373 Ops/s | +0.42% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1881ms | 39.9134μs | 25.0542 KOps/s | 25.0989 KOps/s | -0.18% |
| test_step_mdp_speed[True-True-True-True-False] | 53.7510μs | 22.9810μs | 43.5142 KOps/s | 42.6081 KOps/s | +2.13% |
| test_step_mdp_speed[True-True-True-False-True] | 48.0910μs | 22.0347μs | 45.3830 KOps/s | 45.3513 KOps/s | +0.07% |
| test_step_mdp_speed[True-True-True-False-False] | 48.8510μs | 12.7992μs | 78.1301 KOps/s | 76.5372 KOps/s | +2.08% |
| test_step_mdp_speed[True-True-False-True-True] | 0.1333ms | 41.1463μs | 24.3035 KOps/s | 23.6587 KOps/s | +2.73% |
| test_step_mdp_speed[True-True-False-True-False] | 48.3510μs | 25.6944μs | 38.9190 KOps/s | 39.2078 KOps/s | -0.74% |
| test_step_mdp_speed[True-True-False-False-True] | 0.1367ms | 24.5915μs | 40.6645 KOps/s | 41.7371 KOps/s | -2.57% |
| test_step_mdp_speed[True-True-False-False-False] | 44.2210μs | 15.3686μs | 65.0679 KOps/s | 65.8326 KOps/s | -1.16% |
| test_step_mdp_speed[True-False-True-True-True] | 77.9720μs | 44.8176μs | 22.3126 KOps/s | 22.5898 KOps/s | -1.23% |
| test_step_mdp_speed[True-False-True-True-False] | 55.8710μs | 27.9687μs | 35.7542 KOps/s | 35.7170 KOps/s | +0.10% |
| test_step_mdp_speed[True-False-True-False-True] | 51.4910μs | 24.7206μs | 40.4521 KOps/s | 41.7677 KOps/s | -3.15% |
| test_step_mdp_speed[True-False-True-False-False] | 42.3610μs | 15.3435μs | 65.1742 KOps/s | 65.7151 KOps/s | -0.82% |
| test_step_mdp_speed[True-False-False-True-True] | 0.2465ms | 46.7374μs | 21.3961 KOps/s | 21.1946 KOps/s | +0.95% |
| test_step_mdp_speed[True-False-False-True-False] | 57.5620μs | 30.2393μs | 33.0695 KOps/s | 32.9882 KOps/s | +0.25% |
| test_step_mdp_speed[True-False-False-False-True] | 60.3320μs | 26.3668μs | 37.9265 KOps/s | 38.1958 KOps/s | -0.71% |
| test_step_mdp_speed[True-False-False-False-False] | 52.1510μs | 17.4080μs | 57.4448 KOps/s | 56.9967 KOps/s | +0.79% |
| test_step_mdp_speed[False-True-True-True-True] | 78.4020μs | 44.5447μs | 22.4494 KOps/s | 22.5676 KOps/s | -0.52% |
| test_step_mdp_speed[False-True-True-True-False] | 54.0810μs | 28.0748μs | 35.6192 KOps/s | 35.6872 KOps/s | -0.19% |
| test_step_mdp_speed[False-True-True-False-True] | 2.7238ms | 28.6294μs | 34.9291 KOps/s | 35.1606 KOps/s | -0.66% |
| test_step_mdp_speed[False-True-True-False-False] | 42.4110μs | 17.1968μs | 58.1503 KOps/s | 58.9887 KOps/s | -1.42% |
| test_step_mdp_speed[False-True-False-True-True] | 0.1235ms | 46.3803μs | 21.5609 KOps/s | 21.6525 KOps/s | -0.42% |
| test_step_mdp_speed[False-True-False-True-False] | 51.9310μs | 29.9872μs | 33.3475 KOps/s | 32.6419 KOps/s | +2.16% |
| test_step_mdp_speed[False-True-False-False-True] | 54.4810μs | 30.0942μs | 33.2290 KOps/s | 32.9502 KOps/s | +0.85% |
| test_step_mdp_speed[False-True-False-False-False] | 46.9210μs | 19.0917μs | 52.3788 KOps/s | 52.5851 KOps/s | -0.39% |
| test_step_mdp_speed[False-False-True-True-True] | 73.9920μs | 48.5547μs | 20.5953 KOps/s | 20.6376 KOps/s | -0.20% |
| test_step_mdp_speed[False-False-True-True-False] | 63.6510μs | 32.5379μs | 30.7334 KOps/s | 30.8774 KOps/s | -0.47% |
| test_step_mdp_speed[False-False-True-False-True] | 53.3920μs | 30.6731μs | 32.6018 KOps/s | 33.0554 KOps/s | -1.37% |
| test_step_mdp_speed[False-False-True-False-False] | 43.2910μs | 19.1990μs | 52.0860 KOps/s | 52.4005 KOps/s | -0.60% |
| test_step_mdp_speed[False-False-False-True-True] | 78.4320μs | 50.8343μs | 19.6718 KOps/s | 19.6664 KOps/s | +0.03% |
| test_step_mdp_speed[False-False-False-True-False] | 64.2520μs | 34.7963μs | 28.7387 KOps/s | 28.9007 KOps/s | -0.56% |
| test_step_mdp_speed[False-False-False-False-True] | 60.6410μs | 32.1035μs | 31.1492 KOps/s | 30.9690 KOps/s | +0.58% |
| test_step_mdp_speed[False-False-False-False-False] | 61.7710μs | 21.4313μs | 46.6607 KOps/s | 46.8194 KOps/s | -0.34% |
| test_values[generalized_advantage_estimate-True-True] | 26.6240ms | 26.0504ms | 38.3872 Ops/s | 38.8062 Ops/s | -1.08% |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1228s | 3.3692ms | 296.8042 Ops/s | 303.9764 Ops/s | -2.36% |
| test_values[td0_return_estimate-False-False] | 0.1060ms | 81.1765μs | 12.3188 KOps/s | 11.9466 KOps/s | +3.12% |
| test_values[td1_return_estimate-False-False] | 57.6462ms | 56.6564ms | 17.6503 Ops/s | 17.6754 Ops/s | -0.14% |
| test_values[vec_td1_return_estimate-False-False] | 1.3669ms | 1.0956ms | 912.7741 Ops/s | 912.4706 Ops/s | +0.03% |
| test_values[td_lambda_return_estimate-True-False] | 90.2894ms | 89.4819ms | 11.1754 Ops/s | 11.0508 Ops/s | +1.13% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2947ms | 1.0954ms | 912.8686 Ops/s | 917.9172 Ops/s | -0.55% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.6826ms | 25.4296ms | 39.3242 Ops/s | 39.7233 Ops/s | -1.00% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0700ms | 0.7720ms | 1.2954 KOps/s | 1.3048 KOps/s | -0.72% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8319ms | 0.6761ms | 1.4791 KOps/s | 1.4468 KOps/s | +2.23% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5639ms | 1.4968ms | 668.0958 Ops/s | 671.1434 Ops/s | -0.45% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8165ms | 0.6894ms | 1.4506 KOps/s | 1.4523 KOps/s | -0.12% |
| test_dqn_speed[False-None] | 7.6533ms | 1.5157ms | 659.7773 Ops/s | 640.7843 Ops/s | +2.96% |
| test_dqn_speed[False-backward] | 2.2498ms | 2.1400ms | 467.2832 Ops/s | 456.7034 Ops/s | +2.32% |
| test_dqn_speed[True-None] | 0.7121ms | 0.5491ms | 1.8213 KOps/s | 1.7902 KOps/s | +1.74% |
| test_dqn_speed[True-backward] | 1.3766ms | 1.2312ms | 812.1832 Ops/s | 872.9948 Ops/s | **-6.97%** |
| test_dqn_speed[reduce-overhead-None] | 0.7301ms | 0.5720ms | 1.7483 KOps/s | 1.7141 KOps/s | +1.99% |
| test_dqn_speed[reduce-overhead-backward] | 1.1093ms | 1.0708ms | 933.8496 Ops/s | 1.0122 KOps/s | **-7.74%** |
| test_ddpg_speed[False-None] | 3.0751ms | 2.8046ms | 356.5535 Ops/s | 341.9411 Ops/s | +4.27% |
| test_ddpg_speed[False-backward] | 4.3296ms | 4.2078ms | 237.6534 Ops/s | 234.1535 Ops/s | +1.49% |
| test_ddpg_speed[True-None] | 1.7386ms | 1.3452ms | 743.3913 Ops/s | 735.2126 Ops/s | +1.11% |
| test_ddpg_speed[True-backward] | 2.6749ms | 2.6068ms | 383.6117 Ops/s | 408.1579 Ops/s | **-6.01%** |
| test_ddpg_speed[reduce-overhead-None] | 1.4205ms | 1.3483ms | 741.6992 Ops/s | 737.1197 Ops/s | +0.62% |
| test_ddpg_speed[reduce-overhead-backward] | 2.0983ms | 2.0464ms | 488.6554 Ops/s | 523.6999 Ops/s | **-6.69%** |
| test_sac_speed[False-None] | 8.4323ms | 7.9614ms | 125.6062 Ops/s | 119.9837 Ops/s | +4.69% |
| test_sac_speed[False-backward] | 11.7233ms | 11.2541ms | 88.8564 Ops/s | 88.5485 Ops/s | +0.35% |
| test_sac_speed[True-None] | 1.8922ms | 1.8308ms | 546.2070 Ops/s | 531.5305 Ops/s | +2.76% |
| test_sac_speed[True-backward] | 3.8126ms | 3.7009ms | 270.2011 Ops/s | 261.7256 Ops/s | +3.24% |
| test_sac_speed[reduce-overhead-None] | 20.8745ms | 11.8242ms | 84.5722 Ops/s | 82.8430 Ops/s | +2.09% |
| test_sac_speed[reduce-overhead-backward] | 1.7206ms | 1.6127ms | 620.0825 Ops/s | 548.5464 Ops/s | **+13.04%** |
| test_redq_speed[False-None] | 8.2034ms | 7.7385ms | 129.2244 Ops/s | 125.8613 Ops/s | +2.67% |
| test_redq_speed[False-backward] | 12.2494ms | 11.6810ms | 85.6092 Ops/s | 81.8154 Ops/s | +4.64% |
| test_redq_speed[True-None] | 2.5102ms | 2.3493ms | 425.6544 Ops/s | 425.8602 Ops/s | -0.05% |
| test_redq_speed[True-backward] | 4.8510ms | 4.2420ms | 235.7386 Ops/s | 233.7593 Ops/s | +0.85% |
| test_redq_speed[reduce-overhead-None] | 2.7397ms | 2.3989ms | 416.8602 Ops/s | 420.4671 Ops/s | -0.86% |
| test_redq_speed[reduce-overhead-backward] | 4.6335ms | 4.2449ms | 235.5745 Ops/s | 236.5972 Ops/s | -0.43% |
| test_redq_deprec_speed[False-None] | 9.3344ms | 8.9714ms | 111.4658 Ops/s | 108.8680 Ops/s | +2.39% |
| test_redq_deprec_speed[False-backward] | 13.1186ms | 12.5505ms | 79.6782 Ops/s | 80.0976 Ops/s | -0.52% |
| test_redq_deprec_speed[True-None] | 3.1025ms | 2.6886ms | 371.9453 Ops/s | 380.9892 Ops/s | -2.37% |
| test_redq_deprec_speed[True-backward] | 5.0436ms | 4.5752ms | 218.5682 Ops/s | 219.3755 Ops/s | -0.37% |
| test_redq_deprec_speed[reduce-overhead-None] | 3.2501ms | 2.6681ms | 374.8049 Ops/s | 380.7223 Ops/s | -1.55% |
| test_redq_deprec_speed[reduce-overhead-backward] | 5.0470ms | 4.5863ms | 218.0408 Ops/s | 219.1180 Ops/s | -0.49% |
| test_td3_speed[False-None] | 8.0836ms | 7.9636ms | 125.5721 Ops/s | 123.7984 Ops/s | +1.43% |
| test_td3_speed[False-backward] | 11.1190ms | 10.7010ms | 93.4492 Ops/s | 93.6970 Ops/s | -0.26% |
| test_td3_speed[True-None] | 1.6783ms | 1.6405ms | 609.5617 Ops/s | 591.3865 Ops/s | +3.07% |
| test_td3_speed[True-backward] | 3.4281ms | 3.2922ms | 303.7472 Ops/s | 292.9892 Ops/s | +3.67% |
| test_td3_speed[reduce-overhead-None] | 51.4340ms | 26.4350ms | 37.8286 Ops/s | 38.5355 Ops/s | -1.83% |
| test_td3_speed[reduce-overhead-backward] | 1.3843ms | 1.3151ms | 760.3832 Ops/s | 665.6663 Ops/s | **+14.23%** |
| test_cql_speed[False-None] | 17.2662ms | 16.7920ms | 59.5522 Ops/s | 58.9448 Ops/s | +1.03% |
| test_cql_speed[False-backward] | 22.9723ms | 22.1740ms | 45.0978 Ops/s | 43.6496 Ops/s | +3.32% |
| test_cql_speed[True-None] | 3.5945ms | 3.4600ms | 289.0213 Ops/s | 300.3936 Ops/s | -3.79% |
| test_cql_speed[True-backward] | 6.3506ms | 5.4945ms | 181.9990 Ops/s | 175.0162 Ops/s | +3.99% |
| test_cql_speed[reduce-overhead-None] | 0.6315s | 16.5041ms | 60.5911 Ops/s | 75.5630 Ops/s | **-19.81%** |
| test_cql_speed[reduce-overhead-backward] | 2.0762ms | 1.9870ms | 503.2770 Ops/s | 558.0110 Ops/s | **-9.81%** |
| test_a2c_speed[False-None] | 3.3826ms | 3.1887ms | 313.6094 Ops/s | 311.4874 Ops/s | +0.68% |
| test_a2c_speed[False-backward] | 7.1452ms | 6.3925ms | 156.4321 Ops/s | 159.8159 Ops/s | -2.12% |
| test_a2c_speed[True-None] | 1.5547ms | 1.3292ms | 752.3524 Ops/s | 738.2721 Ops/s | +1.91% |
| test_a2c_speed[True-backward] | 3.2023ms | 3.0867ms | 323.9700 Ops/s | 310.2989 Ops/s | +4.41% |
| test_a2c_speed[reduce-overhead-None] | 16.0910ms | 9.1298ms | 109.5316 Ops/s | 108.3556 Ops/s | +1.09% |
| test_a2c_speed[reduce-overhead-backward] | 1.6932ms | 1.6155ms | 619.0043 Ops/s | 611.6893 Ops/s | +1.20% |
| test_ppo_speed[False-None] | 3.7955ms | 3.7131ms | 269.3167 Ops/s | 253.1147 Ops/s | **+6.40%** |
| test_ppo_speed[False-backward] | 7.5188ms | 7.1270ms | 140.3114 Ops/s | 136.6406 Ops/s | +2.69% |
| test_ppo_speed[True-None] | 1.5494ms | 1.4106ms | 708.8983 Ops/s | 696.0407 Ops/s | +1.85% |
| test_ppo_speed[True-backward] | 3.2533ms | 3.0741ms | 325.3008 Ops/s | 304.9969 Ops/s | **+6.66%** |
| test_ppo_speed[reduce-overhead-None] | 1.0697ms | 0.9717ms | 1.0291 KOps/s | 1.0294 KOps/s | -0.02% |
| test_ppo_speed[reduce-overhead-backward] | 1.5351ms | 1.4293ms | 699.6342 Ops/s | 687.3656 Ops/s | +1.78% |
test_reinforce_speed[False-None] 2.6945ms 2.2629ms 441.9125 Ops/s 432.7071 Ops/s $\color{#35bf28}+2.13\%$
test_reinforce_speed[False-backward] 3.5194ms 3.2769ms 305.1695 Ops/s 297.6386 Ops/s $\color{#35bf28}+2.53\%$
test_reinforce_speed[True-None] 1.7439ms 1.2857ms 777.7862 Ops/s 757.3494 Ops/s $\color{#35bf28}+2.70\%$
test_reinforce_speed[True-backward] 3.0230ms 2.9325ms 341.0057 Ops/s 339.6085 Ops/s $\color{#35bf28}+0.41\%$
test_reinforce_speed[reduce-overhead-None] 19.2423ms 10.5026ms 95.2141 Ops/s 94.4777 Ops/s $\color{#35bf28}+0.78\%$
test_reinforce_speed[reduce-overhead-backward] 1.5766ms 1.4978ms 667.6371 Ops/s 649.5968 Ops/s $\color{#35bf28}+2.78\%$
test_iql_speed[False-None] 9.5898ms 9.1671ms 109.0861 Ops/s 105.0309 Ops/s $\color{#35bf28}+3.86\%$
test_iql_speed[False-backward] 13.1371ms 12.7923ms 78.1719 Ops/s 75.1519 Ops/s $\color{#35bf28}+4.02\%$
test_iql_speed[True-None] 2.6374ms 2.2110ms 452.2897 Ops/s 443.5266 Ops/s $\color{#35bf28}+1.98\%$
test_iql_speed[True-backward] 5.0182ms 4.8123ms 207.8026 Ops/s 205.3569 Ops/s $\color{#35bf28}+1.19\%$
test_iql_speed[reduce-overhead-None] 0.5670s 13.1488ms 76.0528 Ops/s 90.4850 Ops/s $\textbf{\color{#d91a1a}-15.95\%}$
test_iql_speed[reduce-overhead-backward] 2.0521ms 1.9676ms 508.2240 Ops/s 475.9005 Ops/s $\textbf{\color{#35bf28}+6.79\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.8008ms 6.2408ms 160.2369 Ops/s 158.3157 Ops/s $\color{#35bf28}+1.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6708ms 0.3452ms 2.8970 KOps/s 3.6211 KOps/s $\textbf{\color{#d91a1a}-20.00\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6566ms 0.3227ms 3.0985 KOps/s 3.9150 KOps/s $\textbf{\color{#d91a1a}-20.86\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2604ms 5.9558ms 167.9042 Ops/s 166.1615 Ops/s $\color{#35bf28}+1.05\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0700ms 0.2911ms 3.4348 KOps/s 3.7335 KOps/s $\textbf{\color{#d91a1a}-8.00\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5693ms 0.3172ms 3.1531 KOps/s 4.0640 KOps/s $\textbf{\color{#d91a1a}-22.42\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5869ms 1.3209ms 757.0733 Ops/s 768.4616 Ops/s $\color{#d91a1a}-1.48\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6154ms 1.2177ms 821.2404 Ops/s 809.7507 Ops/s $\color{#35bf28}+1.42\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5361ms 6.2954ms 158.8462 Ops/s 160.9793 Ops/s $\color{#d91a1a}-1.33\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2760ms 0.4350ms 2.2987 KOps/s 1.9527 KOps/s $\textbf{\color{#35bf28}+17.72\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6370ms 0.3931ms 2.5437 KOps/s 2.2220 KOps/s $\textbf{\color{#35bf28}+14.48\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2119ms 6.0186ms 166.1527 Ops/s 163.5525 Ops/s $\color{#35bf28}+1.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8185ms 0.3892ms 2.5691 KOps/s 2.8243 KOps/s $\textbf{\color{#d91a1a}-9.04\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5794ms 0.3987ms 2.5082 KOps/s 2.9500 KOps/s $\textbf{\color{#d91a1a}-14.98\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 10.0019ms 6.0188ms 166.1473 Ops/s 165.8619 Ops/s $\color{#35bf28}+0.17\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0056ms 0.3286ms 3.0430 KOps/s 3.1391 KOps/s $\color{#d91a1a}-3.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7120ms 0.3110ms 3.2152 KOps/s 3.4566 KOps/s $\textbf{\color{#d91a1a}-6.98\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5530ms 6.2769ms 159.3135 Ops/s 159.9408 Ops/s $\color{#d91a1a}-0.39\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0303ms 0.4444ms 2.2503 KOps/s 2.3004 KOps/s $\color{#d91a1a}-2.18\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7449ms 0.4559ms 2.1933 KOps/s 2.4638 KOps/s $\textbf{\color{#d91a1a}-10.98\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.0985ms 5.5665ms 179.6459 Ops/s 175.6411 Ops/s $\color{#35bf28}+2.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 6.4206ms 2.0380ms 490.6881 Ops/s 423.1499 Ops/s $\textbf{\color{#35bf28}+15.96\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 8.8084ms 1.2304ms 812.7237 Ops/s 893.7075 Ops/s $\textbf{\color{#d91a1a}-9.06\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.0905ms 5.6387ms 177.3462 Ops/s 174.8450 Ops/s $\color{#35bf28}+1.43\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 7.8720ms 2.0122ms 496.9588 Ops/s 451.1241 Ops/s $\textbf{\color{#35bf28}+10.16\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.7924ms 1.2387ms 807.2841 Ops/s 805.4285 Ops/s $\color{#35bf28}+0.23\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5513s 16.6774ms 59.9614 Ops/s 29.2625 Ops/s $\textbf{\color{#35bf28}+104.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.4933ms 2.2075ms 452.9960 Ops/s 456.9185 Ops/s $\color{#d91a1a}-0.86\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.2414ms 1.3520ms 739.6558 Ops/s 854.7364 Ops/s $\textbf{\color{#d91a1a}-13.46\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 14.0392ms 13.6774ms 73.1130 Ops/s 72.5608 Ops/s $\color{#35bf28}+0.76\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.5226ms 17.0210ms 58.7508 Ops/s 59.8228 Ops/s $\color{#d91a1a}-1.79\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.1217ms 17.8956ms 55.8795 Ops/s 54.2292 Ops/s $\color{#35bf28}+3.04\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 18.7276ms 17.3620ms 57.5972 Ops/s 59.9435 Ops/s $\color{#d91a1a}-3.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.9238ms 18.4604ms 54.1700 Ops/s 54.3850 Ops/s $\color{#d91a1a}-0.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.6504ms 18.8646ms 53.0093 Ops/s 54.6387 Ops/s $\color{#d91a1a}-2.98\%$
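
For readers who want to check the last column: it appears to be the relative change in throughput between the two Ops/s columns. A minimal sketch, assuming the third column is this PR and the fourth is the base (the column headers are not repeated here, so that ordering is an assumption), with hypothetical helper names:

```python
def parse_ops(text: str) -> float:
    """Convert '84.5722 Ops/s' or '1.0291 KOps/s' to operations per second."""
    value, unit = text.split()
    scale = {"Ops/s": 1.0, "KOps/s": 1e3}[unit]
    return float(value) * scale


def relative_change(pr: str, base: str) -> float:
    """Percentage change of the PR throughput relative to the base throughput."""
    pr_ops, base_ops = parse_ops(pr), parse_ops(base)
    return 100.0 * (pr_ops - base_ops) / base_ops


# Values copied from the test_sac_speed[reduce-overhead-None] row.
print(round(relative_change("84.5722 Ops/s", "82.8430 Ops/s"), 2))
```

Recomputed values can differ from the table in the last digit, because the table prints rounded throughputs.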

vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: 674517759cf4a7875229869e8a36da3e85cc2351
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: 2402d3d626dcc6fbe297c0a641be657c72a6ee4b
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 13, 2025
ghstack-source-id: fe507484265b5cd7bbea6739de99e19b3f0b4a92
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 13, 2025
ghstack-source-id: 3c2ae8ac6c7ad8796e22b5090a899bfbc44a7f06
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 14, 2025
ghstack-source-id: 3c2ae8ac6c7ad8796e22b5090a899bfbc44a7f06
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 14, 2025
ghstack-source-id: 92f56ab0122ef39b1b18c576a36e0b17e9799162
Pull Request resolved: #2832
policy,
frames_per_batch=args.steps_per_batch,
total_frames=1_000_000,
local_weights_updater=HF2vLLMLocalWeightUpdater(

@mikaylagawarecki mikaylagawarecki Mar 14, 2025

I'm curious to see how this would look when train_model and inference_model are sharded 😛

@vmoens (Contributor, Author)

Yeah, there is still some work to be done!
IIUC, your implementation calls full_tensor() and then sends the result to the vLLM weights, right?

generate=False,
return_log_probs=True,
)
env.append_transform(

@mikaylagawarecki mikaylagawarecki Mar 14, 2025

With the .append_transform API, is it possible to run ShapedCorrectnessReward and KLRewardTransform in parallel?

@vmoens (Contributor, Author)

Not currently, but we can think about it.
The difficulty is that, in some cases, a transform requires another one to have done its thing first.

We could imagine

env.append_transform(MyTransform0(async=True)) 
env.append_transform(MyTransform1(async=True)) # does not require MyTransform0
env.append_transform(MyTransform2(blocking=True)) # blocking tells env that you need to have completed the other async before running this one

wdyt?
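
A minimal sketch of how such a barrier could behave, using only the standard library. Nothing here is TorchRL API: the transforms are plain stand-in functions, dicts stand in for tensordicts, and `apply_transforms` is a hypothetical scheduler. Since `async` is a reserved keyword in Python, the sketch marks each transform with a `blocking` flag instead.

```python
import asyncio


async def reward_transform(td: dict) -> dict:
    await asyncio.sleep(0.01)  # stands in for a reward-model call
    return {"reward": 1.0}


async def kl_transform(td: dict) -> dict:
    await asyncio.sleep(0.01)  # stands in for a reference-model call
    return {"kl": 0.25}


def combine(td: dict) -> dict:
    # Blocking step: it needs the outputs of both async transforms.
    return {"shaped_reward": td["reward"] - td["kl"]}


async def apply_transforms(td: dict, transforms) -> dict:
    """Apply (fn, blocking) pairs in append order.

    Non-blocking transforms are started concurrently on a snapshot of td.
    A blocking transform first waits for everything still in flight.
    """
    pending = []
    for fn, blocking in transforms:
        if blocking:
            if pending:
                for update in await asyncio.gather(*pending):
                    td.update(update)
                pending = []
            td.update(fn(td))
        else:
            pending.append(asyncio.create_task(fn(dict(td))))
    if pending:
        for update in await asyncio.gather(*pending):
            td.update(update)
    return td


result = asyncio.run(
    apply_transforms(
        {"text": "2 + 2 = 4"},
        [(reward_transform, False), (kl_transform, False), (combine, True)],
    )
)
print(result)
```

In this sketch the scheduler, not the individual transforms, merges the outputs back into the shared dict.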


@mikaylagawarecki mikaylagawarecki Mar 14, 2025

With just the reward model and ref_model, I think this API would work.

But I don't think it would cover (or at least it might be tricky to cover) the case where there's a more complex graph of dependencies between transforms.

n00b qn: with this API, who is responsible for consolidating the result tds from MyTransform0 and MyTransform1? Would all the communications involved be wrapped in a transform?
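
One way to picture the more general case is an explicit dependency graph between transforms. The sketch below is purely illustrative and assumes nothing about TorchRL: `run_graph`, the transform names, and the use of plain dicts in place of tensordicts are all hypothetical. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to run each wave of ready transforms in parallel, and has the scheduler consolidate their outputs.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_graph(td: dict, transforms: dict, deps: dict) -> dict:
    """Run transforms in dependency order.

    transforms: name -> callable taking a dict and returning a dict of new keys.
    deps: name -> set of names that must finish first.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in transforms})
    sorter.prepare()
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            # Everything in `ready` has its prerequisites done, so the whole
            # wave can run in parallel, each on a snapshot of td.
            futures = {name: pool.submit(transforms[name], dict(td)) for name in ready}
            for name, future in futures.items():
                td.update(future.result())  # the scheduler merges the outputs
                sorter.done(name)
    return td


transforms = {
    "reward": lambda td: {"reward": 1.0},
    "kl": lambda td: {"kl": 0.25},
    "length_penalty": lambda td: {"penalty": 0.5},
    "shaped": lambda td: {"shaped": td["reward"] - td["kl"]},
    "final": lambda td: {"final": td["shaped"] - td["penalty"]},
}
deps = {"shaped": {"reward", "kl"}, "final": {"shaped", "length_penalty"}}

result = run_graph({"text": "2 + 2 = 4"}, transforms, deps)
print(result["final"])
```

Here consolidation lives in the scheduler. Wrapping the communication inside each transform would be the alternative design, which this sketch does not attempt to model.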

vmoens added a commit that referenced this pull request Mar 18, 2025
ghstack-source-id: 8340f4b71690667bca910a0bb67b6be3d99b7929
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 18, 2025
ghstack-source-id: e904981fe5da8be8c131a276d8353116b9ec3343
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 18, 2025
ghstack-source-id: 3a5be2ffea45ca802cbad0f5621e33919bc9d52f
Pull Request resolved: #2832
vmoens added a commit that referenced this pull request Mar 20, 2025
ghstack-source-id: c53afce03ca7216908298686535e3777da59884e
Pull Request resolved: #2832