[Feature] DensifyReward postproc #2823

Merged: 4 commits, Mar 11, 2025

[ghstack-poisoned]

pytorch-bot bot commented Mar 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2823

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 1 Unrelated Failure

As of commit d36a97e with merge base 6e40548:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Mar 3, 2025
vmoens added a commit that referenced this pull request Mar 3, 2025
ghstack-source-id: 14ddfbe832e21594f829f08795ac4c95cc6ca9f0
Pull Request resolved: #2823
vmoens added the enhancement label (New feature or request) on Mar 3, 2025

github-actions bot commented Mar 3, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.6162s 0.5351s 1.8687 Ops/s 1.9069 Ops/s $\color{#d91a1a}-2.00\%$
test_transformed 1.1636s 1.0770s 0.9285 Ops/s 0.9771 Ops/s $\color{#d91a1a}-4.98\%$
test_serial 1.6743s 1.5784s 0.6335 Ops/s 0.6535 Ops/s $\color{#d91a1a}-3.05\%$
test_parallel 1.4535s 1.3359s 0.7485 Ops/s 0.7357 Ops/s $\color{#35bf28}+1.74\%$
test_step_mdp_speed[True-True-True-True-True] 0.1593ms 30.9049μs 32.3574 KOps/s 32.8700 KOps/s $\color{#d91a1a}-1.56\%$
test_step_mdp_speed[True-True-True-True-False] 47.9200μs 17.8780μs 55.9346 KOps/s 56.1629 KOps/s $\color{#d91a1a}-0.41\%$
test_step_mdp_speed[True-True-True-False-True] 67.9270μs 17.1397μs 58.3442 KOps/s 58.1596 KOps/s $\color{#35bf28}+0.32\%$
test_step_mdp_speed[True-True-True-False-False] 27.7420μs 10.0775μs 99.2314 KOps/s 100.5566 KOps/s $\color{#d91a1a}-1.32\%$
test_step_mdp_speed[True-True-False-True-True] 70.9220μs 33.1448μs 30.1707 KOps/s 31.0973 KOps/s $\color{#d91a1a}-2.98\%$
test_step_mdp_speed[True-True-False-True-False] 51.1950μs 19.9525μs 50.1191 KOps/s 51.1630 KOps/s $\color{#d91a1a}-2.04\%$
test_step_mdp_speed[True-True-False-False-True] 48.6310μs 19.2475μs 51.9548 KOps/s 52.3434 KOps/s $\color{#d91a1a}-0.74\%$
test_step_mdp_speed[True-True-False-False-False] 35.1260μs 11.9855μs 83.4339 KOps/s 85.0767 KOps/s $\color{#d91a1a}-1.93\%$
test_step_mdp_speed[True-False-True-True-True] 77.4540μs 35.2349μs 28.3810 KOps/s 29.4819 KOps/s $\color{#d91a1a}-3.73\%$
test_step_mdp_speed[True-False-True-True-False] 56.2050μs 21.8685μs 45.7280 KOps/s 46.2899 KOps/s $\color{#d91a1a}-1.21\%$
test_step_mdp_speed[True-False-True-False-True] 52.3380μs 18.9978μs 52.6376 KOps/s 52.4101 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[True-False-True-False-False] 49.9220μs 12.0262μs 83.1516 KOps/s 84.0842 KOps/s $\color{#d91a1a}-1.11\%$
test_step_mdp_speed[True-False-False-True-True] 74.8900μs 36.6044μs 27.3191 KOps/s 27.9311 KOps/s $\color{#d91a1a}-2.19\%$
test_step_mdp_speed[True-False-False-True-False] 61.9950μs 23.7683μs 42.0729 KOps/s 43.1849 KOps/s $\color{#d91a1a}-2.57\%$
test_step_mdp_speed[True-False-False-False-True] 51.5560μs 21.1053μs 47.3815 KOps/s 48.2825 KOps/s $\color{#d91a1a}-1.87\%$
test_step_mdp_speed[True-False-False-False-False] 36.7790μs 13.8816μs 72.0377 KOps/s 73.2710 KOps/s $\color{#d91a1a}-1.68\%$
test_step_mdp_speed[False-True-True-True-True] 89.2950μs 34.6459μs 28.8634 KOps/s 29.0465 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[False-True-True-True-False] 0.5757ms 21.6943μs 46.0951 KOps/s 46.5036 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[False-True-True-False-True] 74.5400μs 22.0129μs 45.4278 KOps/s 46.4645 KOps/s $\color{#d91a1a}-2.23\%$
test_step_mdp_speed[False-True-True-False-False] 42.5890μs 13.2918μs 75.2345 KOps/s 75.6439 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[False-True-False-True-True] 87.0810μs 35.6724μs 28.0328 KOps/s 27.9016 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[False-True-False-True-False] 2.9762ms 23.7708μs 42.0685 KOps/s 42.8141 KOps/s $\color{#d91a1a}-1.74\%$
test_step_mdp_speed[False-True-False-False-True] 56.4750μs 23.8372μs 41.9512 KOps/s 42.7419 KOps/s $\color{#d91a1a}-1.85\%$
test_step_mdp_speed[False-True-False-False-False] 66.6640μs 15.2021μs 65.7805 KOps/s 66.8821 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[False-False-True-True-True] 85.2990μs 37.7889μs 26.4628 KOps/s 26.5164 KOps/s $\color{#d91a1a}-0.20\%$
test_step_mdp_speed[False-False-True-True-False] 78.9870μs 25.4706μs 39.2609 KOps/s 39.8346 KOps/s $\color{#d91a1a}-1.44\%$
test_step_mdp_speed[False-False-True-False-True] 73.4770μs 24.0582μs 41.5659 KOps/s 42.3313 KOps/s $\color{#d91a1a}-1.81\%$
test_step_mdp_speed[False-False-True-False-False] 54.3820μs 15.2035μs 65.7743 KOps/s 66.3041 KOps/s $\color{#d91a1a}-0.80\%$
test_step_mdp_speed[False-False-False-True-True] 94.2460μs 39.9100μs 25.0564 KOps/s 25.5786 KOps/s $\color{#d91a1a}-2.04\%$
test_step_mdp_speed[False-False-False-True-False] 59.9020μs 27.1418μs 36.8436 KOps/s 37.5783 KOps/s $\color{#d91a1a}-1.96\%$
test_step_mdp_speed[False-False-False-False-True] 79.6800μs 25.3958μs 39.3766 KOps/s 40.1150 KOps/s $\color{#d91a1a}-1.84\%$
test_step_mdp_speed[False-False-False-False-False] 65.8710μs 16.8893μs 59.2091 KOps/s 59.8694 KOps/s $\color{#d91a1a}-1.10\%$
test_values[generalized_advantage_estimate-True-True] 10.3429ms 9.6433ms 103.6992 Ops/s 99.9356 Ops/s $\color{#35bf28}+3.77\%$
test_values[vec_generalized_advantage_estimate-True-True] 28.3487ms 26.1148ms 38.2925 Ops/s 40.9299 Ops/s $\textbf{\color{#d91a1a}-6.44\%}$
test_values[td0_return_estimate-False-False] 1.6464ms 0.2070ms 4.8313 KOps/s 5.2112 KOps/s $\textbf{\color{#d91a1a}-7.29\%}$
test_values[td1_return_estimate-False-False] 27.2791ms 24.0182ms 41.6350 Ops/s 41.6781 Ops/s $\color{#d91a1a}-0.10\%$
test_values[vec_td1_return_estimate-False-False] 31.7610ms 26.8925ms 37.1851 Ops/s 40.8658 Ops/s $\textbf{\color{#d91a1a}-9.01\%}$
test_values[td_lambda_return_estimate-True-False] 36.2128ms 34.5766ms 28.9213 Ops/s 28.6511 Ops/s $\color{#35bf28}+0.94\%$
test_values[vec_td_lambda_return_estimate-True-False] 28.5445ms 26.1429ms 38.2514 Ops/s 40.1053 Ops/s $\color{#d91a1a}-4.62\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.7752ms 8.4329ms 118.5828 Ops/s 119.3721 Ops/s $\color{#d91a1a}-0.66\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.2655ms 1.8961ms 527.4087 Ops/s 518.6800 Ops/s $\color{#35bf28}+1.68\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4879ms 0.3700ms 2.7028 KOps/s 2.6601 KOps/s $\color{#35bf28}+1.61\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 50.4290ms 48.2902ms 20.7081 Ops/s 22.3471 Ops/s $\textbf{\color{#d91a1a}-7.33\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 5.5562ms 3.4473ms 290.0830 Ops/s 289.6401 Ops/s $\color{#35bf28}+0.15\%$
test_dqn_speed[False-None] 1.9534ms 1.4537ms 687.9164 Ops/s 692.1638 Ops/s $\color{#d91a1a}-0.61\%$
test_dqn_speed[False-backward] 2.1137ms 1.9397ms 515.5386 Ops/s 513.4140 Ops/s $\color{#35bf28}+0.41\%$
test_dqn_speed[True-None] 0.8205ms 0.5591ms 1.7887 KOps/s 1.7582 KOps/s $\color{#35bf28}+1.74\%$
test_dqn_speed[True-backward] 1.0141ms 0.9740ms 1.0267 KOps/s 737.5012 Ops/s $\textbf{\color{#35bf28}+39.21\%}$
test_dqn_speed[reduce-overhead-None] 0.7880ms 0.5659ms 1.7670 KOps/s 1.7772 KOps/s $\color{#d91a1a}-0.57\%$
test_dqn_speed[reduce-overhead-backward] 1.1351ms 0.9823ms 1.0180 KOps/s 985.4559 Ops/s $\color{#35bf28}+3.30\%$
test_ddpg_speed[False-None] 3.9707ms 2.9691ms 336.8063 Ops/s 340.9338 Ops/s $\color{#d91a1a}-1.21\%$
test_ddpg_speed[False-backward] 4.2432ms 4.0609ms 246.2489 Ops/s 242.7908 Ops/s $\color{#35bf28}+1.42\%$
test_ddpg_speed[True-None] 1.9242ms 1.4285ms 700.0283 Ops/s 690.2138 Ops/s $\color{#35bf28}+1.42\%$
test_ddpg_speed[True-backward] 2.5275ms 2.3487ms 425.7625 Ops/s 423.7403 Ops/s $\color{#35bf28}+0.48\%$
test_ddpg_speed[reduce-overhead-None] 1.7097ms 1.4323ms 698.1729 Ops/s 690.0619 Ops/s $\color{#35bf28}+1.18\%$
test_ddpg_speed[reduce-overhead-backward] 2.3997ms 2.3482ms 425.8655 Ops/s 427.5788 Ops/s $\color{#d91a1a}-0.40\%$
test_sac_speed[False-None] 8.7458ms 8.1606ms 122.5398 Ops/s 122.2362 Ops/s $\color{#35bf28}+0.25\%$
test_sac_speed[False-backward] 11.1041ms 10.8163ms 92.4528 Ops/s 91.1122 Ops/s $\color{#35bf28}+1.47\%$
test_sac_speed[True-None] 2.7977ms 2.5594ms 390.7187 Ops/s 384.0376 Ops/s $\color{#35bf28}+1.74\%$
test_sac_speed[True-backward] 5.5243ms 4.3324ms 230.8216 Ops/s 227.5908 Ops/s $\color{#35bf28}+1.42\%$
test_sac_speed[reduce-overhead-None] 4.7303ms 2.5582ms 390.9028 Ops/s 386.1174 Ops/s $\color{#35bf28}+1.24\%$
test_sac_speed[reduce-overhead-backward] 5.1992ms 4.2992ms 232.6022 Ops/s 234.1716 Ops/s $\color{#d91a1a}-0.67\%$
test_redq_speed[False-None] 19.8376ms 13.2989ms 75.1941 Ops/s 75.9823 Ops/s $\color{#d91a1a}-1.04\%$
test_redq_speed[False-backward] 29.3019ms 23.1390ms 43.2170 Ops/s 44.2295 Ops/s $\color{#d91a1a}-2.29\%$
test_redq_speed[True-None] 10.4271ms 6.8973ms 144.9847 Ops/s 141.7929 Ops/s $\color{#35bf28}+2.25\%$
test_redq_speed[True-backward] 15.3560ms 14.5983ms 68.5011 Ops/s 67.7868 Ops/s $\color{#35bf28}+1.05\%$
test_redq_speed[reduce-overhead-None] 7.8104ms 6.9627ms 143.6215 Ops/s 141.9034 Ops/s $\color{#35bf28}+1.21\%$
test_redq_speed[reduce-overhead-backward] 16.0000ms 14.6545ms 68.2385 Ops/s 68.1632 Ops/s $\color{#35bf28}+0.11\%$
test_redq_deprec_speed[False-None] 14.7529ms 13.0956ms 76.3617 Ops/s 73.6972 Ops/s $\color{#35bf28}+3.62\%$
test_redq_deprec_speed[False-backward] 19.3797ms 18.5364ms 53.9478 Ops/s 52.3572 Ops/s $\color{#35bf28}+3.04\%$
test_redq_deprec_speed[True-None] 6.0350ms 5.2133ms 191.8174 Ops/s 188.0720 Ops/s $\color{#35bf28}+1.99\%$
test_redq_deprec_speed[True-backward] 11.0237ms 10.2216ms 97.8319 Ops/s 97.4358 Ops/s $\color{#35bf28}+0.41\%$
test_redq_deprec_speed[reduce-overhead-None] 6.1214ms 5.4818ms 182.4212 Ops/s 189.1134 Ops/s $\color{#d91a1a}-3.54\%$
test_redq_deprec_speed[reduce-overhead-backward] 10.9712ms 10.2014ms 98.0261 Ops/s 99.6701 Ops/s $\color{#d91a1a}-1.65\%$
test_td3_speed[False-None] 8.4954ms 8.0640ms 124.0076 Ops/s 121.1147 Ops/s $\color{#35bf28}+2.39\%$
test_td3_speed[False-backward] 10.8974ms 10.4591ms 95.6105 Ops/s 93.0441 Ops/s $\color{#35bf28}+2.76\%$
test_td3_speed[True-None] 2.4626ms 2.2958ms 435.5765 Ops/s 437.9990 Ops/s $\color{#d91a1a}-0.55\%$
test_td3_speed[True-backward] 4.0145ms 3.9444ms 253.5236 Ops/s 254.4477 Ops/s $\color{#d91a1a}-0.36\%$
test_td3_speed[reduce-overhead-None] 2.3735ms 2.2814ms 438.3288 Ops/s 436.8004 Ops/s $\color{#35bf28}+0.35\%$
test_td3_speed[reduce-overhead-backward] 4.6381ms 3.9600ms 252.5259 Ops/s 254.5583 Ops/s $\color{#d91a1a}-0.80\%$
test_cql_speed[False-None] 39.1764ms 36.6275ms 27.3019 Ops/s 27.1688 Ops/s $\color{#35bf28}+0.49\%$
test_cql_speed[False-backward] 50.8469ms 47.0010ms 21.2761 Ops/s 21.3546 Ops/s $\color{#d91a1a}-0.37\%$
test_cql_speed[True-None] 23.8196ms 22.3039ms 44.8352 Ops/s 44.3976 Ops/s $\color{#35bf28}+0.99\%$
test_cql_speed[True-backward] 30.8014ms 29.6172ms 33.7642 Ops/s 33.6030 Ops/s $\color{#35bf28}+0.48\%$
test_cql_speed[reduce-overhead-None] 23.3610ms 22.3719ms 44.6989 Ops/s 44.3372 Ops/s $\color{#35bf28}+0.82\%$
test_cql_speed[reduce-overhead-backward] 30.5548ms 29.4653ms 33.9382 Ops/s 33.7237 Ops/s $\color{#35bf28}+0.64\%$
test_a2c_speed[False-None] 8.7698ms 7.2075ms 138.7451 Ops/s 137.8508 Ops/s $\color{#35bf28}+0.65\%$
test_a2c_speed[False-backward] 15.0840ms 14.2760ms 70.0477 Ops/s 68.6871 Ops/s $\color{#35bf28}+1.98\%$
test_a2c_speed[True-None] 5.5715ms 4.7055ms 212.5189 Ops/s 214.1923 Ops/s $\color{#d91a1a}-0.78\%$
test_a2c_speed[True-backward] 11.6226ms 11.2594ms 88.8144 Ops/s 88.3923 Ops/s $\color{#35bf28}+0.48\%$
test_a2c_speed[reduce-overhead-None] 5.5354ms 4.6796ms 213.6916 Ops/s 213.6050 Ops/s $\color{#35bf28}+0.04\%$
test_a2c_speed[reduce-overhead-backward] 11.5304ms 11.1785ms 89.4571 Ops/s 88.5465 Ops/s $\color{#35bf28}+1.03\%$
test_ppo_speed[False-None] 8.0802ms 7.5138ms 133.0886 Ops/s 131.4794 Ops/s $\color{#35bf28}+1.22\%$
test_ppo_speed[False-backward] 16.0501ms 14.9948ms 66.6899 Ops/s 64.3126 Ops/s $\color{#35bf28}+3.70\%$
test_ppo_speed[True-None] 5.8184ms 5.0864ms 196.6015 Ops/s 195.1298 Ops/s $\color{#35bf28}+0.75\%$
test_ppo_speed[True-backward] 11.6670ms 11.0178ms 90.7625 Ops/s 90.0003 Ops/s $\color{#35bf28}+0.85\%$
test_ppo_speed[reduce-overhead-None] 6.0010ms 5.0360ms 198.5716 Ops/s 196.5353 Ops/s $\color{#35bf28}+1.04\%$
test_ppo_speed[reduce-overhead-backward] 11.3822ms 11.0145ms 90.7895 Ops/s 90.6588 Ops/s $\color{#35bf28}+0.14\%$
test_reinforce_speed[False-None] 7.2082ms 6.5348ms 153.0267 Ops/s 152.0339 Ops/s $\color{#35bf28}+0.65\%$
test_reinforce_speed[False-backward] 10.4142ms 9.8927ms 101.0849 Ops/s 100.6548 Ops/s $\color{#35bf28}+0.43\%$
test_reinforce_speed[True-None] 5.3174ms 4.0498ms 246.9276 Ops/s 244.9815 Ops/s $\color{#35bf28}+0.79\%$
test_reinforce_speed[True-backward] 10.5005ms 10.0456ms 99.5458 Ops/s 98.8281 Ops/s $\color{#35bf28}+0.73\%$
test_reinforce_speed[reduce-overhead-None] 4.5574ms 4.0462ms 247.1442 Ops/s 239.0908 Ops/s $\color{#35bf28}+3.37\%$
test_reinforce_speed[reduce-overhead-backward] 10.9644ms 10.0142ms 99.8583 Ops/s 96.9862 Ops/s $\color{#35bf28}+2.96\%$
test_iql_speed[False-None] 33.6626ms 32.3228ms 30.9379 Ops/s 29.8374 Ops/s $\color{#35bf28}+3.69\%$
test_iql_speed[False-backward] 49.3598ms 45.6119ms 21.9241 Ops/s 21.4851 Ops/s $\color{#35bf28}+2.04\%$
test_iql_speed[True-None] 17.3184ms 15.8293ms 63.1738 Ops/s 62.8866 Ops/s $\color{#35bf28}+0.46\%$
test_iql_speed[True-backward] 28.4128ms 27.1035ms 36.8956 Ops/s 36.6288 Ops/s $\color{#35bf28}+0.73\%$
test_iql_speed[reduce-overhead-None] 17.3353ms 15.9378ms 62.7441 Ops/s 62.4627 Ops/s $\color{#35bf28}+0.45\%$
test_iql_speed[reduce-overhead-backward] 29.4997ms 27.2593ms 36.6847 Ops/s 36.0135 Ops/s $\color{#35bf28}+1.86\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.0934ms 4.8130ms 207.7708 Ops/s 199.7558 Ops/s $\color{#35bf28}+4.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.6026ms 0.5426ms 1.8431 KOps/s 1.8575 KOps/s $\color{#d91a1a}-0.77\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7542ms 0.5149ms 1.9420 KOps/s 1.9362 KOps/s $\color{#35bf28}+0.30\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.1279ms 4.7724ms 209.5399 Ops/s 212.3716 Ops/s $\color{#d91a1a}-1.33\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.7082ms 0.5334ms 1.8748 KOps/s 1.8798 KOps/s $\color{#d91a1a}-0.27\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9446ms 0.5090ms 1.9646 KOps/s 1.9803 KOps/s $\color{#d91a1a}-0.79\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.3293ms 1.7073ms 585.7293 Ops/s 583.7298 Ops/s $\color{#35bf28}+0.34\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.8433ms 1.6133ms 619.8599 Ops/s 613.5489 Ops/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.2757ms 4.8166ms 207.6163 Ops/s 206.6276 Ops/s $\color{#35bf28}+0.48\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.1893ms 0.6847ms 1.4606 KOps/s 1.4663 KOps/s $\color{#d91a1a}-0.39\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9573ms 0.6566ms 1.5230 KOps/s 1.5260 KOps/s $\color{#d91a1a}-0.19\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.1222ms 4.6792ms 213.7125 Ops/s 213.2991 Ops/s $\color{#35bf28}+0.19\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.6214ms 0.5326ms 1.8774 KOps/s 1.8804 KOps/s $\color{#d91a1a}-0.16\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8865ms 0.5167ms 1.9354 KOps/s 1.9612 KOps/s $\color{#d91a1a}-1.32\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 7.1618ms 4.6334ms 215.8257 Ops/s 217.0603 Ops/s $\color{#d91a1a}-0.57\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7554ms 0.5240ms 1.9085 KOps/s 1.8586 KOps/s $\color{#35bf28}+2.69\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8921ms 0.5083ms 1.9672 KOps/s 1.9910 KOps/s $\color{#d91a1a}-1.20\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.9763ms 4.6809ms 213.6334 Ops/s 210.7472 Ops/s $\color{#35bf28}+1.37\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.4927ms 0.6761ms 1.4790 KOps/s 1.4906 KOps/s $\color{#d91a1a}-0.78\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8619ms 0.6456ms 1.5488 KOps/s 1.5423 KOps/s $\color{#35bf28}+0.42\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.5290ms 4.2157ms 237.2083 Ops/s 240.3719 Ops/s $\color{#d91a1a}-1.32\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.4536ms 2.2310ms 448.2250 Ops/s 440.6021 Ops/s $\color{#35bf28}+1.73\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.4662ms 1.4044ms 712.0385 Ops/s 760.6242 Ops/s $\textbf{\color{#d91a1a}-6.39\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4355s 12.8599ms 77.7608 Ops/s 230.1640 Ops/s $\textbf{\color{#d91a1a}-66.22\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.3099ms 2.3399ms 427.3720 Ops/s 441.7397 Ops/s $\color{#d91a1a}-3.25\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 5.8899ms 1.3971ms 715.7630 Ops/s 750.3062 Ops/s $\color{#d91a1a}-4.60\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 10.0480ms 4.4918ms 222.6260 Ops/s 32.0919 Ops/s $\textbf{\color{#35bf28}+593.71\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.2446ms 2.5521ms 391.8274 Ops/s 361.8086 Ops/s $\textbf{\color{#35bf28}+8.30\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.4610ms 1.5434ms 647.9014 Ops/s 635.1432 Ops/s $\color{#35bf28}+2.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 14.5010ms 11.7560ms 85.0630 Ops/s 81.7634 Ops/s $\color{#35bf28}+4.04\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 16.4566ms 14.2752ms 70.0517 Ops/s 68.5187 Ops/s $\color{#35bf28}+2.24\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 21.2828ms 20.6490ms 48.4286 Ops/s 47.3180 Ops/s $\color{#35bf28}+2.35\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 16.5052ms 14.5056ms 68.9391 Ops/s 67.7253 Ops/s $\color{#35bf28}+1.79\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 20.9802ms 20.6212ms 48.4938 Ops/s 47.2785 Ops/s $\color{#35bf28}+2.57\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 18.3428ms 15.5246ms 64.4140 Ops/s 62.6329 Ops/s $\color{#35bf28}+2.84\%$
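For readers checking the table by hand: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the bold entries (the ones that seem to be counted as Improved or Worsened in the summary line) all exceed 5% in magnitude. The sketch below reproduces that arithmetic. The helper names and the 5% threshold are assumptions inferred from the numbers above, not taken from the benchmark workflow itself.

```python
# Assumed significance threshold, inferred from which rows are bold in the table.
SIGNIFICANT_PCT = 5.0

# Units that appear in the Ops columns, mapped to operations per second.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(text: str) -> float:
    """Convert a cell such as '1.0267 KOps/s' into operations per second."""
    value, unit = text.split()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percentage change of the PR throughput relative to repo HEAD."""
    new, base = parse_ops(ops), parse_ops(ops_head)
    return (new - base) / base * 100.0


def classify(change_pct: float) -> str:
    """Label a row the way the summary line seems to count it."""
    if change_pct > SIGNIFICANT_PCT:
        return "improved"
    if change_pct < -SIGNIFICANT_PCT:
        return "worsened"
    return "neutral"


if __name__ == "__main__":
    # Rows copied from the CPU table above: (name, Ops, Ops on Repo HEAD).
    rows = [
        ("test_simple", "1.8687 Ops/s", "1.9069 Ops/s"),
        ("test_dqn_speed[True-backward]", "1.0267 KOps/s", "737.5012 Ops/s"),
    ]
    for name, ops, head in rows:
        change = relative_change(ops, head)
        print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, the CPU summary of 3 improved and 6 worsened matches the number of bold green and bold red rows in the table. The two largest swings, both in `test_rb_populate`, come with a Max far above the Mean, so they may reflect run-to-run noise rather than an effect of this PR.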

vmoens added a commit that referenced this pull request Mar 6, 2025
ghstack-source-id: c6ea33907310d84f75b3890bc578a07a17196523
Pull Request resolved: #2823

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_simple 0.9076s 0.8258s 1.2110 Ops/s 1.1785 Ops/s $\color{#35bf28}+2.75\%$
test_transformed 1.5272s 1.4357s 0.6965 Ops/s 0.6516 Ops/s $\textbf{\color{#35bf28}+6.89\%}$
test_serial 2.4411s 2.3319s 0.4288 Ops/s 0.4163 Ops/s $\color{#35bf28}+3.02\%$
test_parallel 1.9181s 1.8655s 0.5360 Ops/s 0.5041 Ops/s $\textbf{\color{#35bf28}+6.33\%}$
test_step_mdp_speed[True-True-True-True-True] 0.2237ms 39.9668μs 25.0208 KOps/s 24.9476 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-True-True-True-False] 54.3010μs 23.2845μs 42.9470 KOps/s 42.8285 KOps/s $\color{#35bf28}+0.28\%$
test_step_mdp_speed[True-True-True-False-True] 54.6710μs 22.1821μs 45.0814 KOps/s 45.5307 KOps/s $\color{#d91a1a}-0.99\%$
test_step_mdp_speed[True-True-True-False-False] 42.2510μs 13.0623μs 76.5564 KOps/s 76.0996 KOps/s $\color{#35bf28}+0.60\%$
test_step_mdp_speed[True-True-False-True-True] 83.7410μs 42.2788μs 23.6525 KOps/s 23.4536 KOps/s $\color{#35bf28}+0.85\%$
test_step_mdp_speed[True-True-False-True-False] 58.9710μs 25.8051μs 38.7520 KOps/s 39.2946 KOps/s $\color{#d91a1a}-1.38\%$
test_step_mdp_speed[True-True-False-False-True] 66.4310μs 24.5858μs 40.6740 KOps/s 40.6795 KOps/s $\color{#d91a1a}-0.01\%$
test_step_mdp_speed[True-True-False-False-False] 45.5310μs 15.3475μs 65.1572 KOps/s 64.1864 KOps/s $\color{#35bf28}+1.51\%$
test_step_mdp_speed[True-False-True-True-True] 83.2810μs 44.7099μs 22.3664 KOps/s 22.1940 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[True-False-True-True-False] 70.0910μs 27.9309μs 35.8026 KOps/s 35.3607 KOps/s $\color{#35bf28}+1.25\%$
test_step_mdp_speed[True-False-True-False-True] 58.2910μs 24.1459μs 41.4148 KOps/s 40.7022 KOps/s $\color{#35bf28}+1.75\%$
test_step_mdp_speed[True-False-True-False-False] 53.9410μs 15.1042μs 66.2069 KOps/s 66.0413 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[True-False-False-True-True] 90.1310μs 46.4778μs 21.5156 KOps/s 20.9951 KOps/s $\color{#35bf28}+2.48\%$
test_step_mdp_speed[True-False-False-True-False] 67.4210μs 30.0127μs 33.3192 KOps/s 32.7651 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[True-False-False-False-True] 69.8610μs 26.8426μs 37.2542 KOps/s 36.8075 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-False-False-False-False] 53.8910μs 17.5199μs 57.0779 KOps/s 56.8948 KOps/s $\color{#35bf28}+0.32\%$
test_step_mdp_speed[False-True-True-True-True] 94.9920μs 45.4074μs 22.0229 KOps/s 22.3546 KOps/s $\color{#d91a1a}-1.48\%$
test_step_mdp_speed[False-True-True-True-False] 77.3410μs 28.2465μs 35.4027 KOps/s 35.7799 KOps/s $\color{#d91a1a}-1.05\%$
test_step_mdp_speed[False-True-True-False-True] 2.6009ms 28.8495μs 34.6626 KOps/s 34.9550 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[False-True-True-False-False] 44.2500μs 17.1661μs 58.2545 KOps/s 59.2378 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[False-True-False-True-True] 0.1177ms 47.3805μs 21.1057 KOps/s 20.8456 KOps/s $\color{#35bf28}+1.25\%$
test_step_mdp_speed[False-True-False-True-False] 70.1710μs 30.3601μs 32.9380 KOps/s 32.3458 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[False-True-False-False-True] 69.1510μs 30.5200μs 32.7654 KOps/s 32.2759 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[False-True-False-False-False] 55.1710μs 19.1799μs 52.1380 KOps/s 51.1373 KOps/s $\color{#35bf28}+1.96\%$
test_step_mdp_speed[False-False-True-True-True] 94.6410μs 49.2856μs 20.2899 KOps/s 19.9250 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[False-False-True-True-False] 74.3310μs 32.4086μs 30.8560 KOps/s 30.4829 KOps/s $\color{#35bf28}+1.22\%$
test_step_mdp_speed[False-False-True-False-True] 85.4010μs 30.5210μs 32.7643 KOps/s 33.4514 KOps/s $\color{#d91a1a}-2.05\%$
test_step_mdp_speed[False-False-True-False-False] 69.1710μs 19.2589μs 51.9240 KOps/s 51.3128 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[False-False-False-True-True] 96.8510μs 51.4224μs 19.4468 KOps/s 19.5948 KOps/s $\color{#d91a1a}-0.76\%$
test_step_mdp_speed[False-False-False-True-False] 75.4610μs 34.7144μs 28.8065 KOps/s 28.8790 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[False-False-False-False-True] 72.5620μs 32.1971μs 31.0587 KOps/s 30.9244 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[False-False-False-False-False] 70.1510μs 21.7574μs 45.9613 KOps/s 46.2940 KOps/s $\color{#d91a1a}-0.72\%$
test_values[generalized_advantage_estimate-True-True] 26.7597ms 26.0444ms 38.3960 Ops/s 40.7362 Ops/s $\textbf{\color{#d91a1a}-5.74\%}$
test_values[vec_generalized_advantage_estimate-True-True] 0.1076s 3.0466ms 328.2370 Ops/s 339.2421 Ops/s $\color{#d91a1a}-3.24\%$
test_values[td0_return_estimate-False-False] 0.1076ms 82.5708μs 12.1108 KOps/s 12.8004 KOps/s $\textbf{\color{#d91a1a}-5.39\%}$
test_values[td1_return_estimate-False-False] 59.0328ms 54.5672ms 18.3260 Ops/s 18.5044 Ops/s $\color{#d91a1a}-0.96\%$
test_values[vec_td1_return_estimate-False-False] 1.3372ms 1.0755ms 929.8236 Ops/s 927.8321 Ops/s $\color{#35bf28}+0.21\%$
test_values[td_lambda_return_estimate-True-False] 93.3551ms 89.0182ms 11.2337 Ops/s 11.6716 Ops/s $\color{#d91a1a}-3.75\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3990ms 1.0770ms 928.4740 Ops/s 930.3134 Ops/s $\color{#d91a1a}-0.20\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.4084ms 24.2421ms 41.2505 Ops/s 41.0617 Ops/s $\color{#35bf28}+0.46\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0266ms 0.7450ms 1.3422 KOps/s 1.3402 KOps/s $\color{#35bf28}+0.15\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7097ms 0.6572ms 1.5216 KOps/s 1.5212 KOps/s $\color{#35bf28}+0.03\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5345ms 1.4742ms 678.3182 Ops/s 677.8804 Ops/s $\color{#35bf28}+0.06\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7325ms 0.6720ms 1.4880 KOps/s 1.4856 KOps/s $\color{#35bf28}+0.16\%$
test_dqn_speed[False-None] 1.5948ms 1.4912ms 670.6071 Ops/s 647.8048 Ops/s $\color{#35bf28}+3.52\%$
test_dqn_speed[False-backward] 2.1683ms 2.1283ms 469.8663 Ops/s 464.4853 Ops/s $\color{#35bf28}+1.16\%$
test_dqn_speed[True-None] 0.6991ms 0.5417ms 1.8462 KOps/s 1.7234 KOps/s $\textbf{\color{#35bf28}+7.13\%}$
test_dqn_speed[True-backward] 1.1752ms 1.1206ms 892.3724 Ops/s 806.6616 Ops/s $\textbf{\color{#35bf28}+10.63\%}$
test_dqn_speed[reduce-overhead-None] 0.9552ms 0.5621ms 1.7791 KOps/s 1.7261 KOps/s $\color{#35bf28}+3.07\%$
test_dqn_speed[reduce-overhead-backward] 1.0162ms 0.9581ms 1.0437 KOps/s 932.5958 Ops/s $\textbf{\color{#35bf28}+11.91\%}$
test_ddpg_speed[False-None] 3.1605ms 2.8258ms 353.8851 Ops/s 348.4385 Ops/s $\color{#35bf28}+1.56\%$
test_ddpg_speed[False-backward] 4.6140ms 4.1263ms 242.3457 Ops/s 230.1879 Ops/s $\textbf{\color{#35bf28}+5.28\%}$
test_ddpg_speed[True-None] 1.3872ms 1.3185ms 758.4148 Ops/s 749.1792 Ops/s $\color{#35bf28}+1.23\%$
test_ddpg_speed[True-backward] 2.4577ms 2.4112ms 414.7335 Ops/s 408.2956 Ops/s $\color{#35bf28}+1.58\%$
test_ddpg_speed[reduce-overhead-None] 1.4027ms 1.3334ms 749.9552 Ops/s 741.8687 Ops/s $\color{#35bf28}+1.09\%$
test_ddpg_speed[reduce-overhead-backward] 1.9335ms 1.8826ms 531.1860 Ops/s 525.6888 Ops/s $\color{#35bf28}+1.05\%$
test_sac_speed[False-None] 8.4608ms 7.9968ms 125.0498 Ops/s 122.0930 Ops/s $\color{#35bf28}+2.42\%$
test_sac_speed[False-backward] 11.6796ms 11.0561ms 90.4475 Ops/s 89.3034 Ops/s $\color{#35bf28}+1.28\%$
test_sac_speed[True-None] 1.9102ms 1.8053ms 553.9268 Ops/s 546.1300 Ops/s $\color{#35bf28}+1.43\%$
test_sac_speed[True-backward] 3.7177ms 3.5906ms 278.5064 Ops/s 276.3312 Ops/s $\color{#35bf28}+0.79\%$
test_sac_speed[reduce-overhead-None] 21.5614ms 12.0133ms 83.2414 Ops/s 83.5297 Ops/s $\color{#d91a1a}-0.35\%$
test_sac_speed[reduce-overhead-backward] 1.6491ms 1.6078ms 621.9551 Ops/s 551.8146 Ops/s $\textbf{\color{#35bf28}+12.71\%}$
test_redq_speed[False-None] 8.0805ms 7.5105ms 133.1462 Ops/s 126.7434 Ops/s $\textbf{\color{#35bf28}+5.05\%}$
test_redq_speed[False-backward] 12.1508ms 11.6393ms 85.9157 Ops/s 82.5777 Ops/s $\color{#35bf28}+4.04\%$
test_redq_speed[True-None] 2.3902ms 2.2947ms 435.7887 Ops/s 430.5093 Ops/s $\color{#35bf28}+1.23\%$
test_redq_speed[True-backward] 4.2903ms 4.1725ms 239.6640 Ops/s 234.1024 Ops/s $\color{#35bf28}+2.38\%$
test_redq_speed[reduce-overhead-None] 2.5323ms 2.3230ms 430.4747 Ops/s 426.2250 Ops/s $\color{#35bf28}+1.00\%$
test_redq_speed[reduce-overhead-backward] 4.5750ms 4.0336ms 247.9192 Ops/s 233.2680 Ops/s $\textbf{\color{#35bf28}+6.28\%}$
test_redq_deprec_speed[False-None] 9.3604ms 9.0269ms 110.7795 Ops/s 109.0266 Ops/s $\color{#35bf28}+1.61\%$
test_redq_deprec_speed[False-backward] 12.6180ms 12.0887ms 82.7216 Ops/s 80.8653 Ops/s $\color{#35bf28}+2.30\%$
test_redq_deprec_speed[True-None] 2.9813ms 2.6561ms 376.4848 Ops/s 376.8053 Ops/s $\color{#d91a1a}-0.09\%$
test_redq_deprec_speed[True-backward] 4.7383ms 4.2772ms 233.7979 Ops/s 224.8853 Ops/s $\color{#35bf28}+3.96\%$
test_redq_deprec_speed[reduce-overhead-None] 2.7808ms 2.6028ms 384.2075 Ops/s 379.0186 Ops/s $\color{#35bf28}+1.37\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.7752ms 4.2788ms 233.7089 Ops/s 218.6102 Ops/s $\textbf{\color{#35bf28}+6.91\%}$
test_td3_speed[False-None] 8.1061ms 7.9425ms 125.9052 Ops/s 123.5830 Ops/s $\color{#35bf28}+1.88\%$
test_td3_speed[False-backward] 11.2831ms 10.3720ms 96.4132 Ops/s 93.4998 Ops/s $\color{#35bf28}+3.12\%$
test_td3_speed[True-None] 1.6346ms 1.6102ms 621.0433 Ops/s 603.4830 Ops/s $\color{#35bf28}+2.91\%$
test_td3_speed[True-backward] 3.4988ms 3.3258ms 300.6787 Ops/s 293.6495 Ops/s $\color{#35bf28}+2.39\%$
test_td3_speed[reduce-overhead-None] 76.5846ms 26.0835ms 38.3384 Ops/s 38.4460 Ops/s $\color{#d91a1a}-0.28\%$
test_td3_speed[reduce-overhead-backward] 1.4073ms 1.3313ms 751.1389 Ops/s 671.1824 Ops/s $\textbf{\color{#35bf28}+11.91\%}$
test_cql_speed[False-None] 17.1083ms 16.7496ms 59.7028 Ops/s 58.8564 Ops/s $\color{#35bf28}+1.44\%$
test_cql_speed[False-backward] 22.4941ms 22.0328ms 45.3869 Ops/s 44.1959 Ops/s $\color{#35bf28}+2.69\%$
test_cql_speed[True-None] 3.5045ms 3.2331ms 309.3048 Ops/s 306.5461 Ops/s $\color{#35bf28}+0.90\%$
test_cql_speed[True-backward] 6.1240ms 5.6836ms 175.9434 Ops/s 179.6914 Ops/s $\color{#d91a1a}-2.09\%$
test_cql_speed[reduce-overhead-None] 0.5908s 16.2194ms 61.6547 Ops/s 76.3423 Ops/s $\textbf{\color{#d91a1a}-19.24\%}$
test_cql_speed[reduce-overhead-backward] 2.0213ms 1.9635ms 509.2855 Ops/s 547.3022 Ops/s $\textbf{\color{#d91a1a}-6.95\%}$
test_a2c_speed[False-None] 3.2208ms 3.1279ms 319.7000 Ops/s 310.7060 Ops/s $\color{#35bf28}+2.89\%$
test_a2c_speed[False-backward] 6.9589ms 6.3843ms 156.6353 Ops/s 160.3194 Ops/s $\color{#d91a1a}-2.30\%$
test_a2c_speed[True-None] 1.4285ms 1.3327ms 750.3419 Ops/s 732.0725 Ops/s $\color{#35bf28}+2.50\%$
test_a2c_speed[True-backward] 3.1089ms 3.0472ms 328.1668 Ops/s 337.4443 Ops/s $\color{#d91a1a}-2.75\%$
test_a2c_speed[reduce-overhead-None] 16.0794ms 9.0592ms 110.3845 Ops/s 110.4704 Ops/s $\color{#d91a1a}-0.08\%$
test_a2c_speed[reduce-overhead-backward] 1.7235ms 1.6121ms 620.3084 Ops/s 674.6687 Ops/s $\textbf{\color{#d91a1a}-8.06\%}$
test_ppo_speed[False-None] 3.8046ms 3.6425ms 274.5382 Ops/s 263.8271 Ops/s $\color{#35bf28}+4.06\%$
test_ppo_speed[False-backward] 7.4746ms 7.0606ms 141.6315 Ops/s 142.5207 Ops/s $\color{#d91a1a}-0.62\%$
test_ppo_speed[True-None] 1.8525ms 1.3980ms 715.3006 Ops/s 694.6559 Ops/s $\color{#35bf28}+2.97\%$
test_ppo_speed[True-backward] 3.6106ms 3.2115ms 311.3837 Ops/s 320.0223 Ops/s $\color{#d91a1a}-2.70\%$
test_ppo_speed[reduce-overhead-None] 1.3564ms 0.9595ms 1.0422 KOps/s 1.0311 KOps/s $\color{#35bf28}+1.07\%$
test_ppo_speed[reduce-overhead-backward] 1.9440ms 1.5712ms 636.4636 Ops/s 617.9311 Ops/s $\color{#35bf28}+3.00\%$
test_reinforce_speed[False-None] 2.3446ms 2.2392ms 446.5931 Ops/s 434.4129 Ops/s $\color{#35bf28}+2.80\%$
test_reinforce_speed[False-backward] 3.8008ms 3.4005ms 294.0717 Ops/s 288.6610 Ops/s $\color{#35bf28}+1.87\%$
test_reinforce_speed[True-None] 1.6565ms 1.2673ms 789.1072 Ops/s 771.9056 Ops/s $\color{#35bf28}+2.23\%$
test_reinforce_speed[True-backward] 3.1439ms 3.0686ms 325.8810 Ops/s 323.7319 Ops/s $\color{#35bf28}+0.66\%$
test_reinforce_speed[reduce-overhead-None] 19.0596ms 10.5440ms 94.8404 Ops/s 96.6086 Ops/s $\color{#d91a1a}-1.83\%$
test_reinforce_speed[reduce-overhead-backward] 1.7014ms 1.6290ms 613.8850 Ops/s 597.3016 Ops/s $\color{#35bf28}+2.78\%$
test_iql_speed[False-None] 9.6130ms 9.1746ms 108.9962 Ops/s 106.0912 Ops/s $\color{#35bf28}+2.74\%$
test_iql_speed[False-backward] 13.7200ms 13.2300ms 75.5857 Ops/s 73.5368 Ops/s $\color{#35bf28}+2.79\%$
test_iql_speed[True-None] 2.3083ms 2.1890ms 456.8267 Ops/s 436.9003 Ops/s $\color{#35bf28}+4.56\%$
test_iql_speed[True-backward] 5.2911ms 4.9193ms 203.2822 Ops/s 196.3924 Ops/s $\color{#35bf28}+3.51\%$
test_iql_speed[reduce-overhead-None] 0.5239s 13.0350ms 76.7163 Ops/s 89.6256 Ops/s $\textbf{\color{#d91a1a}-14.40\%}$
test_iql_speed[reduce-overhead-backward] 1.9436ms 1.8962ms 527.3680 Ops/s 482.1972 Ops/s $\textbf{\color{#35bf28}+9.37\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.6648ms 6.2862ms 159.0798 Ops/s 157.7714 Ops/s $\color{#35bf28}+0.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8346ms 0.2563ms 3.9012 KOps/s 3.2575 KOps/s $\textbf{\color{#35bf28}+19.76\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6246ms 0.2369ms 4.2203 KOps/s 3.2672 KOps/s $\textbf{\color{#35bf28}+29.17\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.2680ms 6.0255ms 165.9602 Ops/s 166.6981 Ops/s $\color{#d91a1a}-0.44\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9092ms 0.2990ms 3.3449 KOps/s 3.1285 KOps/s $\textbf{\color{#35bf28}+6.91\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5457ms 0.2681ms 3.7295 KOps/s 4.2340 KOps/s $\textbf{\color{#d91a1a}-11.92\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4918ms 1.3267ms 753.7529 Ops/s 719.4534 Ops/s $\color{#35bf28}+4.77\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5466ms 1.3144ms 760.8136 Ops/s 843.2122 Ops/s $\textbf{\color{#d91a1a}-9.77\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.0407ms 6.2280ms 160.5649 Ops/s 161.4242 Ops/s $\color{#d91a1a}-0.53\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.8363ms 0.4042ms 2.4739 KOps/s 2.2731 KOps/s $\textbf{\color{#35bf28}+8.83\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6161ms 0.3831ms 2.6101 KOps/s 2.5841 KOps/s $\color{#35bf28}+1.00\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2158ms 6.0286ms 165.8766 Ops/s 163.1974 Ops/s $\color{#35bf28}+1.64\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8995ms 0.3444ms 2.9032 KOps/s 3.3300 KOps/s $\textbf{\color{#d91a1a}-12.82\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7052ms 0.3380ms 2.9586 KOps/s 3.6891 KOps/s $\textbf{\color{#d91a1a}-19.80\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 8.6114ms 5.9725ms 167.4347 Ops/s 165.8785 Ops/s $\color{#35bf28}+0.94\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6507ms 0.2976ms 3.3604 KOps/s 3.4495 KOps/s $\color{#d91a1a}-2.58\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5911ms 0.2732ms 3.6602 KOps/s 3.4918 KOps/s $\color{#35bf28}+4.82\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4388ms 6.1718ms 162.0275 Ops/s 161.4355 Ops/s $\color{#35bf28}+0.37\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.0755ms 0.4509ms 2.2176 KOps/s 2.0127 KOps/s $\textbf{\color{#35bf28}+10.18\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5733ms 0.4416ms 2.2646 KOps/s 2.0948 KOps/s $\textbf{\color{#35bf28}+8.11\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.0600ms 5.4687ms 182.8583 Ops/s 177.7468 Ops/s $\color{#35bf28}+2.88\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 10.3245ms 2.0816ms 480.4085 Ops/s 438.0955 Ops/s $\textbf{\color{#35bf28}+9.66\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.8856ms 1.2016ms 832.2081 Ops/s 839.2909 Ops/s $\color{#d91a1a}-0.84\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.1373ms 5.5983ms 178.6241 Ops/s 175.6328 Ops/s $\color{#35bf28}+1.70\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 10.3504ms 2.0869ms 479.1763 Ops/s 428.3670 Ops/s $\textbf{\color{#35bf28}+11.86\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.3643ms 1.1417ms 875.8755 Ops/s 847.4263 Ops/s $\color{#35bf28}+3.36\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5267s 16.1907ms 61.7638 Ops/s 30.4754 Ops/s $\textbf{\color{#35bf28}+102.67\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.3758ms 2.1281ms 469.8993 Ops/s 446.0570 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.5729ms 1.3499ms 740.8217 Ops/s 728.8578 Ops/s $\color{#35bf28}+1.64\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.4147ms 13.1658ms 75.9546 Ops/s 73.4624 Ops/s $\color{#35bf28}+3.39\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 18.5744ms 16.6724ms 59.9794 Ops/s 59.1282 Ops/s $\color{#35bf28}+1.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 18.2529ms 17.6286ms 56.7259 Ops/s 54.4143 Ops/s $\color{#35bf28}+4.25\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.3563ms 17.4210ms 57.4021 Ops/s 58.5202 Ops/s $\color{#d91a1a}-1.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 18.0318ms 17.7589ms 56.3099 Ops/s 55.2788 Ops/s $\color{#35bf28}+1.87\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 19.8376ms 18.3641ms 54.4542 Ops/s 54.4698 Ops/s $\color{#d91a1a}-0.03\%$

@vmoens vmoens merged commit d36a97e into gh/vmoens/100/base Mar 11, 2025
58 of 72 checks passed
vmoens added a commit that referenced this pull request Mar 11, 2025
ghstack-source-id: ef6a0f52601642c8944f63f9e3ac9e963425734e
Pull Request resolved: #2823
@vmoens vmoens deleted the gh/vmoens/100/head branch March 11, 2025 13:38