Use stream pool for gather/scatter. #14162

bdice · 2023-09-21T20:02:15Z

Description

This PR uses the stream pool introduced in #13922 to gather/scatter each column in a table on a separate stream.
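To make the per-column idea concrete, here is a CPU-side analogy in C++. It is a sketch, not cuDF code. Each pool "stream" is modeled as a worker thread that processes its assigned columns in order, the way a CUDA stream executes its work in order. Columns are assigned round-robin, and the caller waits for every worker before returning, which mirrors a fork/join on the stream pool.

The helper names (`for_each_column_pooled`, `gather_table`, `scatter_table`) and the round-robin assignment are assumptions for illustration only. On the GPU, the fork and join would typically be expressed with CUDA events (`cudaEventRecord` on one stream, `cudaStreamWaitEvent` on the others) rather than with thread joins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// CPU analogy only (hypothetical names, not cuDF APIs): each worker thread
// stands in for one pool stream and handles its columns in order.
using Column = std::vector<double>;
using Table = std::vector<Column>;

// "Fork": start up to pool_size workers; column c goes to worker c % n_streams.
// "Join": wait for every worker before returning.
template <typename Fn>
void for_each_column_pooled(std::size_t num_columns, std::size_t pool_size, Fn fn) {
  std::size_t const n_streams =
      std::max<std::size_t>(1, std::min(pool_size, num_columns));
  std::vector<std::thread> streams;
  streams.reserve(n_streams);
  for (std::size_t s = 0; s < n_streams; ++s) {
    streams.emplace_back([=] {
      for (std::size_t c = s; c < num_columns; c += n_streams) fn(c);
    });
  }
  for (auto& t : streams) t.join();
}

// Gather: out[c][i] = in[c][map[i]]. Each output column is written by one worker only.
Table gather_table(Table const& in, std::vector<std::int32_t> const& map,
                   std::size_t pool_size) {
  Table out(in.size(), Column(map.size()));
  for_each_column_pooled(in.size(), pool_size, [&](std::size_t c) {
    for (std::size_t i = 0; i < map.size(); ++i) {
      auto const row = static_cast<std::size_t>(map[i]);
      assert(row < in[c].size());  // gather map must be in bounds
      out[c][i] = in[c][row];
    }
  });
  return out;
}

// Scatter: target[c][map[i]] = source[c][i]; rows not named in map keep their target values.
Table scatter_table(Table const& source, std::vector<std::int32_t> const& map,
                    Table target, std::size_t pool_size) {
  for_each_column_pooled(target.size(), pool_size, [&](std::size_t c) {
    for (std::size_t i = 0; i < map.size(); ++i) {
      auto const row = static_cast<std::size_t>(map[i]);
      assert(row < target[c].size());  // scatter map must be in bounds
      target[c][row] = source[c][i];
    }
  });
  return target;
}
```

The result does not depend on `pool_size`. Only the amount of overlap changes. With a single column there is nothing to overlap, so the fork/join step is plausibly pure overhead in that case.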

Related: #13509, which this might resolve (need to verify).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

bdice · 2023-09-22T00:42:10Z

Gather (scatter) summary:

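Slower for 1 column with fewer than 16k rows (up to about 17% in the worst gather cases); we might accept this in exchange for the speedups elsewhere.
Faster by about 20-30% for 2-8 columns at medium data sizes (roughly 4k to 256k rows; the best range shifts toward larger sizes as the column count grows).
Little to no impact at large data sizes, because the GPU is already saturated.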
Overall, 7.97% (5.86%) faster by geometric mean across all benchmarks, but this figure is diluted by the many large benchmarks where the GPU is already saturated.
⚠️ Need to test with a much larger number of columns (perhaps 32, 256, or 2048 columns with fewer rows).
⚠️ Need to investigate the scatter slowdown at higher column counts (e.g. 8 columns with 16k rows or fewer); so far I have profiled only gather, not scatter.
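The overall figure is a geometric mean, and that explains why saturated benchmarks dilute it: each of them contributes a ratio close to 1.

A small C++ sketch of the arithmetic follows. It assumes the comparison script uses Google Benchmark's `compare.py` convention of reporting `(new - old) / old` per row. The 2-column, 1024-row gather row is consistent with this: `(5861 - 6722) / 6722 ≈ -0.1281`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-row change as shown in the "Time" and "CPU" columns (assumed
// compare.py convention): (new - old) / old. Negative means faster.
double relative_change(double old_time, double new_time) {
  return (new_time - old_time) / old_time;
}

// Overall change: gmean(new) / gmean(old) - 1. Over the same set of
// benchmarks this equals the geometric mean of the ratios new/old, minus 1,
// so every near-unity ratio (a saturated benchmark) pulls the result toward 0.
double geomean_change(std::vector<double> const& old_times,
                      std::vector<double> const& new_times) {
  assert(!old_times.empty() && old_times.size() == new_times.size());
  double log_sum = 0.0;
  for (std::size_t i = 0; i < old_times.size(); ++i) {
    log_sum += std::log(new_times[i] / old_times[i]);
  }
  return std::exp(log_sum / static_cast<double>(old_times.size())) - 1.0;
}
```

As an example, take three medium-size 8-column rows from the gather `_x` table (65536, 131072 and 262144 rows). Their geometric-mean change is about -0.24. Adding the three largest 8-column rows, each changing by less than 1%, moves it to about -0.13, even though the medium-size speedups are unchanged.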

Comparing logs/baseline/GATHER_BENCH.json to logs/change/GATHER_BENCH.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
Gather/double_coalesce_x/1024/1/manual_time                    +0.0670         +0.0131          3903          4165         20751         21022
Gather/double_coalesce_x/2048/1/manual_time                    +0.1678         +0.0303          3878          4529         20590         21213
Gather/double_coalesce_x/4096/1/manual_time                    +0.1684         +0.0358          4021          4698         20826         21572
Gather/double_coalesce_x/8192/1/manual_time                    +0.0648         +0.0111          4474          4764         21036         21270
Gather/double_coalesce_x/16384/1/manual_time                   -0.0056         +0.0009          5015          4987         21527         21547
Gather/double_coalesce_x/32768/1/manual_time                   +0.0086         +0.0042          5193          5238         21470         21560
Gather/double_coalesce_x/65536/1/manual_time                   -0.0030         +0.0035          6165          6146         22016         22093
Gather/double_coalesce_x/131072/1/manual_time                  -0.0062         -0.0011          8158          8108         23180         23153
Gather/double_coalesce_x/262144/1/manual_time                  +0.0058         -0.0031         12407         12479         26424         26343
Gather/double_coalesce_x/524288/1/manual_time                  -0.0013         -0.0002         19192         19167         32438         32431
Gather/double_coalesce_x/1048576/1/manual_time                 -0.0010         +0.0009         34219         34186         50726         50772
Gather/double_coalesce_x/2097152/1/manual_time                 +0.0015         +0.0002         64927         65028         81801         81815
Gather/double_coalesce_x/4194304/1/manual_time                 +0.0019         +0.0024        126251        126488        143246        143588
Gather/double_coalesce_x/8388608/1/manual_time                 -0.0008         -0.0049        249209        248999        267544        266233
Gather/double_coalesce_x/16777216/1/manual_time                +0.0002         +0.0006        494346        494424        511780        512110
Gather/double_coalesce_x/33554432/1/manual_time                -0.0007         -0.0006        984511        983807       1003414       1002849
Gather/double_coalesce_x/67108864/1/manual_time                -0.0006         -0.0003       1965749       1964574       1984850       1984332
Gather/double_coalesce_x/1024/2/manual_time                    -0.1281         +0.0269          6722          5861         23246         23872
Gather/double_coalesce_x/2048/2/manual_time                    -0.1234         +0.0328          6646          5826         23074         23831
Gather/double_coalesce_x/4096/2/manual_time                    -0.1649         +0.0187          7159          5978         23470         23909
Gather/double_coalesce_x/8192/2/manual_time                    -0.2611         -0.0185          8228          6080         24488         24034
Gather/double_coalesce_x/16384/2/manual_time                   -0.2433         -0.0006          8340          6311         24280         24265
Gather/double_coalesce_x/32768/2/manual_time                   -0.2053         +0.0158          8737          6943         24348         24734
Gather/double_coalesce_x/65536/2/manual_time                   -0.1630         +0.0571         10056          8416         25085         26517
Gather/double_coalesce_x/131072/2/manual_time                  -0.1089         +0.0968         13292         11845         27333         29980
Gather/double_coalesce_x/262144/2/manual_time                  -0.0777         +0.0905         21871         20171         35097         38272
Gather/double_coalesce_x/524288/2/manual_time                  -0.0298         +0.0127         37337         36226         53576         54258
Gather/double_coalesce_x/1048576/2/manual_time                 -0.0249         +0.0005         68477         66771         84927         84966
Gather/double_coalesce_x/2097152/2/manual_time                 -0.0129         -0.0005        129822        128148        146507        146441
Gather/double_coalesce_x/4194304/2/manual_time                 -0.0088         -0.0026        252911        250687        269673        268979
Gather/double_coalesce_x/8388608/2/manual_time                 -0.0036         +0.0037        497851        496067        514684        516589
Gather/double_coalesce_x/16777216/2/manual_time                -0.0020         -0.0004        988297        986366       1005175       1004818
Gather/double_coalesce_x/33554432/2/manual_time                -0.0014         -0.0003       1969370       1966611       1986153       1985626
Gather/double_coalesce_x/67108864/2/manual_time                -0.0007         -0.0003       3930014       3927199       3947221       3945966
Gather/double_coalesce_x/1024/4/manual_time                    -0.1376         +0.0118         11408          9838         27707         28034
Gather/double_coalesce_x/2048/4/manual_time                    -0.1538         +0.0038         11646          9855         27896         28001
Gather/double_coalesce_x/4096/4/manual_time                    -0.2697         -0.0554         13595          9928         29744         28095
Gather/double_coalesce_x/8192/4/manual_time                    -0.2744         -0.0524         13873         10066         29751         28193
Gather/double_coalesce_x/16384/4/manual_time                   -0.2739         -0.0441         14155         10279         29706         28398
Gather/double_coalesce_x/32768/4/manual_time                   -0.2842         -0.0332         14868         10643         29859         28868
Gather/double_coalesce_x/65536/4/manual_time                   -0.2421         -0.0085         17275         13092         31637         31369
Gather/double_coalesce_x/131072/4/manual_time                  -0.1189         +0.0699         22729         20027         35780         38281
Gather/double_coalesce_x/262144/4/manual_time                  -0.1307         -0.0625         41204         35820         57652         54051
Gather/double_coalesce_x/524288/4/manual_time                  -0.0819         -0.0465         74072         68006         90437         86235
Gather/double_coalesce_x/1048576/4/manual_time                 -0.0475         -0.0307        135987        129530        152572        147895
Gather/double_coalesce_x/2097152/4/manual_time                 -0.0305         -0.0234        259979        252053        276855        270375
Gather/double_coalesce_x/4194304/4/manual_time                 -0.0162         -0.0125        505386        497200        522287        515745
Gather/double_coalesce_x/8388608/4/manual_time                 -0.0084         -0.0065        995906        987553       1012856       1006241
Gather/double_coalesce_x/16777216/4/manual_time                -0.0044         -0.0035       1976418       1967806       1993541       1986521
Gather/double_coalesce_x/33554432/4/manual_time                -0.0026         -0.0021       3938160       3928084       3955263       3946936
Gather/double_coalesce_x/67108864/4/manual_time                -0.0018         -0.0017       7859822       7845412       7876593       7863486
Gather/double_coalesce_x/1024/8/manual_time                    +0.0836         +0.0979         20834         22577         37085         40715
Gather/double_coalesce_x/2048/8/manual_time                    -0.0503         +0.0177         23886         22684         40063         40773
Gather/double_coalesce_x/4096/8/manual_time                    -0.0763         +0.0036         24770         22880         40838         40984
Gather/double_coalesce_x/8192/8/manual_time                    -0.0631         +0.0226         24671         23114         40250         41159
Gather/double_coalesce_x/16384/8/manual_time                   -0.1122         +0.0116         25889         22983         40645         41117
Gather/double_coalesce_x/32768/8/manual_time                   -0.1584         +0.0064         27196         22887         40956         41216
Gather/double_coalesce_x/65536/8/manual_time                   -0.2652         -0.0722         31916         23452         45111         41854
Gather/double_coalesce_x/131072/8/manual_time                  -0.2487         -0.1581         42949         32269         59875         50409
Gather/double_coalesce_x/262144/8/manual_time                  -0.1992         -0.1471         80349         64345         96843         82595
Gather/double_coalesce_x/524288/8/manual_time                  -0.1233         -0.0993        147473        129294        163896        147614
Gather/double_coalesce_x/1048576/8/manual_time                 -0.0725         -0.0615        271556        251865        288006        270292
Gather/double_coalesce_x/2097152/8/manual_time                 -0.0445         -0.0402        520199        497073        537205        515622
Gather/double_coalesce_x/4194304/8/manual_time                 -0.0223         -0.0202       1009863        987294       1026743       1006004
Gather/double_coalesce_x/8388608/8/manual_time                 -0.0118         -0.0109       1991455       1967872       2008369       1986516
Gather/double_coalesce_x/16777216/8/manual_time                -0.0065         -0.0060       3953222       3927598       3970080       3946201
Gather/double_coalesce_x/33554432/8/manual_time                -0.0037         -0.0034       7876345       7847449       7893663       7866489
Gather/double_coalesce_x/67108864/8/manual_time                -0.0021         -0.0020      15720152      15687010      15736987      15704779
Gather/double_coalesce_o/1024/1/manual_time                    +0.0080         +0.0010          3975          4007         20804         20826
Gather/double_coalesce_o/2048/1/manual_time                    -0.0494         -0.0147          4213          4005         21149         20838
Gather/double_coalesce_o/4096/1/manual_time                    -0.0264         -0.0062          4149          4040         20839         20711
Gather/double_coalesce_o/8192/1/manual_time                    -0.1292         -0.0303          4656          4055         21356         20708
Gather/double_coalesce_o/16384/1/manual_time                   -0.0098         -0.0096          5243          5191         21768         21559
Gather/double_coalesce_o/32768/1/manual_time                   +0.0248         +0.0036          5528          5665         21777         21856
Gather/double_coalesce_o/65536/1/manual_time                   +0.0375         +0.0110          6488          6731         22200         22443
Gather/double_coalesce_o/131072/1/manual_time                  +0.0145         -0.0044          8614          8739         23620         23516
Gather/double_coalesce_o/262144/1/manual_time                  -0.0044         -0.0004         12961         12904         27151         27140
Gather/double_coalesce_o/524288/1/manual_time                  -0.0061         +0.0003         27470         27303         41077         41091
Gather/double_coalesce_o/1048576/1/manual_time                 +0.0003         -0.0002         68052         68070         82363         82344
Gather/double_coalesce_o/2097152/1/manual_time                 -0.0050         -0.0043        218879        217778        233054        232059
Gather/double_coalesce_o/4194304/1/manual_time                 -0.0037         -0.0031        550837        548782        564968        563193
Gather/double_coalesce_o/8388608/1/manual_time                 -0.0027         -0.0026       1228701       1225391       1242981       1239698
Gather/double_coalesce_o/16777216/1/manual_time                -0.0031         -0.0030       2589560       2581614       2603891       2596023
Gather/double_coalesce_o/33554432/1/manual_time                -0.0034         -0.0038       5318552       5300614       5334801       5314724
Gather/double_coalesce_o/67108864/1/manual_time                -0.0002         -0.0003      10773827      10771152      10788505      10785368
Gather/double_coalesce_o/1024/2/manual_time                    -0.2035         -0.0140          6947          5533         23445         23116
Gather/double_coalesce_o/2048/2/manual_time                    -0.1483         +0.0053          6981          5945         23414         23538
Gather/double_coalesce_o/4096/2/manual_time                    -0.1738         +0.0237          7504          6200         23714         24277
Gather/double_coalesce_o/8192/2/manual_time                    -0.2511         -0.0115          8465          6340         24683         24398
Gather/double_coalesce_o/16384/2/manual_time                   -0.2260         +0.0065          8681          6720         24508         24666
Gather/double_coalesce_o/32768/2/manual_time                   -0.2322         +0.0132          9362          7188         24901         25228
Gather/double_coalesce_o/65536/2/manual_time                   -0.1953         +0.0450         10648          8569         25609         26763
Gather/double_coalesce_o/131072/2/manual_time                  -0.1175         +0.0968         13846         12220         27847         30543
Gather/double_coalesce_o/262144/2/manual_time                  -0.0930         +0.0751         23361         21188         36788         39552
Gather/double_coalesce_o/524288/2/manual_time                  -0.0392         -0.0075         50522         48540         65956         65464
Gather/double_coalesce_o/1048576/2/manual_time                 +0.0093         +0.0185        130599        131811        145127        147807
Gather/double_coalesce_o/2097152/2/manual_time                 -0.0051         -0.0009        432564        430379        446345        445938
Gather/double_coalesce_o/4194304/2/manual_time                 -0.0054         -0.0048       1097734       1091792       1112005       1106696
Gather/double_coalesce_o/8388608/2/manual_time                 -0.0044         -0.0040       2452606       2441875       2466513       2456559
Gather/double_coalesce_o/16777216/2/manual_time                -0.0039         -0.0038       5174053       5154090       5188301       5168468
Gather/double_coalesce_o/33554432/2/manual_time                -0.0002         -0.0001      10626420      10624636      10640781      10639286
Gather/double_coalesce_o/67108864/2/manual_time                -0.0004         -0.0004      21543928      21534590      21558438      21548902
Gather/double_coalesce_o/1024/4/manual_time                    -0.1426         +0.0116         11723         10051         28040         28364
Gather/double_coalesce_o/2048/4/manual_time                    -0.1718         -0.0019         12244         10141         28409         28355
Gather/double_coalesce_o/4096/4/manual_time                    -0.2878         -0.0655         14310         10192         30340         28351
Gather/double_coalesce_o/8192/4/manual_time                    -0.2854         -0.0615         14508         10368         30258         28398
Gather/double_coalesce_o/16384/4/manual_time                   -0.2865         -0.0518         14744         10520         30172         28609
Gather/double_coalesce_o/32768/4/manual_time                   -0.3318         -0.0674         16045         10722         30980         28893
Gather/double_coalesce_o/65536/4/manual_time                   -0.2723         -0.0274         18348         13353         32659         31763
Gather/double_coalesce_o/131072/4/manual_time                  -0.1382         +0.0432         24497         21111         37913         39549
Gather/double_coalesce_o/262144/4/manual_time                  -0.1136         -0.0535         43511         38568         60108         56891
Gather/double_coalesce_o/524288/4/manual_time                  -0.0697         -0.0473         97874         91055        113276        107915
Gather/double_coalesce_o/1048576/4/manual_time                 +0.0182         +0.0213        255297        259949        269853        275603
Gather/double_coalesce_o/2097152/4/manual_time                 -0.0050         -0.0047        859352        855050        873427        869357
Gather/double_coalesce_o/4194304/4/manual_time                 -0.0062         -0.0060       2189672       2176114       2203604       2190477
Gather/double_coalesce_o/8388608/4/manual_time                 -0.0051         -0.0050       4900735       4875881       4914981       4890328
Gather/double_coalesce_o/16777216/4/manual_time                -0.0045         -0.0045      10342561      10296190      10357039      10310363
Gather/double_coalesce_o/33554432/4/manual_time                -0.0005         -0.0004      21245305      21235522      21258828      21249432
Gather/double_coalesce_o/67108864/4/manual_time                -0.0005         -0.0005      43073740      43053012      43087824      43065399
Gather/double_coalesce_o/1024/8/manual_time                    +0.0362         +0.0735         21754         22542         37922         40710
Gather/double_coalesce_o/2048/8/manual_time                    -0.0996         -0.0105         25198         22689         41190         40758
Gather/double_coalesce_o/4096/8/manual_time                    -0.1239         -0.0265         26018         22795         41818         40710
Gather/double_coalesce_o/8192/8/manual_time                    -0.1222         -0.0165         26032         22850         41470         40787
Gather/double_coalesce_o/16384/8/manual_time                   -0.1454         -0.0117         26888         22979         41602         41117
Gather/double_coalesce_o/32768/8/manual_time                   -0.2161         -0.0462         29129         22833         43075         41085
Gather/double_coalesce_o/65536/8/manual_time                   -0.3164         -0.1245         34355         23486         47548         41626
Gather/double_coalesce_o/131072/8/manual_time                  -0.2447         -0.1501         45124         34081         61702         52438
Gather/double_coalesce_o/262144/8/manual_time                  -0.1789         -0.1326         85242         69991        101861         88358
Gather/double_coalesce_o/524288/8/manual_time                  -0.0738         -0.0650        192378        178188        207799        194298
Gather/double_coalesce_o/1048576/8/manual_time                 +0.0231         +0.0222        504960        516625        519529        531040
Gather/double_coalesce_o/2097152/8/manual_time                 -0.0052         -0.0048       1712766       1703821       1726521       1718296
Gather/double_coalesce_o/4194304/8/manual_time                 -0.0069         -0.0069       4376833       4346497       4391061       4360782
Gather/double_coalesce_o/8388608/8/manual_time                 -0.0057         -0.0056       9794185       9738560       9807964       9753149
Gather/double_coalesce_o/16777216/8/manual_time                -0.0047         -0.0046      20678680      20581819      20691916      20596033
Gather/double_coalesce_o/33554432/8/manual_time                -0.0007         -0.0007      42483332      42454405      42496322      42468296
Gather/double_coalesce_o/67108864/8/manual_time                -0.0036         -0.0036      86118399      85809260      86133618      85823227
OVERALL_GEOMEAN                                                -0.0797         -0.0082             0             0             0             0

Comparing logs/baseline/SCATTER_BENCH.json to logs/change/SCATTER_BENCH.json
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
Scatter/double_coalesce_x/1024/1/manual_time                    -0.0412         -0.0078          5859          5617         22226         22053
Scatter/double_coalesce_x/2048/1/manual_time                    -0.0372         -0.0059          5881          5662         22177         22047
Scatter/double_coalesce_x/4096/1/manual_time                    -0.0278         -0.0026          6193          6021         22415         22357
Scatter/double_coalesce_x/8192/1/manual_time                    -0.0378         -0.0053          7071          6804         23222         23099
Scatter/double_coalesce_x/16384/1/manual_time                   -0.0254         -0.0099          7173          6990         23283         23052
Scatter/double_coalesce_x/32768/1/manual_time                   -0.0044         -0.0023          7995          7960         23778         23723
Scatter/double_coalesce_x/65536/1/manual_time                   -0.0224         -0.0025          9510          9298         24795         24733
Scatter/double_coalesce_x/131072/1/manual_time                  -0.0040         +0.0019         12325         12276         26945         26995
Scatter/double_coalesce_x/262144/1/manual_time                  -0.0016         -0.0010         19489         19457         33001         32969
Scatter/double_coalesce_x/524288/1/manual_time                  +0.0003         +0.0033         31224         31233         44848         44995
Scatter/double_coalesce_x/1048576/1/manual_time                 +0.0009         +0.0027         58923         58978         72715         72910
Scatter/double_coalesce_x/2097152/1/manual_time                 +0.0004         -0.0007        114391        114440        128069        127978
Scatter/double_coalesce_x/4194304/1/manual_time                 +0.0003         +0.0008        224921        224986        238844        239032
Scatter/double_coalesce_x/8388608/1/manual_time                 +0.0001         +0.0005        445895        445922        460009        460232
Scatter/double_coalesce_x/16777216/1/manual_time                +0.0001         +0.0001        887232        887288        902074        902181
Scatter/double_coalesce_x/33554432/1/manual_time                +0.0000         +0.0001       1770463       1770476       1785968       1786060
Scatter/double_coalesce_x/1024/2/manual_time                    -0.1800         -0.0441          9655          7917         25941         24798
Scatter/double_coalesce_x/2048/2/manual_time                    -0.2116         -0.0587          9928          7827         26158         24623
Scatter/double_coalesce_x/4096/2/manual_time                    -0.3039         -0.1088         11360          7908         27483         24493
Scatter/double_coalesce_x/8192/2/manual_time                    -0.2901         -0.1060         11541          8193         27550         24629
Scatter/double_coalesce_x/16384/2/manual_time                   -0.2812         -0.0964         11873          8534         27543         24888
Scatter/double_coalesce_x/32768/2/manual_time                   -0.2997         -0.1022         13363          9359         28689         25756
Scatter/double_coalesce_x/65536/2/manual_time                   -0.2478         -0.0610         15673         11789         30170         28330
Scatter/double_coalesce_x/131072/2/manual_time                  -0.1692         -0.0265         21168         17587         34829         33905
Scatter/double_coalesce_x/262144/2/manual_time                  -0.0853         +0.0110         35241         32233         49083         49624
Scatter/double_coalesce_x/524288/2/manual_time                  +0.0119         +0.0287         60636         61358         76186         78375
Scatter/double_coalesce_x/1048576/2/manual_time                 +0.0100         +0.0136        115711        116864        132413        134220
Scatter/double_coalesce_x/2097152/2/manual_time                 +0.0026         +0.0066        226632        227230        243607        245214
Scatter/double_coalesce_x/4194304/2/manual_time                 -0.0006         +0.0015        447908        447661        465333        466051
Scatter/double_coalesce_x/8388608/2/manual_time                 -0.0009         -0.0010        889860        889020        908906        908029
Scatter/double_coalesce_x/16777216/2/manual_time                -0.0011         +0.0001       1773307       1771436       1790458       1790549
Scatter/double_coalesce_x/33554432/2/manual_time                +0.0014         +0.0017       3536580       3541693       3553869       3559935
Scatter/double_coalesce_x/1024/4/manual_time                    +0.0989         +0.0658         17747         19502         33916         36149
Scatter/double_coalesce_x/2048/4/manual_time                    -0.0136         +0.0068         19938         19667         36005         36251
Scatter/double_coalesce_x/4096/4/manual_time                    -0.0315         -0.0036         20408         19766         36361         36231
Scatter/double_coalesce_x/8192/4/manual_time                    -0.0455         -0.0082         20820         19873         36498         36198
Scatter/double_coalesce_x/16384/4/manual_time                   -0.0887         -0.0226         21619         19701         36882         36050
Scatter/double_coalesce_x/32768/4/manual_time                   -0.1645         -0.0527         23561         19687         38208         36193
Scatter/double_coalesce_x/65536/4/manual_time                   -0.2860         -0.1282         28526         20368         41948         36569
Scatter/double_coalesce_x/131072/4/manual_time                  -0.2051         -0.0855         38749         30801         51747         47324
Scatter/double_coalesce_x/262144/4/manual_time                  -0.1266         -0.0899         67519         58971         83381         75885
Scatter/double_coalesce_x/524288/4/manual_time                  -0.0335         -0.0290        120854        116802        137617        133620
Scatter/double_coalesce_x/1048576/4/manual_time                 -0.0171         -0.0137        231301        227342        247795        244395
Scatter/double_coalesce_x/2097152/4/manual_time                 -0.0111         -0.0096        452791        447757        469444        464930
Scatter/double_coalesce_x/4194304/4/manual_time                 -0.0070         -0.0065        895369        889074        912238        906344
Scatter/double_coalesce_x/8388608/4/manual_time                 -0.0048         -0.0047       1780658       1772161       1797833       1789471
Scatter/double_coalesce_x/16777216/4/manual_time                -0.0023         -0.0026       3546138       3537902       3564610       3555208
Scatter/double_coalesce_x/33554432/4/manual_time                -0.0050         -0.0058       7106348       7070658       7129608       7087916
Scatter/double_coalesce_x/1024/8/manual_time                    +0.1633         +0.1230         37326         43420         53418         59989
Scatter/double_coalesce_x/2048/8/manual_time                    +0.1484         +0.1139         38038         43684         53996         60144
Scatter/double_coalesce_x/4096/8/manual_time                    +0.1564         +0.1220         38243         44224         53974         60560
Scatter/double_coalesce_x/8192/8/manual_time                    +0.1095         +0.0988         39702         44048         55019         60456
Scatter/double_coalesce_x/16384/8/manual_time                   +0.0736         +0.0918         40944         43959         55397         60481
Scatter/double_coalesce_x/32768/8/manual_time                   -0.0035         +0.0446         44580         44426         57909         60494
Scatter/double_coalesce_x/65536/8/manual_time                   -0.1643         -0.0923         53777         44943         66914         60739
Scatter/double_coalesce_x/131072/8/manual_time                  -0.2658         -0.2136         72694         53369         88922         69926
Scatter/double_coalesce_x/262144/8/manual_time                  -0.1394         -0.1187        132476        114009        148604        130961
Scatter/double_coalesce_x/524288/8/manual_time                  -0.0535         -0.0494        240561        227697        257319        244608
Scatter/double_coalesce_x/1048576/8/manual_time                 -0.0282         -0.0262        461394        448389        477958        465438
Scatter/double_coalesce_x/2097152/8/manual_time                 -0.0166         -0.0158        904785        889740        921504        906980
Scatter/double_coalesce_x/4194304/8/manual_time                 -0.0103         -0.0103       1791397       1772908       1808663       1790056
Scatter/double_coalesce_x/8388608/8/manual_time                 -0.0055         -0.0054       3557739       3538341       3575065       3555749
Scatter/double_coalesce_x/16777216/8/manual_time                -0.0026         -0.0026       7090149       7071573       7107450       7089029
Scatter/double_coalesce_x/33554432/8/manual_time                -0.0009         -0.0008      14147773      14135727      14164724      14153219
Scatter/double_coalesce_o/1024/1/manual_time                    +0.0050         +0.0153          6278          6309         22608         22953
Scatter/double_coalesce_o/2048/1/manual_time                    +0.0249         +0.0228          6374          6533         22617         23134
Scatter/double_coalesce_o/4096/1/manual_time                    +0.0494         +0.0277          6679          7009         22871         23504
Scatter/double_coalesce_o/8192/1/manual_time                    -0.0243         +0.0073          7437          7257         23456         23627
Scatter/double_coalesce_o/16384/1/manual_time                   -0.0024         +0.0125          7552          7534         23470         23764
Scatter/double_coalesce_o/32768/1/manual_time                   -0.0100         +0.0156          8539          8453         24311         24689
Scatter/double_coalesce_o/65536/1/manual_time                   -0.0163         +0.0360         10369         10200         25667         26591
Scatter/double_coalesce_o/131072/1/manual_time                  -0.0179         +0.0556         14459         14200         29073         30691
Scatter/double_coalesce_o/262144/1/manual_time                  +0.0104         +0.0745         22166         22396         35824         38492
Scatter/double_coalesce_o/524288/1/manual_time                  +0.0029         +0.0077         40504         40621         54003         54419
Scatter/double_coalesce_o/1048576/1/manual_time                 +0.0002         +0.0013        122119        122141        136456        136637
Scatter/double_coalesce_o/2097152/1/manual_time                 +0.0056         +0.0075        300191        301882        315146        317494
Scatter/double_coalesce_o/4194304/1/manual_time                 +0.0033         +0.0039        706413        708722        721748        724573
Scatter/double_coalesce_o/8388608/1/manual_time                 +0.0040         +0.0039       1612035       1618487       1628311       1634673
Scatter/double_coalesce_o/16777216/1/manual_time                -0.0033         -0.0032       3438089       3426652       3454268       3443314
Scatter/double_coalesce_o/33554432/1/manual_time                +0.0001         +0.0002       7238655       7239599       7254848       7256065
Scatter/double_coalesce_o/1024/2/manual_time                    -0.2522         -0.0793         10557          7895         26860         24730
Scatter/double_coalesce_o/2048/2/manual_time                    -0.2667         -0.0891         10976          8049         27206         24782
Scatter/double_coalesce_o/4096/2/manual_time                    -0.3294         -0.1231         12115          8125         28148         24684
Scatter/double_coalesce_o/8192/2/manual_time                    -0.3149         -0.1182         12304          8430         28190         24859
Scatter/double_coalesce_o/16384/2/manual_time                   -0.2800         -0.0996         12507          9005         28174         25367
Scatter/double_coalesce_o/32768/2/manual_time                   -0.2728         -0.0938         14278         10383         29587         26811
Scatter/double_coalesce_o/65536/2/manual_time                   -0.1870         -0.0415         17649         14348         32255         30916
Scatter/double_coalesce_o/131072/2/manual_time                  -0.1509         -0.0307         25436         21597         39047         37847
Scatter/double_coalesce_o/262144/2/manual_time                  -0.0745         -0.0034         40293         37292         54220         54034
Scatter/double_coalesce_o/524288/2/manual_time                  +0.0722         +0.0685         77704         83313         92477         98816
Scatter/double_coalesce_o/1048576/2/manual_time                 +0.0141         +0.0143        241342        244735        256556        260217
Scatter/double_coalesce_o/2097152/2/manual_time                 -0.0013         -0.0010        598623        597828        614078        613457
Scatter/double_coalesce_o/4194304/2/manual_time                 -0.0032         -0.0032       1412437       1407919       1428564       1423962
Scatter/double_coalesce_o/8388608/2/manual_time                 -0.0009         -0.0008       3193489       3190464       3209560       3206928
Scatter/double_coalesce_o/16777216/2/manual_time                -0.0007         -0.0007       6862479       6857536       6878643       6874086
Scatter/double_coalesce_o/33554432/2/manual_time                +0.0002         +0.0002      14471446      14473895      14487453      14490239
Scatter/double_coalesce_o/1024/4/manual_time                    +0.0402         +0.0368         19329         20106         35488         36794
Scatter/double_coalesce_o/2048/4/manual_time                    -0.0514         -0.0154         21207         20118         37269         36695
Scatter/double_coalesce_o/4096/4/manual_time                    -0.0748         -0.0294         21525         19915         37447         36347
Scatter/double_coalesce_o/8192/4/manual_time                    -0.0778         -0.0282         21874         20172         37589         36528
Scatter/double_coalesce_o/16384/4/manual_time                   -0.1083         -0.0348         22561         20118         37863         36544
Scatter/double_coalesce_o/32768/4/manual_time                   -0.2086         -0.0855         25159         19911         39852         36444
Scatter/double_coalesce_o/65536/4/manual_time                   -0.2759         -0.1355         32291         23382         45715         39521
Scatter/double_coalesce_o/131072/4/manual_time                  -0.2181         -0.1083         46579         36419         59737         53267
Scatter/double_coalesce_o/262144/4/manual_time                  -0.1175         -0.0903         77058         68003         93085         84678
Scatter/double_coalesce_o/524288/4/manual_time                  +0.0748         +0.0694        153091        164547        168282        179965
Scatter/double_coalesce_o/1048576/4/manual_time                 +0.0107         +0.0108        480344        485487        495602        500965
Scatter/double_coalesce_o/2097152/4/manual_time                 -0.0031         -0.0028       1195348       1191686       1210857       1207409
Scatter/double_coalesce_o/4194304/4/manual_time                 -0.0034         -0.0033       2820736       2811095       2836698       2827417
Scatter/double_coalesce_o/8388608/4/manual_time                 -0.0035         -0.0034       6396894       6374505       6413043       6390978
Scatter/double_coalesce_o/16777216/4/manual_time                -0.0002         +0.0001      13714003      13711082      13729877      13731288
Scatter/double_coalesce_o/33554432/4/manual_time                +0.0002         +0.0002      28940269      28945677      28956369      28962263
Scatter/double_coalesce_o/1024/8/manual_time                    +0.1089         +0.0863         39621         43937         55703         60508
Scatter/double_coalesce_o/2048/8/manual_time                    +0.0886         +0.0751         40146         43704         55989         60192
Scatter/double_coalesce_o/4096/8/manual_time                    +0.0909         +0.0774         40327         43994         56039         60376
Scatter/double_coalesce_o/8192/8/manual_time                    +0.0566         +0.0620         41778         44143         57095         60636
Scatter/double_coalesce_o/16384/8/manual_time                   +0.0239         +0.0523         43015         44045         57628         60640
Scatter/double_coalesce_o/32768/8/manual_time                   -0.0712         -0.0107         48128         44701         61515         60859
Scatter/double_coalesce_o/65536/8/manual_time                   -0.2621         -0.1777         62219         45914         75431         62028
Scatter/double_coalesce_o/131072/8/manual_time                  -0.2860         -0.2336         88560         63236        104669         80220
Scatter/double_coalesce_o/262144/8/manual_time                  -0.1404         -0.1233        151664        130366        167915        147210
Scatter/double_coalesce_o/524288/8/manual_time                  +0.0627         +0.0602        303769        322809        318970        338187
Scatter/double_coalesce_o/1048576/8/manual_time                 +0.0081         +0.0079        957532        965258        972949        980627
Scatter/double_coalesce_o/2097152/8/manual_time                 -0.0036         -0.0035       2387985       2379473       2403619       2395248
Scatter/double_coalesce_o/4194304/8/manual_time                 -0.0041         -0.0040       5643118       5619758       5658857       5636088
Scatter/double_coalesce_o/8388608/8/manual_time                 -0.0002         -0.0001      12751803      12749747      12767583      12766072
Scatter/double_coalesce_o/16777216/8/manual_time                -0.0013         -0.0013      27455999      27420519      27471449      27436756
Scatter/double_coalesce_o/33554432/8/manual_time                +0.0006         +0.0006      57852666      57887857      57867500      57900989
OVERALL_GEOMEAN                                                 -0.0586         -0.0157             0             0             0             0
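
Reading the table: each row's Time and CPU columns are the relative change (new - old) / old. For example, Scatter/double_coalesce_o/1024/2 gives (7895 - 10557) / 10557 ≈ -0.2522, or about 25% less time. The sketch below shows that arithmetic and the summary row in C++. It assumes that OVERALL_GEOMEAN is the geometric mean of the new times divided by the geometric mean of the old times, minus one. That is how Google Benchmark's compare.py, which this output format appears to come from, computes it. Timing, relative_change, geomean and overall_geomean_change are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of the comparison table: old and new manual_time for a benchmark.
struct Timing {
  double old_time;
  double new_time;
};

// Per-row "Time" column: relative change (new - old) / old.
double relative_change(const Timing& t) {
  assert(t.old_time > 0.0);
  return (t.new_time - t.old_time) / t.old_time;
}

// Geometric mean of positive values, computed in log space so that
// multiplying many large timings cannot overflow.
double geomean(const std::vector<double>& values) {
  assert(!values.empty());
  double log_sum = 0.0;
  for (double v : values) {
    assert(v > 0.0);
    log_sum += std::log(v);
  }
  return std::exp(log_sum / static_cast<double>(values.size()));
}

// Assumed summary definition: geomean(new) / geomean(old) - 1.
// For paired data this equals geomean(new_i / old_i) - 1.
double overall_geomean_change(const std::vector<Timing>& rows) {
  std::vector<double> olds;
  std::vector<double> news;
  for (const auto& r : rows) {
    olds.push_back(r.old_time);
    news.push_back(r.new_time);
  }
  return geomean(news) / geomean(olds) - 1.0;
}
```

The geometric mean weights every benchmark equally. The many large-size rows with near-zero change therefore pull the overall figure toward zero, which matches the summary's note about saturated benchmarks.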

nvdbaranec · 2023-09-29T21:58:00Z

This is probably not going to do much for the listed issue. The issue there is the number of raw thrust and kernel calls stemming from nesting (think thousands of columns underneath the top-level table). The fix for that is going to be smarter parallelization of what is currently a recursive, CPU-side approach. We have some ideas here.
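
The point about nesting can be illustrated with a toy count on the host. This is a minimal sketch. It assumes, hypothetically, that gather recurses through a column's children and issues one batch of thrust/kernel calls per node. ColumnNode, count_recursive_launches and make_wide_struct are illustrative names, not libcudf types. Under that model, the number of launches scales with the total number of nested columns. Forking one stream per top-level column only spreads that work across the top-level columns.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a possibly nested column, e.g. a struct column
// whose fields may themselves be struct or list columns.
struct ColumnNode {
  std::vector<std::unique_ptr<ColumnNode>> children;
};

// Models a recursive, CPU-driven gather: one batch of device work is issued
// per node, so launches grow with the total number of nested columns.
std::size_t count_recursive_launches(const ColumnNode& col) {
  std::size_t launches = 1;  // work for this column's own buffers
  for (const auto& child : col.children) {
    assert(child != nullptr);
    launches += count_recursive_launches(*child);
  }
  return launches;
}

// Builds a struct-like column with `num_fields` leaf children.
std::unique_ptr<ColumnNode> make_wide_struct(std::size_t num_fields) {
  auto parent = std::make_unique<ColumnNode>();
  for (std::size_t i = 0; i < num_fields; ++i) {
    parent->children.push_back(std::make_unique<ColumnNode>());
  }
  return parent;
}
```

For example, consider two top-level struct columns with 1,000 fields each. In this model they get just two forked streams but still issue 2,002 per-node launches.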

abellina · 2023-09-29T21:58:01Z

cpp/include/cudf/detail/gather.cuh

+  // only a single column, the fork/join overhead should be avoided.
+  auto streams = std::vector<rmm::cuda_stream_view>{};
+  if (num_columns > 1) {
+    streams = cudf::detail::fork_streams(stream, num_columns);


I am curious how this works with the per-thread default stream (PTDS). For Spark, we build cuDF with PTDS. Will streams be a vector of num_columns copies of the PTDS stream?

The stream passed in would be the PTDS stream. A stream pool (for that thread) would then be created (or reused), and the work would be executed across that stream pool. Finally, the join step would record an event on each element of streams and make stream wait on those events. As a result, new work on stream (the PTDS stream, in Spark's case) would not be runnable until all of the forked work has finished.
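
The fork/join ordering described above can be modeled on the host, including the single-column fast path from the diff. This is a minimal sketch built on a toy timeline. Each stream is reduced to the time at which its queued work finishes, and the GPU is assumed to have spare capacity so that independent streams truly overlap. The model therefore ignores the saturation seen at large data sizes. SimStream, sim_fork, sim_join and gather_columns are hypothetical names, not libcudf or CUDA APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a stream is represented only by when its last queued work ends.
struct SimStream {
  double ready_at = 0.0;
};

// Queue work; it starts once earlier work on the same stream is done.
void enqueue(SimStream& s, double duration) {
  assert(duration >= 0.0);
  s.ready_at += duration;
}

// Fork: each worker waits on an event recorded on the parent, so forked work
// cannot start before work already queued on the parent.
void sim_fork(const SimStream& parent, std::vector<SimStream>& workers) {
  for (auto& w : workers) w.ready_at = std::max(w.ready_at, parent.ready_at);
}

// Join: the parent waits on an event recorded on each worker, so later work on
// the parent (e.g. the PTDS stream) cannot start until every column is done.
void sim_join(const std::vector<SimStream>& workers, SimStream& parent) {
  for (const auto& w : workers) parent.ready_at = std::max(parent.ready_at, w.ready_at);
}

// One worker stream per column; a single column skips the fork/join overhead.
void gather_columns(SimStream& parent, const std::vector<double>& column_costs) {
  if (column_costs.size() <= 1) {
    for (double cost : column_costs) enqueue(parent, cost);
    return;
  }
  std::vector<SimStream> workers(column_costs.size());
  sim_fork(parent, workers);
  for (std::size_t i = 0; i < column_costs.size(); ++i) enqueue(workers[i], column_costs[i]);
  sim_join(workers, parent);
}
```

Suppose 1 unit of work is already queued on the parent and there are three 2-unit columns. The parent becomes ready at 3 instead of the serial 7. Any work queued on the parent afterwards still waits for all three columns.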

copy-pr-bot · 2025-03-10T17:37:13Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

bdice · 2025-03-10T17:53:07Z

@mythrocks Could you pick this up? I just merged some changes that bring it up to date with branch-25.04. I think it should build, but I committed a few changes just now without checking whether it builds. If you can benchmark this, that'd be a helpful step in moving it forward.

mythrocks · 2025-03-10T18:32:04Z

@mythrocks Could you pick this up?

I'm on it, mate. Thank you for this change. I'll post here with any findings.

vuule · 2025-03-10T20:43:40Z

cpp/include/cudf/detail/gather.cuh

+  // only a single column, the fork/join overhead should be avoided.
+  auto streams = std::vector<rmm::cuda_stream_view>{};
+  if (num_columns > 1) {
+    streams = cudf::detail::fork_streams(stream, num_columns);


This will emit a warning when there are more columns than streams in the pool. We can remove the warning; otherwise, it's probably good to limit the number of streams to the pool size.

Is there a good/recommended number? And is it tuned by hand or automatically computed?

It's fine to use all streams in the pool; I don't think we need to tune this. But requesting more streams than the pool holds reuses them in round-robin fashion, and we emit a warning because that may be unexpected behavior for users.

I knew we did round-robin, I did not know we issued a warning. I think removing the warning would be appropriate.

opened #18236
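
The round-robin reuse discussed above can be sketched on the host, along with the alternative of capping the request at the pool size. This sketch assumes a fixed-size pool whose streams are modeled as integer ids. RoundRobinPool and streams_to_request are hypothetical names. The sketch does not model the real pool's warning or its rmm::cuda_stream_view handles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool; stream handles are modeled as integer ids.
class RoundRobinPool {
 public:
  explicit RoundRobinPool(std::size_t size) : size_{size} { assert(size > 0); }

  std::size_t size() const { return size_; }

  // Hand out `count` ids in round-robin order. When count exceeds the pool
  // size, ids repeat, so some "independent" columns share a stream.
  std::vector<std::size_t> get_streams(std::size_t count) {
    std::vector<std::size_t> ids(count);
    for (auto& id : ids) {
      id = next_;
      next_ = (next_ + 1) % size_;
    }
    return ids;
  }

 private:
  std::size_t size_;
  std::size_t next_ = 0;
};

// Reviewer's alternative: never request more streams than the pool holds.
std::size_t streams_to_request(std::size_t num_columns, const RoundRobinPool& pool) {
  return std::min(num_columns, pool.size());
}
```

In this model, a pool of 4 with 6 columns makes the fifth and sixth columns share streams with the first two. Capping the request at 4 changes only whether the caller sees repeated handles, not how many streams can run concurrently.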

vuule · 2025-03-10T21:19:23Z

cpp/include/cudf/detail/gather.cuh


+  auto it = thrust::make_counting_iterator<size_type>(0);
+
+  std::transform(it, it + num_columns, result.begin(), [&](size_type i) {


thrust::tabulate saves a bit of code here
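
thrust::tabulate(first, last, op) assigns op(i) to the i-th element of [first, last). It therefore folds the counting iterator and the transform into a single call. The host-only C++ sketch below shows the equivalence. Here tabulate is a hypothetical stand-in written to mirror thrust::tabulate's semantics. std::iota materializes indices that thrust::make_counting_iterator would generate lazily. The string results are placeholders for the per-column outputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical host-side stand-in mirroring thrust::tabulate semantics:
// assigns op(i) to the i-th element of [first, last).
template <typename RandomIt, typename UnaryOp>
void tabulate(RandomIt first, RandomIt last, UnaryOp op) {
  using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
  const diff_t n = last - first;
  assert(n >= 0);
  for (diff_t i = 0; i < n; ++i) {
    first[i] = op(i);
  }
}

// Placeholder for the per-column work done for index i.
std::string make_result(std::size_t i) { return "col" + std::to_string(i); }

// Pattern in the diff: an index range plus std::transform.
std::vector<std::string> build_with_transform(std::size_t num_columns) {
  std::vector<std::size_t> indices(num_columns);
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::vector<std::string> result(num_columns);
  std::transform(indices.begin(), indices.end(), result.begin(), make_result);
  return result;
}

// Suggested pattern: a single tabulate call, no explicit index range.
std::vector<std::string> build_with_tabulate(std::size_t num_columns) {
  std::vector<std::string> result(num_columns);
  tabulate(result.begin(), result.end(),
           [](auto i) { return make_result(static_cast<std::size_t>(i)); });
  return result;
}
```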

mythrocks · 2025-03-24T23:01:52Z

I'll retarget this to 25.06. This will be good to have, but it didn't help NVIDIA/spark-rapids#12195. It simply wasn't the slow path, in this case. :/

vyasr · 2025-05-19T17:05:56Z

Moved to 25.08. @mythrocks said that he should be able to get this done for the next release.

bdice · 2025-07-29T15:52:01Z

@mythrocks Any interest in picking this up? Rebenchmarking is probably the first step.

mythrocks · 2025-07-31T19:52:40Z

Sorry, I've been sitting on this for a while now, and holding up progress. I'd better relinquish, in case someone else is able to run with it.

copy-pr-bot · 2026-01-23T15:26:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 21, 2023

abellina reviewed Sep 29, 2023

GregoryKimball requested a review from mythrocks March 10, 2025 17:37

GregoryKimball assigned mythrocks Mar 10, 2025

bdice changed the base branch from branch-23.10 to branch-25.04 March 10, 2025 17:37

github-actions bot assigned bdice Mar 10, 2025

bdice commented Mar 10, 2025

cpp/include/cudf/detail/gather.cuh (outdated, resolved)

bdice marked this pull request as ready for review March 10, 2025 17:53

bdice requested a review from a team as a code owner March 10, 2025 17:53

bdice requested a review from ttnghia March 10, 2025 17:53

bdice commented Mar 10, 2025

cpp/include/cudf/detail/scatter.cuh (outdated, resolved)

vuule reviewed Mar 10, 2025

mythrocks reviewed Mar 11, 2025

cpp/include/cudf/detail/scatter.cuh (outdated, resolved)

ttnghia approved these changes Mar 17, 2025

mythrocks mentioned this pull request Mar 24, 2025

[BUG] Wide schema read with operations falling back to the CPU show serious query slowdown NVIDIA/spark-rapids#12195 (Open)

bdice mentioned this pull request Apr 9, 2025

Add a public API for copying a table_view to device array #18450 (Merged)

vyasr changed the base branch from branch-25.04 to branch-25.08 May 19, 2025 17:05

mythrocks removed their assignment Jul 31, 2025

bdice requested review from a team as code owners January 23, 2026 15:26

bdice requested review from KyleFromNVIDIA, mroeschke and wence- and removed request for a team January 23, 2026 15:26

github-actions bot added Python Affects Python cuDF API. CMake CMake build issue Java Affects Java cuDF API. cudf.pandas Issues specific to cudf.pandas labels Jan 23, 2026

github-project-automation bot added this to cuDF Python Jan 23, 2026

bdice changed the base branch from branch-25.08 to main January 23, 2026 17:01

GPUtester moved this to In Progress in cuDF Python Jan 23, 2026

Use stream pool for gather and scatter

b74d8cb

bdice force-pushed the stream-pool-gather branch from 2d7101b to b74d8cb January 23, 2026 17:15

bdice removed request for a team, KyleFromNVIDIA, mroeschke and wence- January 23, 2026 17:42

bdice removed Python Affects Python cuDF API. CMake CMake build issue Java Affects Java cuDF API. cudf.pandas Issues specific to cudf.pandas labels Jan 23, 2026

bdice removed this from cuDF Python Jan 23, 2026


7 participants