update select_scatter #777


Merged

merged 1 commit into master from op/select_scatter on Jul 14, 2025

Conversation

meinie0826 (Collaborator)

PR Category

Operator

Type of Change

Performance Optimization

Description

Merge the kernels used by `select_scatter` into a single kernel.
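For context, `select_scatter` returns a copy of the input with a source tensor written into the slice selected by an index along a given dimension. A minimal pure-Python sketch of these semantics (an assumption here: this mirrors `torch.select_scatter(input, src, dim, index)` for the 2-D cases exercised by the benchmark below):

```python
def select_scatter_2d(inp, src, dim, index):
    """Return a copy of `inp` (a 2-D list) with `src` written into the
    slice selected by `index` along `dim`, mirroring select_scatter."""
    out = [row[:] for row in inp]  # copy so the input is left untouched
    if dim == 0:
        out[index] = list(src)       # overwrite one row
    else:
        for i, v in enumerate(src):  # overwrite one column
            out[i][index] = v
    return out

inp = [[0, 0, 0], [0, 0, 0]]
src = [1, 2, 3]
print(select_scatter_2d(inp, src, 0, 1))  # [[0, 0, 0], [1, 2, 3]]
```

The names `select_scatter_2d`, `inp`, and `src` are illustrative only; the actual implementation in this PR is a fused GPU kernel, not Python.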

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

benchmark/test_select_and_slice_perf.py 
Operator: select_scatter  Performance Test (dtype=torch.float16, mode=cuda, level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Torch GBPS            Gems GBPS           Size Detail
-----------------------------------------------------------------------------------------------------------------------------------------
SUCCESS               0.008896            0.005856               1.519               0.950               1.443          [torch.Size([64, 64]), torch.Size([64]), 1, 17]
SUCCESS               0.009600            0.006048               1.587              13.760              21.841          [torch.Size([256, 256]), torch.Size([256]), 1, 20]
SUCCESS               0.012352            0.008160               1.514             170.114             257.506          [torch.Size([1024, 1024]), torch.Size([1024]), 1, 654]
SUCCESS               0.034752            0.029536               1.177             966.011            1136.607          [torch.Size([4096, 4096]), torch.Size([4096]), 1, 3715]
SUCCESS               0.099904            0.094848               1.053            1343.508            1415.126          [torch.Size([1024, 65536]), torch.Size([1024]), 1, 23580]
SUCCESS               0.015296            0.010528               1.453             337.343             490.122          [torch.Size([10000, 256]), torch.Size([10000]), 1, 224]
SUCCESS               0.870112            0.863968               1.007            1506.427            1517.139          [torch.Size([10000, 65536]), torch.Size([10000]), 1, 14848]


Operator: select_scatter  Performance Test (dtype=torch.float32, mode=cuda, level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Torch GBPS            Gems GBPS           Size Detail
-----------------------------------------------------------------------------------------------------------------------------------------
SUCCESS               0.008832            0.006592               1.340               1.913               2.563          [torch.Size([64, 64]), torch.Size([64]), 1, 13]
SUCCESS               0.009952            0.006976               1.427              26.547              37.872          [torch.Size([256, 256]), torch.Size([256]), 1, 158]
SUCCESS               0.012896            0.009888               1.304             325.876             425.010          [torch.Size([1024, 1024]), torch.Size([1024]), 1, 389]
SUCCESS               0.055712            0.051872               1.074            1205.156            1294.371          [torch.Size([4096, 4096]), torch.Size([4096]), 1, 517]
SUCCESS               0.186912            0.184544               1.013            1436.203            1454.632          [torch.Size([1024, 65536]), torch.Size([1024]), 1, 51497]
SUCCESS               0.017408            0.014176               1.228             592.831             727.991          [torch.Size([10000, 256]), torch.Size([10000]), 1, 43]
SUCCESS               1.731968            1.730288               1.001            1513.608            1515.077          [torch.Size([10000, 65536]), torch.Size([10000]), 1, 58823]


Operator: select_scatter  Performance Test (dtype=torch.bfloat16, mode=cuda, level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Torch GBPS            Gems GBPS           Size Detail
-----------------------------------------------------------------------------------------------------------------------------------------
SUCCESS               0.008896            0.006144               1.448               0.950               1.375          [torch.Size([64, 64]), torch.Size([64]), 1, 26]
SUCCESS               0.009632            0.006336               1.520              13.714              20.848          [torch.Size([256, 256]), torch.Size([256]), 1, 23]
SUCCESS               0.012288            0.007808               1.574             171.000             269.115          [torch.Size([1024, 1024]), torch.Size([1024]), 1, 41]
SUCCESS               0.034752            0.029184               1.191             966.011            1150.316          [torch.Size([4096, 4096]), torch.Size([4096]), 1, 3040]
SUCCESS               0.100096            0.095104               1.052            1340.931            1411.316          [torch.Size([1024, 65536]), torch.Size([1024]), 1, 20529]
SUCCESS               0.014816            0.009856               1.503             348.272             523.539          [torch.Size([10000, 256]), torch.Size([10000]), 1, 42]
SUCCESS               0.870544            0.864128               1.007            1505.679            1516.859          [torch.Size([10000, 65536]), torch.Size([10000]), 1, 33231]
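The "Gems Speedup" column above is simply the ratio of the two latency columns, which can be sanity-checked directly. (The GBPS columns divide bytes moved by latency, but the exact byte accounting is not shown here, so only the speedup is reproduced in this sketch.)

```python
def speedup(torch_latency_ms, gems_latency_ms):
    """Speedup of the Gems kernel over the Torch baseline:
    ratio of the two measured latencies."""
    return torch_latency_ms / gems_latency_ms

# First float16 row: 0.008896 ms (Torch) vs 0.005856 ms (Gems)
print(round(speedup(0.008896, 0.005856), 3))  # 1.519
```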


@iclementine (Collaborator) left a comment:


LGTM

@iclementine merged commit ad109d6 into master on Jul 14, 2025
10 of 14 checks passed
@iclementine deleted the op/select_scatter branch on July 14, 2025 at 08:39