update slice_scatter #776


Merged
merged 2 commits into from
Jul 14, 2025

Conversation

meinie0826
Collaborator

@meinie0826 meinie0826 commented Jul 12, 2025

PR Category

Operator

Type of Change

Performance Optimization

Description

Merge the two kernels into one, and refactor the combined kernel.
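For reference, the Size Detail column in the benchmarks below appears to list `[input shape, src shape, dim, start, end, step]`, matching the `torch.slice_scatter` signature. A minimal pure-Python sketch of the operator's semantics (a reference model, not the Triton kernel itself, and restricted to 2-D inputs for illustration):

```python
from copy import deepcopy

def slice_scatter_ref(inp, src, dim, start, end, step):
    """Reference semantics of slice_scatter on 2-D nested lists:
    return a copy of `inp` with `src` written into the slice
    [start:end:step] along `dim` (0 = rows, 1 = columns)."""
    out = deepcopy(inp)
    for j, p in enumerate(range(start, end, step)):
        if dim == 0:
            out[p] = list(src[j])          # scatter src row j into row p
        else:
            for i in range(len(out)):
                out[i][p] = src[i][j]      # scatter src column j into column p
    return out

# Mirrors the shape pattern of the benchmark rows: src covers every
# second position along dim=1 of the input.
inp = [[0, 0, 0, 0], [0, 0, 0, 0]]
src = [[1, 2], [3, 4]]
print(slice_scatter_ref(inp, src, dim=1, start=0, end=4, step=2))
# [[1, 0, 2, 0], [3, 0, 4, 0]]
```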

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

benchmark/test_select_and_slice_perf.py 
Operator: slice_scatter  Performance Test (dtype=torch.float16, mode=cuda, level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Torch GBPS            Gems GBPS           Size Detail
-----------------------------------------------------------------------------------------------------------------------------------------
SUCCESS               0.010400            0.006240               1.667               1.575               2.626          [torch.Size([64, 64]), torch.Size([64, 32]), 1, 0, 64, 2]
SUCCESS               0.010976            0.006560               1.673              23.883              39.961          [torch.Size([256, 256]), torch.Size([256, 128]), 1, 0, 256, 2]
SUCCESS               0.013248            0.008576               1.545             316.599             489.075          [torch.Size([1024, 1024]), torch.Size([1024, 512]), 1, 0, 1024, 2]
SUCCESS               0.064384            0.036160               1.781            1042.322            1855.887          [torch.Size([4096, 4096]), torch.Size([4096, 2048]), 1, 0, 4096, 2]
SUCCESS               0.218624            0.120384               1.816            1227.841            2229.827          [torch.Size([1024, 65536]), torch.Size([1024, 32768]), 1, 0, 65536, 2]
SUCCESS               0.017280            0.011744               1.471             592.593             871.935          [torch.Size([10000, 256]), torch.Size([10000, 128]), 1, 0, 256, 2]
SUCCESS               2.002080            1.178784               1.698            1309.358            2223.851          [torch.Size([10000, 65536]), torch.Size([10000, 32768]), 1, 0, 65536, 2]


Operator: slice_scatter  Performance Test (dtype=torch.float32, mode=cuda, level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Torch GBPS            Gems GBPS           Size Detail
-----------------------------------------------------------------------------------------------------------------------------------------
SUCCESS               0.009920            0.006624               1.498               3.303               4.947          [torch.Size([64, 64]), torch.Size([64, 32]), 1, 0, 64, 2]
SUCCESS               0.010272            0.007232               1.420              51.040              72.496          [torch.Size([256, 256]), torch.Size([256, 128]), 1, 0, 256, 2]
SUCCESS               0.014496            0.011072               1.309             578.684             757.642          [torch.Size([1024, 1024]), torch.Size([1024, 512]), 1, 0, 1024, 2]
SUCCESS               0.111392            0.063648               1.750            1204.914            2108.750          [torch.Size([4096, 4096]), torch.Size([4096, 2048]), 1, 0, 4096, 2]
SUCCESS               0.405920            0.226240               1.794            1322.603            2373.015          [torch.Size([1024, 65536]), torch.Size([1024, 32768]), 1, 0, 65536, 2]
SUCCESS               0.021552            0.016288               1.323             950.260            1257.367          [torch.Size([10000, 256]), torch.Size([10000, 128]), 1, 0, 256, 2]
SUCCESS               3.858240            2.136048               1.806            1358.879            2454.477          [torch.Size([10000, 65536]), torch.Size([10000, 32768]), 1, 0, 65536, 2]


Operator: slice_scatter  Performance Test (dtype=torch.bfloat16, mode=cuda, level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Torch GBPS            Gems GBPS           Size Detail
-----------------------------------------------------------------------------------------------------------------------------------------
SUCCESS               0.010432            0.006048               1.725               1.571               2.709          [torch.Size([64, 64]), torch.Size([64, 32]), 1, 0, 64, 2]
SUCCESS               0.011072            0.006624               1.671              23.676              39.575          [torch.Size([256, 256]), torch.Size([256, 128]), 1, 0, 256, 2]
SUCCESS               0.013248            0.008896               1.489             316.599             471.482          [torch.Size([1024, 1024]), torch.Size([1024, 512]), 1, 0, 1024, 2]
SUCCESS               0.064304            0.036096               1.781            1043.619            1859.177          [torch.Size([4096, 4096]), torch.Size([4096, 2048]), 1, 0, 4096, 2]
SUCCESS               0.218736            0.120256               1.819            1227.212            2232.200          [torch.Size([1024, 65536]), torch.Size([1024, 32768]), 1, 0, 65536, 2]
SUCCESS               0.017184            0.011520               1.492             595.903             888.889          [torch.Size([10000, 256]), torch.Size([10000, 128]), 1, 0, 256, 2]
SUCCESS               2.000512            1.156384               1.730            1310.385            2266.929          [torch.Size([10000, 65536]), torch.Size([10000, 32768]), 1, 0, 65536, 2]

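A note on the derived columns: Speedup is simply the ratio of Torch to Gems latency, and the GBPS figures are consistent with treating the effective memory traffic as one full read plus one full write of the input tensor (an assumption inferred from the numbers, not from the benchmark source). A small sketch that reproduces the first float16 row:

```python
# Assumed derivations for the benchmark's computed columns.

def speedup(torch_ms: float, gems_ms: float) -> float:
    """Speedup column: Torch latency divided by Gems latency."""
    return torch_ms / gems_ms

def gbps(shape, dtype_bytes: int, latency_ms: float) -> float:
    """GBPS column: assumed traffic of one full read plus one full
    write of the input tensor, divided by the measured latency."""
    numel = 1
    for d in shape:
        numel *= d
    io_bytes = 2 * numel * dtype_bytes
    return io_bytes / (latency_ms * 1e-3) / 1e9

# First float16 row: input torch.Size([64, 64]), Torch latency 0.010400 ms.
print(round(speedup(0.010400, 0.006240), 3))  # 1.667, matching the table
print(round(gbps((64, 64), 2, 0.010400), 3))  # 1.575, matching the table
```

The same formula reproduces the larger rows as well, e.g. the float32 `[4096, 4096]` case: 2 × 4096 × 4096 × 4 bytes over 0.111392 ms gives ~1204.9 GB/s.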
Collaborator

@iclementine iclementine left a comment


LGTM

@iclementine iclementine merged commit 058f781 into master Jul 14, 2025
10 of 14 checks passed
@iclementine iclementine deleted the op/slice_scatter branch July 14, 2025 08:39
2 participants