Add contiguous op #511

0x45f · 2025-03-25T06:59:02Z

PR Category

Operator

Type of Change

New Feature

Description

Add contiguous op

Issue

Progress

Change is properly reviewed (1 reviewer required, 2 recommended).
Change responds to an issue.
Change is fully covered by a UT.

Performance

A100

Operator: torch.Tensor.contiguous  Performance Test (dtype=torch.float16, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               2.514880            2.362880               1.064          [torch.Size([536870912])]
SUCCESS               0.009504            0.006944               1.369          [torch.Size([32, 64])]
SUCCESS               0.045408            0.033408               1.359          [torch.Size([2048, 4096])]
SUCCESS               0.045632            0.033696               1.354          [torch.Size([32, 512, 512])]
SUCCESS               2.211552            1.610528               1.373          [torch.Size([512, 1024, 1024])]
SUCCESS               0.641120            0.598848               1.071          [torch.Size([134217728])]
SUCCESS               0.009056            0.007360               1.230          [torch.Size([5000, 1])]
SUCCESS               0.015712            0.012064               1.302          [torch.Size([5000, 256])]
SUCCESS               1.364064            0.977568               1.395          [torch.Size([5000, 65536])]
SUCCESS               0.009056            0.008224               1.101          [torch.Size([50, 1, 100])]
SUCCESS               0.015744            0.012736               1.236          [torch.Size([50, 256, 100])]
SUCCESS               1.352512            1.156416               1.170          [torch.Size([50, 65536, 100])]


Operator: torch.Tensor.contiguous  Performance Test (dtype=torch.float32, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               4.763840            4.698016               1.014          [torch.Size([536870912])]
SUCCESS               0.008000            0.007296               1.096          [torch.Size([32, 64])]
SUCCESS               0.060256            0.058080               1.037          [torch.Size([2048, 4096])]
SUCCESS               0.060672            0.057984               1.046          [torch.Size([32, 512, 512])]
SUCCESS               3.159072            3.136352               1.007          [torch.Size([512, 1024, 1024])]
SUCCESS               1.200832            1.185152               1.013          [torch.Size([134217728])]
SUCCESS               0.008512            0.007520               1.132          [torch.Size([5000, 1])]
SUCCESS               0.017024            0.015520               1.097          [torch.Size([5000, 256])]
SUCCESS               1.946752            1.915968               1.016          [torch.Size([5000, 65536])]
SUCCESS               0.008608            0.006976               1.234          [torch.Size([50, 1, 100])]
SUCCESS               0.016160            0.016672               0.969          [torch.Size([50, 256, 100])]
SUCCESS               1.933280            1.972352               0.980          [torch.Size([50, 65536, 100])]


Operator: torch.Tensor.contiguous  Performance Test (dtype=torch.bfloat16, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               2.515648            2.363072               1.065          [torch.Size([536870912])]
SUCCESS               0.009024            0.006944               1.300          [torch.Size([32, 64])]
SUCCESS               0.045376            0.033376               1.360          [torch.Size([2048, 4096])]
SUCCESS               0.045088            0.033696               1.338          [torch.Size([32, 512, 512])]
SUCCESS               2.211360            1.611328               1.372          [torch.Size([512, 1024, 1024])]
SUCCESS               0.640544            0.598592               1.070          [torch.Size([134217728])]
SUCCESS               0.009600            0.006912               1.389          [torch.Size([5000, 1])]
SUCCESS               0.014976            0.011360               1.318          [torch.Size([5000, 256])]
SUCCESS               1.364736            0.977568               1.396          [torch.Size([5000, 65536])]
SUCCESS               0.009376            0.008416               1.114          [torch.Size([50, 1, 100])]
SUCCESS               0.015040            0.013568               1.108          [torch.Size([50, 256, 100])]
SUCCESS               1.353280            1.156544               1.170          [torch.Size([50, 65536, 100])]


Operator: torch.Tensor.contiguous  Performance Test (dtype=torch.int16, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               2.515648            2.362720               1.065          [torch.Size([536870912])]
SUCCESS               0.009664            0.007296               1.325          [torch.Size([32, 64])]
SUCCESS               0.046400            0.033408               1.389          [torch.Size([2048, 4096])]
SUCCESS               0.045120            0.033664               1.340          [torch.Size([32, 512, 512])]
SUCCESS               2.211040            1.611072               1.372          [torch.Size([512, 1024, 1024])]
SUCCESS               0.640480            0.598592               1.070          [torch.Size([134217728])]
SUCCESS               0.009024            0.007968               1.133          [torch.Size([5000, 1])]
SUCCESS               0.015744            0.011360               1.386          [torch.Size([5000, 256])]
SUCCESS               1.364768            0.977600               1.396          [torch.Size([5000, 65536])]
SUCCESS               0.010272            0.007424               1.384          [torch.Size([50, 1, 100])]
SUCCESS               0.015040            0.013632               1.103          [torch.Size([50, 256, 100])]
SUCCESS               1.352832            1.156480               1.170          [torch.Size([50, 65536, 100])]


Operator: torch.Tensor.contiguous  Performance Test (dtype=torch.int32, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               4.764800            4.700896               1.014          [torch.Size([536870912])]
SUCCESS               0.008000            0.006944               1.152          [torch.Size([32, 64])]
SUCCESS               0.061312            0.058752               1.044          [torch.Size([2048, 4096])]
SUCCESS               0.060576            0.058528               1.035          [torch.Size([32, 512, 512])]
SUCCESS               3.158752            3.136512               1.007          [torch.Size([512, 1024, 1024])]
SUCCESS               1.201056            1.184416               1.014          [torch.Size([134217728])]
SUCCESS               0.008000            0.006944               1.152          [torch.Size([5000, 1])]
SUCCESS               0.017120            0.015584               1.099          [torch.Size([5000, 256])]
SUCCESS               1.946848            1.916192               1.016          [torch.Size([5000, 65536])]
SUCCESS               0.009280            0.008256               1.124          [torch.Size([50, 1, 100])]
SUCCESS               0.016672            0.015744               1.059          [torch.Size([50, 256, 100])]
SUCCESS               1.933088            1.972224               0.980          [torch.Size([50, 65536, 100])]
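
The Gems Speedup column appears to be the Torch latency divided by the Gems latency, rounded to three decimals, so a value above 1.0 means the FlagGems kernel was faster. A minimal sketch that recomputes a few rows from the tables above follows; the helper name speedup is illustrative and is not part of the benchmark code.

```python
def speedup(torch_ms: float, gems_ms: float) -> float:
    """Baseline latency divided by FlagGems latency, rounded like the tables."""
    return round(torch_ms / gems_ms, 3)


# (torch latency ms, gems latency ms, reported speedup), copied from the tables
rows = [
    (2.514880, 2.362880, 1.064),  # float16, [536870912]
    (0.009504, 0.006944, 1.369),  # float16, [32, 64]
    (0.016160, 0.016672, 0.969),  # float32, [50, 256, 100]
    (1.933280, 1.972352, 0.980),  # float32, [50, 65536, 100]
]

for torch_ms, gems_ms, reported in rows:
    # Each recomputed ratio should match the value printed in the table.
    assert abs(speedup(torch_ms, gems_ms) - reported) < 1e-9
```

The last two rows are below 1.0, meaning the float32 runs on [50, 256, 100] and [50, 65536, 100] were slightly slower than torch.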

StrongSpoon

Please update these statements so that we can merge this PR.

benchmark/test_special_perf.py

StrongSpoon · 2025-04-02T02:14:56Z

tests/test_special_ops.py

+            low=-10000, high=10000, size=shape, dtype=dtype, device=flag_gems.device
+        )
+    inp = inp[::2]
+    np.testing.assert_equal(inp.is_contiguous(), False)


assert inp.is_contiguous() == False is simpler.

StrongSpoon · 2025-04-02T02:15:54Z

tests/test_special_ops.py

+
+    np.testing.assert_equal(ref_out.is_contiguous(), True)
+    np.testing.assert_equal(res_out.is_contiguous(), True)
+    np.testing.assert_equal(res_out.stride(), ref_out.stride())
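
For reference, what these assertions check can be sketched without a GPU. A tensor is contiguous when its strides match the row-major strides implied by its shape, and slicing with a step (as in inp[::2]) multiplies the leading stride, so the resulting view is no longer contiguous. The helpers below are illustrative pure-Python stand-ins, not FlagGems or PyTorch APIs.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (C-order) strides, in elements, for the given shape."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= max(size, 1)
    return tuple(reversed(strides))


def is_contiguous(shape: Sequence[int], strides: Sequence[int]) -> bool:
    """True if the strides describe a dense row-major layout."""
    if any(size == 0 for size in shape):
        return True  # an empty tensor is treated as contiguous
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a size-1 dimension does not affect the layout.
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def slice_first_dim(shape, strides, step):
    """Shape and strides of a view equivalent to x[::step]."""
    new_len = (shape[0] + step - 1) // step
    return (new_len, *shape[1:]), (strides[0] * step, *strides[1:])


# Hypothetical input: a dense (4, 6) tensor, then the view inp[::2].
shape, strides = (4, 6), contiguous_strides((4, 6))
view_shape, view_strides = slice_first_dim(shape, strides, 2)

assert not is_contiguous(view_shape, view_strides)   # the sliced input
out_strides = contiguous_strides(view_shape)         # layout after a contiguous copy
assert is_contiguous(view_shape, out_strides)
```

In the test above, comparing res_out.stride() with ref_out.stride() corresponds to checking that both outputs end up with the strides returned by contiguous_strides for the view's shape.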


StrongSpoon

lg

0x45f added 4 commits March 25, 2025 14:43

Add contiguous op

d271ccb

Fix

7d16a4b

Fix test

c29d1d6

Fix ut

c7c6f29

StrongSpoon reviewed Apr 2, 2025


0x45f added 2 commits April 2, 2025 10:27

Merge branch 'master' of github.com:FlagOpen/FlagGems into contiguous-op

050fd59

Fix ut

25287a5

StrongSpoon approved these changes Apr 7, 2025


StrongSpoon merged commit 4a18951 into FlagOpen:master Apr 7, 2025
12 of 13 checks passed

0x45f deleted the contiguous-op branch July 7, 2025 09:59
