Add linspace op #478


Merged

merged 6 commits into from Apr 2, 2025
Conversation

@0x45f (Collaborator) commented Mar 5, 2025

PR Category

Operator

Type of Change

New Feature

Description

Add linspace op
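
For reference, linspace(start, end, steps) returns steps evenly spaced values from start to end inclusive. A minimal usage sketch, assuming the usual flag_gems.enable() entry point routes torch.linspace to the Gems kernel:

```python
import torch
import flag_gems

flag_gems.enable()  # assumed entry point: patches supported ATen ops, incl. linspace

x = torch.linspace(0, 10, steps=5, device="cuda")
# tensor([ 0.0000,  2.5000,  5.0000,  7.5000, 10.0000], device='cuda:0')
```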

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by a UT.

Performance

benchmark/test_tensor_constructor_perf.py 
Operator: linspace  Performance Test (dtype=torch.float16, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.007744            0.007424               1.043          {'start': 0, 'end': 65503, 'steps': 11930, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.007712            0.006176               1.249          {'start': 0, 'end': 4096, 'steps': 3598, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.006752            0.006528               1.034          {'start': 0, 'end': 65503, 'steps': 3165, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.006688            0.006976               0.959          {'start': 0, 'end': 65503, 'steps': 24547, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.006784            0.006656               1.019          {'start': 0, 'end': 65503, 'steps': 7776, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.007392            0.007648               0.967          {'start': 0, 'end': 65503, 'steps': 48606, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.007392            0.006176               1.197          {'start': 0, 'end': 10000, 'steps': 541, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.007168            0.007008               1.023          {'start': 0, 'end': 65503, 'steps': 62128, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.007264            0.007616               0.954          {'start': 0, 'end': 65503, 'steps': 30535, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.006432            0.006176               1.041          {'start': 0, 'end': 10000, 'steps': 7891, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.007360            0.007616               0.966          {'start': 0, 'end': 65503, 'steps': 39451, 'dtype': torch.float16, 'device': 'cuda'}
SUCCESS               0.006912            0.006656               1.038          {'start': 0, 'end': 65503, 'steps': 54982, 'dtype': torch.float16, 'device': 'cuda'}


Operator: linspace  Performance Test (dtype=torch.float32, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS              11.912832            5.959104               1.999          {'start': 0, 'end': 1073741824, 'steps': 1028344234, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.006176            0.006176               1.000          {'start': 0, 'end': 4096, 'steps': 1975, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.018048            0.012000               1.504          {'start': 0, 'end': 16777216, 'steps': 981161, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.038336            0.022112               1.734          {'start': 0, 'end': 16777216, 'steps': 2767021, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               6.853696            3.430368               1.998          {'start': 0, 'end': 1073741824, 'steps': 591429131, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               2.992288            1.499456               1.996          {'start': 0, 'end': 268435456, 'steps': 257972581, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.006752            0.006528               1.034          {'start': 0, 'end': 10000, 'steps': 561, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.015200            0.011104               1.369          {'start': 0, 'end': 2560000, 'steps': 766165, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               2.924992            1.465472               1.996          {'start': 0, 'end': 655360000, 'steps': 252165756, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.006784            0.007392               0.918          {'start': 0, 'end': 10000, 'steps': 6251, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.011712            0.009184               1.275          {'start': 0, 'end': 2560000, 'steps': 440106, 'dtype': torch.float32, 'device': 'cuda'}
SUCCESS               0.064672            0.035264               1.834          {'start': 0, 'end': 655360000, 'steps': 5076269, 'dtype': torch.float32, 'device': 'cuda'}


Operator: linspace  Performance Test (dtype=torch.bfloat16, mode=cuda,level=comprehensive)
Status       Torch Latency (ms)    Gems Latency (ms)         Gems Speedup          Size Detail
-----------------------------------------------------------------------------------------------
SUCCESS               0.653472            0.329152               1.985          {'start': 0, 'end': 1073741824, 'steps': 55846449, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.006720            0.006176               1.088          {'start': 0, 'end': 4096, 'steps': 3087, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.153792            0.079296               1.939          {'start': 0, 'end': 16777216, 'steps': 12677224, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.124928            0.065760               1.900          {'start': 0, 'end': 16777216, 'steps': 10274023, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               5.398432            2.702464               1.998          {'start': 0, 'end': 1073741824, 'steps': 465792052, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               2.240704            1.123712               1.994          {'start': 0, 'end': 268435456, 'steps': 193058198, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.006432            0.006784               0.948          {'start': 0, 'end': 10000, 'steps': 9974, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.019904            0.013536               1.470          {'start': 0, 'end': 2560000, 'steps': 1178752, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               5.961792            2.984256               1.998          {'start': 0, 'end': 655360000, 'steps': 514488235, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.006432            0.006752               0.953          {'start': 0, 'end': 10000, 'steps': 5423, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               0.029824            0.017984               1.658          {'start': 0, 'end': 2560000, 'steps': 2017588, 'dtype': torch.bfloat16, 'device': 'cuda'}
SUCCESS               4.037504            2.022432               1.996          {'start': 0, 'end': 655360000, 'steps': 348277461, 'dtype': torch.bfloat16, 'device': 'cuda'}

@StrongSpoon (Collaborator) left a comment

done

dtype=None,
layout=torch.strided,
device=None,
requires_grad=False,
Collaborator:

The definition of linspace in ATen differs from the torch interface; it doesn't include out or requires_grad. I suggest verifying which parameters it may receive.

Collaborator Author:

I referred to the function definition in /root/miniconda3/envs/gems-ops/lib/python3.10/site-packages/torch/_C/_VariableFunctions.pyi, which differs from the definition in aten/src/ATen/native/native_functions.yaml. Should we keep it consistent with the one in native_functions.yaml?
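
For context, a sketch of a wrapper matching the ATen schema rather than the Python stub (both signatures paraphrased from the PyTorch sources; exact schemas vary across versions, so treat this as an assumption):

```python
import torch

# ATen schema (aten/src/ATen/native/native_functions.yaml), paraphrased:
#   linspace(Scalar start, Scalar end, int steps, *, ScalarType? dtype=None,
#            Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
def linspace(start, end, steps, *, dtype=None, layout=None, device=None, pin_memory=None):
    # Note: no `out` and no `requires_grad` here, unlike the Python stub
    #   torch.linspace(start, end, steps, *, out=None, dtype=None,
    #                  layout=torch.strided, device=None, requires_grad=False)
    ...
```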

Collaborator Author:

Done

    return torch.fill(out, start)
else:
    if isinstance(start, torch.Tensor):
        start = start.item()
Collaborator:

Extracting the item from the tensor and then passing it to the kernel function may cost unnecessary time: .item() on a CUDA tensor forces a device-to-host synchronization.

Collaborator Author:

Do we need to write 4 kernels for the cases where start and end may each be a tensor (one per scalar/tensor combination)?
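
One way to avoid the host sync without four separately written kernels is to let a single Triton kernel load start/end from 0-dim tensors on device. A minimal sketch (hypothetical, not the merged implementation; names are illustrative):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def linspace_kernel(out_ptr, start_arg, end_arg, steps,
                    START_IS_TENSOR: tl.constexpr,
                    END_IS_TENSOR: tl.constexpr,
                    BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < steps
    # Constexpr flags are resolved at compile time, so each scalar/tensor
    # combination specializes into its own compiled variant of this one kernel.
    if START_IS_TENSOR:
        start = tl.load(start_arg)  # read the 0-dim tensor on device; no .item()
    else:
        start = start_arg
    if END_IS_TENSOR:
        end = tl.load(end_arg)
    else:
        end = end_arg
    step = (end - start) / (steps - 1)  # steps == 1 handled by the wrapper's fill path
    tl.store(out_ptr + offs, start + offs * step, mask=mask)


def launch_linspace(out, start, end, steps, BLOCK=1024):
    grid = (triton.cdiv(steps, BLOCK),)
    linspace_kernel[grid](
        out, start, end, steps,
        START_IS_TENSOR=isinstance(start, torch.Tensor),
        END_IS_TENSOR=isinstance(end, torch.Tensor),
        BLOCK=BLOCK,
    )
    return out
```

The four variants are still generated under the hood by compile-time specialization, but from one source kernel, and no device-to-host copy happens on the launch path.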

@@ -533,6 +533,45 @@ def test_arange(start, step, end, dtype, device, pin_memory):
gems_assert_equal(res_out, ref_out)


@pytest.mark.linspace
@pytest.mark.parametrize("start", [0, 2, 4])
@pytest.mark.parametrize("end", [1024, 2048, 4096])
Collaborator:

I suggest also testing cases where (end - start) < steps.

Collaborator Author:

done
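
A sketch of what such a parametrization might look like (illustrative values and an assumed flag_gems.use_gems() context manager following the suite's conventions; not necessarily the merged test):

```python
import pytest
import torch
import flag_gems

@pytest.mark.linspace
@pytest.mark.parametrize("start", [0, 2, 4])
@pytest.mark.parametrize("end", [1024, 2048, 4096])
@pytest.mark.parametrize("steps", [256, 8192])  # 8192 exceeds (end - start) for every end above
def test_linspace(start, end, steps):
    ref = torch.linspace(start, end, steps, device="cuda")
    with flag_gems.use_gems():  # assumed context manager routing to Gems kernels
        res = torch.linspace(start, end, steps, device="cuda")
    torch.testing.assert_close(res, ref)
```

When steps exceeds (end - start), the step size drops below 1, which exercises rounding of the fractional increment.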

@StrongSpoon (Collaborator) left a comment

lg

@0x45f merged commit 659ab85 into FlagOpen:master on Apr 2, 2025
12 of 14 checks passed
@0x45f deleted the linspace-op branch April 2, 2025 05:43