
add std_mean op #1971

Open · khushi-411 wants to merge 1 commit into main

Conversation

khushi-411 (Contributor)

As per the title.

khushi-411 marked this pull request as ready for review April 16, 2025 17:55
mruberry requested a review from beverlylytle April 16, 2025 18:17
@mruberry (Collaborator)

@beverlylytle, would you like to review this PR?

@beverlylytle (Collaborator) left a comment


This looks good, thank you! I was looking through PyTorch's issues for std_mean, and I'll leave two thoughts for your consideration:

  1. Did you happen to check inputs containing float('inf')? torch.std_mean returns NaN as the mean of a tensor containing inf (pytorch/pytorch#138570); see the small illustration after this list.
  2. I don't think there's anything to be done here, but I thought it was interesting: torch.std_mean is slower than separate torch.mean and torch.std calls on CPU (pytorch/pytorch#122191).
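
A minimal snippet of the first point, with outputs as described in pytorch/pytorch#138570 (they are taken from the issue report, not verified here):

```python
import torch

# As reported in pytorch/pytorch#138570 (not verified here): the fused op can
# return NaN for the mean of a tensor containing inf, while a standalone
# torch.mean on the same tensor returns inf.
a = torch.tensor([1.0, 2.0, float("inf")])
std, mean = torch.std_mean(a)
print(std, mean)      # reportedly: nan nan
print(torch.mean(a))  # inf
```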

sample_input_generator=std_sample_generator,
error_input_generator=std_error_generator,
torch_reference=torch.std_mean,
dtypes=(datatypes.floating,),
Collaborator

There are checks in the meta for complex types, but they are omitted from testing. I know there are other issues with testing complex types, but were they left out here for a reason?

@kshitij12345 (Collaborator) left a comment


Does this need to be a prim? I think we can just have a decomposition in torch/__init__.py which calls ltorch.mean and ltorch.std. A fusion executor like nvFuser would generate a good kernel. This way, we won't need to add a prim and a grad rule for it.

Wdyt @khushi-411 @beverlylytle?
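
A rough sketch of that idea, assuming the decomposition lives in thunder/torch/__init__.py and mirrors torch.std_mean's signature (the exact ltorch signatures and registration mechanism are assumptions, not code from this PR):

```python
import thunder.torch as ltorch

# Sketch only, not code from this PR. The signature mirrors
# torch.std_mean(input, dim=None, *, correction=1, keepdim=False); the exact
# ltorch.std / ltorch.mean signatures are assumed to follow torch's.
def std_mean(a, dim=None, *, correction=1, keepdim=False):
    # Two independent reductions; a fusion executor such as nvFuser could fuse
    # them into one kernel, so no new prim or grad rule would be needed.
    return (
        ltorch.std(a, dim, correction=correction, keepdim=keepdim),
        ltorch.mean(a, dim, keepdim=keepdim),
    )
```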

@beverlylytle (Collaborator)

beverlylytle commented Apr 23, 2025

@kshitij12345 you make a good point, but var_mean is a primitive with a distinct nvFuser op for a reason, right? In computing the variance, the mean is computed along the way. std is the square root of the variance, so computing std and mean separately would calculate the mean twice. nvFuser doesn't have a separate op for std_mean. What about a decomposition of std_mean into var_mean followed by sqrt?
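
A minimal sketch of that alternative, again assuming ltorch names and signatures that mirror torch (not code from this PR):

```python
import thunder.torch as ltorch

# Sketch only, not code from this PR. Unlike the std + mean version above,
# var_mean computes the mean only once as part of the variance reduction, and
# std is just an elementwise sqrt of the variance.
def std_mean(a, dim=None, *, correction=1, keepdim=False):
    var, mean = ltorch.var_mean(a, dim, correction=correction, keepdim=keepdim)
    return ltorch.sqrt(var), mean
```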

@kshitij12345 (Collaborator)

Good point. That makes sense to me, thanks!

@mruberry (Collaborator)

> @kshitij12345 you make a good point, but var_mean is a primitive with a distinct nvFuser op for a reason, right? In computing the variance, the mean is computed along the way. std is the square root of the variance, so computing std and mean separately would calculate the mean twice. nvFuser doesn't have a separate op for std_mean. What about a decomposition of std_mean into var_mean followed by sqrt?

var_mean is a primitive because there are several ways to compute the variance, and one popular way is Welford's algorithm (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm), which has one- and two-pass variants. We could require that executors use Welford's algorithm by making a primitive for it, and we could even explicitly require the one- or two-pass version if we wanted, but currently we let executors figure out how they should compute the mean and variance.
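
For reference, a plain-Python sketch of Welford's single-pass variant (illustration only; this is not Thunder or PyTorch code):

```python
def welford_var_mean(xs, correction=1):
    # Single-pass (online) Welford update: keep a running mean and the running
    # sum of squared deviations (m2), so the data is read only once.
    count, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    return m2 / (count - correction), mean

# Example: matches the default correction=1 (sample variance).
print(welford_var_mean([1.0, 2.0, 3.0, 4.0]))  # (1.666..., 2.5)
```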

For std_mean, is it OK to take the sqrt after the variance is computed, or is there a similar issue as with the Welford computation? I think it is OK to take it after, and I believe that's what the PyTorch code does:

https://github.com/pytorch/pytorch/blob/b32b002a6ea879e506453b09a4b206632e530abf/aten/src/ATen/native/cuda/ReduceMomentKernel.cu#L14

https://github.com/pytorch/pytorch/blob/b32b002a6ea879e506453b09a4b206632e530abf/aten/src/ATen/native/SharedReduceOps.h#L97

But I could be mistaken.

@beverlylytle (Collaborator)

> We could require that executors use Welford's algorithm by making a primitive for it, and we could even explicitly require the one- or two-pass version if we wanted

I am inclined against being so prescriptive without a hard reason.

While it's possible that executors may want to provide their own implementations of std_mean in the future, there are none (besides the default) doing so now. Thunder can provide an immediate efficiency improvement for std_mean under nvFuser execution with a var_mean + sqrt implementation (which I do believe is OK), rather than making std_mean a primitive. What do you think @khushi-411 ?

@mruberry (Collaborator)

>> We could require that executors use Welford's algorithm by making a primitive for it, and we could even explicitly require the one- or two-pass version if we wanted
>
> I am inclined against being so prescriptive without a hard reason.
>
> While it's possible that executors may want to provide their own implementations of std_mean in the future, there are none (besides the default) doing so now. Thunder can provide an immediate efficiency improvement for std_mean under nvFuser execution with a var_mean + sqrt implementation (which I do believe is OK), rather than making std_mean a primitive. What do you think @khushi-411 ?

Sounds good; executors can also consume the torch operation directly if they have custom std+mean logic.
