Backward register #423
Conversation
Force-pushed from 9f79739 to 01bee17
affine: tl.constexpr,
input_grad_mask: tl.constexpr,
weight_grad_mask: tl.constexpr,
bias_grad_mask: tl.constexpr,
The backward kernel may also need an is_train arg to distinguish between the train and non-train cases.
We can leave it for future work, though.
It's a bit complex; I'll fix it later.
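For context, a hypothetical sketch of how an is_train flag could be threaded through the backward kernel as a compile-time constant. Only the four constexpr flags shown in the diff above come from the PR; the kernel name, the is_train parameter, and the body are invented for illustration:

```python
import triton
import triton.language as tl


@triton.jit
def norm_backward_kernel(
    # pointer and stride arguments elided for brevity
    affine: tl.constexpr,
    input_grad_mask: tl.constexpr,
    weight_grad_mask: tl.constexpr,
    bias_grad_mask: tl.constexpr,
    is_train: tl.constexpr,  # hypothetical flag, not in the PR's diff
):
    # With is_train as a constexpr, the branch below is resolved at
    # compile time: training would read the saved batch statistics,
    # inference would read the running statistics instead.
    if is_train:
        pass  # read save_mean / save_invstd
    else:
        pass  # read running_mean / running_var
```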
running_var=None,
save_mean=None,
save_invstd=None,
train=False,
The kernel should also be able to handle the train=True case.
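A hedged sketch of the wrapper-side dispatch on train that this comment asks for. The argument names save_mean, save_invstd, and train follow the snippet above; the helper name, the running_mean argument, and the eps default are assumptions, not the PR's actual code:

```python
import torch


def pick_backward_stats(
    running_mean=None,
    running_var=None,
    save_mean=None,
    save_invstd=None,
    train=False,
    eps=1e-5,
):
    # Training: reuse the per-batch statistics saved by the forward pass.
    if train:
        return save_mean, save_invstd
    # Inference: fall back to the running statistics kept by the module.
    return running_mean, torch.rsqrt(running_var + eps)
```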
src/flag_gems/ops/dropout.py (Outdated)

def native_dropout(x, p=0.5, train=True):
    return NativeDropout.apply(x, p, train)


def dropout(input, p, train):
The train arg is optional.
done
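A minimal sketch of the requested signature, assuming defaults that mirror torch.nn.functional.dropout and assuming NativeDropout.apply returns an (output, mask) pair like aten::native_dropout; the exact defaults and return handling in the merged code may differ:

```python
def dropout(input, p=0.5, train=True):
    # p and train now carry defaults, so callers may omit them.
    if not train or p == 0.0:
        return input
    out, _mask = NativeDropout.apply(input, p, train)
    return out
```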
logging.debug("GEMS NATIVE DROPOUT FORWARD")
assert p > 0.0 and p < 1.0, "p must be in (0, 1)"
device = input.device
input = input.contiguous()
Add a note that we'll remove contiguous enforcement in the future.
done
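A small sketch of what such a note could look like; the helper name is hypothetical and the exact comment wording in the PR may differ:

```python
import torch


def ensure_contiguous(t: torch.Tensor) -> torch.Tensor:
    # TODO: drop this enforcement once the kernels accept non-contiguous
    # inputs; until then, copy only when a copy is actually needed.
    return t if t.is_contiguous() else t.contiguous()
```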
indices = indices.contiguous()
weight = weight.contiguous()
Track this refactor in the TODOs.
done
src/flag_gems/ops/groupnorm.py (Outdated)

mean = mean.contiguous()
rstd = rstd.contiguous()
weight = None if weight is None else weight.contiguous()
group_size = C // group
Should this use cdiv?
fixed.
src/flag_gems/ops/groupnorm.py (Outdated)

BLOCK_GROUP_SIZE=triton.next_power_of_2(C // num_groups),
BLOCK_HW_SIZE=triton.next_power_of_2(HW),
HxW,
BLOCK_GROUP_SIZE=triton.next_power_of_2(C // group),
Should this be cdiv(C, group)?
ditto
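For reference, a small sketch of the ceiling division the two comments above suggest, using the C, group, and HW names from the snippets; the concrete shapes are invented for illustration:

```python
import triton

C, group, HW = 34, 8, 49  # example shapes where C is not divisible by group

group_size = triton.cdiv(C, group)                     # 5 rather than 34 // 8 == 4
BLOCK_GROUP_SIZE = triton.next_power_of_2(group_size)  # 8
BLOCK_HW_SIZE = triton.next_power_of_2(HW)             # 64
```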
src/flag_gems/ops/dropout.py (Outdated)

def native_dropout(x, p=0.5, train=True):
    return NativeDropout.apply(x, p, train)


def dropout(input, p, train):
I realized we didn't handle the train=False case correctly in the previous version. Let's fix that.
done.
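A hedged sketch of one way the train=False path can be handled, mirroring aten::native_dropout, which skips the random mask in eval mode; the actual implementation launches a Triton kernel, which the last two lines only stand in for:

```python
import torch


def native_dropout(x, p=0.5, train=True):
    # Eval mode (or p == 0): dropout is the identity, so skip the kernel
    # and return an all-True mask alongside the untouched values.
    if not train or p == 0.0:
        return x.clone(), torch.ones_like(x, dtype=torch.bool)
    # Training mode: stand-in for the Triton kernel's semantics.
    mask = torch.rand_like(x) >= p
    return x * mask / (1.0 - p), mask
```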
Force-pushed from cdcef25 to 0eb24e2
Force-pushed from bd86725 to 1cc1ab5
"mean": 1, | ||
"sum": 2, | ||
} | ||
|
I suggest using torch's torch/nn/_reduction.py for this mapping.
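The referenced helper maps reduction strings to the same integers as the hand-written dict; a short sketch of its behavior (note that torch.nn._reduction is a private module, which is the main trade-off of reusing it):

```python
from torch.nn import _reduction as _Reduction

# 'none' -> 0, 'mean' -> 1, 'sum' -> 2
assert _Reduction.get_enum("none") == 0
assert _Reduction.get_enum("mean") == 1
assert _Reduction.get_enum("sum") == 2
```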
with flag_gems.use_gems():
    res_out = torch.sigmoid_(inp * 1.0)
    res_out = torch.sigmoid_(res_inp)
Why was there inp * 1.0?
I don't know either.
LGTM
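A plausible reading is that inp * 1.0 existed only to create a fresh tensor so the in-place sigmoid_ would not mutate the shared test input; cloning makes that intent explicit. The snippet below is an illustrative rewrite, not the PR's actual test:

```python
import torch
import flag_gems

inp = torch.randn(128, 128, device="cuda")
res_inp = inp.clone()  # writable copy, keeps the original input intact

with flag_gems.use_gems():
    res_out = torch.sigmoid_(res_inp)
```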
… directory, which are registered as AutogradCUDA before
…um to convert reduction string to integer
PR Category: Operator
Type of Change: New Feature
Description:
- Register backward functions as aten interfaces (see the registration sketch below).
- Implement the threshold operator incidentally.
Issue
Progress
Performance
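To illustrate the first description item, a hypothetical torch.library registration sketch; the operator names and the Python functions being registered are placeholders, not the PR's exact code:

```python
import torch

aten_lib = torch.library.Library("aten", "IMPL")

# Previously: forward + backward were bundled inside an autograd.Function
# and registered under the AutogradCUDA key. After this PR, the backward
# is exposed as its own aten implementation under the CUDA key, so
# PyTorch's autograd engine wires forward and backward together itself.
aten_lib.impl("native_dropout", native_dropout, "CUDA")
aten_lib.impl("native_dropout_backward", native_dropout_backward, "CUDA")
```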