Revert "[Flow] Convert from tensor.cast to flow.tensor.reshape early …(#18256)" #18331

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

nirvedhmeshram
Contributor

@nirvedhmeshram nirvedhmeshram commented Aug 22, 2024

This reverts commit 1c0c5a6.

This is causing an issue with https://github.com/iree-org/iree/actions/runs/10505447157/job/29119827242#step:5:2269
iree/tests/e2e/regression/check_regression_llvm-cpu_layernorm.mlir

Triaging the bug, I see that one of the dispatches is slightly different. It should produce the same numerics, but it does not.
Here is the problematic dispatch as it was originally:

```mlir
func.func @old_dispatch2(%11 : tensor<128x384xf32>, %12 : tensor<128xf32>) -> tensor<128x384xf32> {
  %cst = arith.constant 5.000000e+00 : f32
  %cst_0 = arith.constant 0.000000e+00 : f32
  %13 = tensor.empty() : tensor<128x384xf32>
  %14 = tensor.empty() : tensor<128xf32>
  %15 = linalg.fill ins(%cst_0 : f32) outs(%14 : tensor<128xf32>) -> tensor<128xf32>
  %16 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%11 : tensor<128x384xf32>) outs(%15 : tensor<128xf32>) {
  ^bb0(%in: f32, %out: f32):
    %18 = arith.addf %in, %out : f32
    linalg.yield %18 : f32
  } -> tensor<128xf32>
  %17 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%11, %16, %12 : tensor<128x384xf32>, tensor<128xf32>, tensor<128xf32>) outs(%13 : tensor<128x384xf32>) {
  ^bb0(%og_in : f32, %in: f32, %in_1: f32, %out: f32):
    %18 = arith.mulf %in, %in_1 : f32
    %19 = arith.subf %og_in, %18 : f32
    linalg.yield %19 : f32
  } -> tensor<128x384xf32>
  return %17 : tensor<128x384xf32>
}
```

With the PR being reverted (#18256), this dispatch becomes:

```mlir
func.func @new_dispatch2(%11 : tensor<128x384xf32>, %12 : tensor<128xf32>) -> tensor<128x384xf32> {
  %cst = arith.constant 5.000000e+00 : f32
  %cst_0 = arith.constant 0.000000e+00 : f32
  %13 = tensor.empty() : tensor<128x384xf32>
  %14 = tensor.empty() : tensor<128xf32>
  %15 = linalg.fill ins(%cst_0 : f32) outs(%14 : tensor<128xf32>) -> tensor<128xf32>
  %16 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%11 : tensor<128x384xf32>) outs(%15 : tensor<128xf32>) {
  ^bb0(%in: f32, %out: f32):
    %18 = arith.addf %in, %out : f32
    linalg.yield %18 : f32
  } -> tensor<128xf32>
  %17 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%16, %12 : tensor<128xf32>, tensor<128xf32>) outs(%13 : tensor<128x384xf32>) {
  ^bb0(%in: f32, %in_1: f32, %out: f32):
    %18 = arith.mulf %in, %in_1 : f32
    %19 = arith.subf %cst, %18 : f32
    linalg.yield %19 : f32
  } -> tensor<128x384xf32>
  return %17 : tensor<128x384xf32>
}
```

The difference is at `%19 = arith.subf %cst, %18 : f32`.
Note that in both cases `%11 : tensor<128x384xf32>` is 5.0 from the model input, so the output should be the same. However, on aarch64 it is not, while on x86 it is. The IR at the MLIR LLVM dialect level is identical between x86 and aarch64, so it is not some easy-to-spot intrinsics / fast-math kind of bug AFAICS.
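A minimal numeric sketch of the equivalence claim, in plain Python (the 2x3 shapes and the per-row weights are hypothetical stand-ins for the 128x384 test): when every element of the input equals the constant 5.0, subtracting the original element (`%og_in`, as in `old_dispatch2`) and subtracting the constant (`%cst`, as in `new_dispatch2`) must give bit-identical results.

```python
# Sketch: the two formulations of the second linalg.generic differ only in
# whether they subtract the loaded input element or the constant 5.0.
# With a uniformly-5.0 input they are mathematically identical.

CST = 5.0
ROWS, COLS = 2, 3  # hypothetical small shapes standing in for 128x384

inp = [[CST] * COLS for _ in range(ROWS)]  # %11: the whole input is 5.0
w = [1.5, -0.25]                           # %12: arbitrary per-row values

# First generic: row-wise reduction over the "reduction" dimension.
row_sum = [sum(row) for row in inp]        # %16

# old_dispatch2: %19 = arith.subf %og_in, (%16 * %12)
old = [[inp[i][j] - row_sum[i] * w[i] for j in range(COLS)]
       for i in range(ROWS)]

# new_dispatch2: %19 = arith.subf %cst, (%16 * %12)
new = [[CST - row_sum[i] * w[i] for j in range(COLS)]
       for i in range(ROWS)]

assert old == new  # identical whenever every input element equals %cst
```

So any observed divergence on aarch64 cannot come from this rewrite alone; the mismatch has to enter somewhere below this level of the IR.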

@nirvedhmeshram nirvedhmeshram enabled auto-merge (squash) August 22, 2024 23:36
@nirvedhmeshram nirvedhmeshram merged commit 8da4564 into main Aug 22, 2024
41 checks passed
@nirvedhmeshram nirvedhmeshram deleted the users/nirvedhmeshram/arm_issue branch August 22, 2024 23:49
@MaheshRavishankar
Contributor

I don't see how the PR that you have would have resulted in the wrong case. There is nothing in that PR that should result in the use of %cst instead of the input, right?

@nirvedhmeshram
Contributor Author

> I don't see how the PR that you have would have resulted in the wrong case. There is nothing in that PR that should result in the use of %cst instead of the input, right?

I think the tensor.cast ops that would otherwise end up in the dispatch regions probably prevent some analysis that can now happen with the flow.tensor.reshape ops.

Also, I am unsure that this test represents a real-world dynamic scenario. The analysis is able to infer the sizes from the flow.dynamic.constant input types and, in the case above, inline the whole input as a constant. I don't think this will happen for a real model.

@MaheshRavishankar
Contributor

> > I don't see how the PR that you have would have resulted in the wrong case. There is nothing in that PR that should result in the use of %cst instead of the input, right?
>
> I think the tensor.cast ops that would otherwise end up in the dispatch regions probably prevent some analysis that can now happen with the flow.tensor.reshape ops.
>
> Also, I am unsure that this test represents a real-world dynamic scenario. The analysis is able to infer the sizes from the flow.dynamic.constant input types and, in the case above, inline the whole input as a constant. I don't think this will happen for a real model.

Ok, I need more context here. This might be an ARM64 bug, in which case we might have to mark the test as xfail, narrow down an LLVM reproducer, and reland this.
