
[mlir] Inconsistent output when executing MLIR program with affine-parallelize and --affine-super-vectorize #119999

@anonymoususer-1

git version: ff939b0

system: Ubuntu 18.04.6 LTS

Description:

I get different results when running the same MLIR program with and without --affine-parallelize and --affine-super-vectorize.
The output is correct when either of these two passes is removed, so I am unsure which one contains the bug.

Steps to Reproduce:

1. MLIR program (tosa.mlir):

module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %0 = "tosa.const"() <{value = dense<[0, 2, 1]> : tensor<3xi32>}> : () -> tensor<3xi32>
    %1 = "tosa.const"() <{value = dense<-12> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %2 = "tosa.const"() <{value = dense<1676> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %3 = "tosa.const"() <{value = dense<-10> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %4 = tosa.abs %2 : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %5 = tosa.clamp %4 {max_fp = 1.600000e+01 : f32, max_int = 16 : i64, min_fp = 0.000000e+00 : f32, min_int = 0 : i64} : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %6 = tosa.arithmetic_right_shift %2, %5 {round = true} : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %7 = tosa.minimum %6, %1 : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %8 = tosa.transpose %3, %0 : (tensor<1x4x21xi32>, tensor<3xi32>) -> tensor<1x21x4xi32>
    %9 = tosa.matmul %7, %8 : (tensor<1x4x21xi32>, tensor<1x21x4xi32>) -> tensor<1x4x4xi32>
    %cast = tensor.cast %9 : tensor<1x4x4xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
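
For reference, the expected value of each output element can be worked out by hand from the constants in the program. The sketch below is a plain-Python model of the scalar arithmetic, not MLIR. It assumes the TOSA rounding rule for arithmetic_right_shift (add 1 when round is set, the shift is positive, and bit shift-1 of the input is set); the function names are illustrative and do not come from any MLIR API.

```python
def tosa_arithmetic_right_shift(value: int, shift: int, round_: bool) -> int:
    """Model of tosa.arithmetic_right_shift on one i32 element (assumed semantics)."""
    result = value >> shift  # Python's >> on ints is an arithmetic shift
    if round_ and shift > 0 and (value >> (shift - 1)) & 1:
        result += 1
    return result


def expected_matmul_entry() -> int:
    """Value of every element of %9, since all inputs are splat constants."""
    k = 21                        # reduction dimension of 1x4x21 times 1x21x4
    c1, c2, c3 = -12, 1676, -10   # splat values of %1, %2, %3
    v4 = abs(c2)                  # %4: tosa.abs, gives 1676
    v5 = min(max(v4, 0), 16)      # %5: tosa.clamp to [0, 16], gives 16
    v6 = tosa_arithmetic_right_shift(c2, v5, True)  # %6: 1676 >> 16, gives 0
    v7 = min(v6, c1)              # %7: tosa.minimum(0, -12), gives -12
    # %8 is a transpose of a splat, so every element is still -10.
    return sum(v7 * c3 for _ in range(k))  # %9: 21 products of 120


print(expected_matmul_entry())  # 2520
```

Under these assumptions the 2520 output matches the hand computation, which suggests the run without the two passes is the correct one.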

2. Command to run without --affine-parallelize and --affine-super-vectorize:

/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir \
-pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
--linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" \
-one-shot-bufferize="bufferize-function-boundaries" -convert-arith-to-llvm -convert-linalg-to-affine-loops \
-convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf -convert-vector-to-llvm \
-convert-math-to-llvm -convert-arith-to-llvm -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm \
-convert-func-to-llvm -reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

3. Output without --affine-parallelize and --affine-super-vectorize:

[[[2520,    2520,    2520,    2520],
  [2520,    2520,    2520,    2520],
  [2520,    2520,    2520,    2520],
  [2520,    2520,    2520,    2520]]]

4. Command to run with --affine-parallelize and --affine-super-vectorize:

/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" \
| /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm \
--test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" -one-shot-bufferize="bufferize-function-boundaries" -convert-arith-to-llvm \
-convert-linalg-to-affine-loops --affine-parallelize -convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf \
--affine-super-vectorize="virtual-vector-size=128 test-fastest-varying=0 vectorize-reductions=true" -convert-vector-to-llvm -convert-math-to-llvm \
-convert-arith-to-llvm -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts \
| timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
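
Since the result changes only when both passes are present, it may help to generate all four flag combinations from one list rather than hand-editing the command. The sketch below only builds and prints the flag lists for the second mlir-opt invocation; it does not run anything. The flags are copied from the step 4 command, while the variant names and the bare `mlir-opt` on PATH are assumptions made for illustration.

```python
PARALLELIZE = "--affine-parallelize"
VECTORIZE = ('--affine-super-vectorize="virtual-vector-size=128 '
             'test-fastest-varying=0 vectorize-reductions=true"')

# Flags of the second mlir-opt invocation from step 4, in their original order.
FULL_PIPELINE = [
    "--linalg-generalize-named-ops", "-tosa-to-arith", "-convert-math-to-llvm",
    '--test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control"',
    '-one-shot-bufferize="bufferize-function-boundaries"',
    "-convert-arith-to-llvm", "-convert-linalg-to-affine-loops",
    PARALLELIZE,
    "-convert-vector-to-scf", "-convert-arith-to-llvm",
    "--affine-loop-coalescing", "-convert-vector-to-scf",
    VECTORIZE,
    "-convert-vector-to-llvm", "-convert-math-to-llvm", "-convert-arith-to-llvm",
    "-lower-affine", "-convert-scf-to-cf", "-finalize-memref-to-llvm",
    "-convert-func-to-llvm", "-reconcile-unrealized-casts",
]


def variant(drop: set) -> list:
    """Return the pipeline with the given flags removed, order preserved."""
    return [flag for flag in FULL_PIPELINE if flag not in drop]


# Hypothetical variant names, one per combination of the two passes.
VARIANTS = {
    "neither": variant({PARALLELIZE, VECTORIZE}),
    "parallelize_only": variant({VECTORIZE}),
    "vectorize_only": variant({PARALLELIZE}),
    "both": variant(set()),
}

for name, flags in VARIANTS.items():
    print(f"# {name}")
    print("mlir-opt " + " ".join(flags))
```

The "neither" variant reproduces the flag list of the step 2 command, and "both" reproduces step 4; the two single-pass variants are the ones reported above as giving the correct output.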

5. Output with --affine-parallelize and --affine-super-vectorize:

[[[120,    120,    120,    120],
  [120,    120,    120,    120],
  [120,    120,    120,    120],
  [120,    120,    120,    120]]]
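
One observation that may help triage: 120 is exactly one product (-12) * (-10), while 2520 is 21 such products, and 21 is the reduction length of the matmul. The sketch below only checks that arithmetic. It is consistent with the reduction collapsing to a single term when both passes run, but that is a hypothesis about the symptom, not a confirmed diagnosis; the helper name is illustrative.

```python
K = 21                 # reduction length of the 1x4x21 by 1x21x4 matmul
term = (-12) * (-10)   # one lhs * rhs product: a %7 element times a %8 element

# Running totals after accumulating 1, 2, ..., K products.
partial_sums = [term * n for n in range(1, K + 1)]


def terms_accumulated(observed: int) -> int:
    """Return how many products sum to `observed`; raises ValueError if none do."""
    return partial_sums.index(observed) + 1


print(terms_accumulated(2520))  # 21
print(terms_accumulated(120))   # 1
```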
