
Add expand_shape to encoding #18135

Conversation

@lialan (Contributor) commented Aug 7, 2024

No description provided.

@lialan requested review from hanhanW and pashu123 August 7, 2024 03:47
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 5f35f41 to 29a03ef August 7, 2024 13:13
@Max191 (Contributor) left a comment

Nice work! Are you able to add a unit test for the pass?

Comment on lines 206 to 258
// Create linalg.transpose.
// LHS/RHS:
//   OuterTileX x InnerTileX x OuterTileY x InnerTileY
//   -> OuterTileY x OuterTileX x InnerTileY x InnerTileX
//   (perm = [2, 0, 3, 1])
//
// ACC:
//   OuterTileX x InnerTileX x OuterTileY x InnerTileY
//   -> OuterTileX x OuterTileY x InnerTileX x InnerTileY
//   (perm = [0, 2, 1, 3])
ArrayRef<int64_t> permutation;
switch (roleIdx) {
case 0: // A
case 1: // B
  permutation = {2, 0, 3, 1};
  break;
case 2: // C
  permutation = {0, 2, 1, 3};
  break;
}
Contributor:

This transpose can be different depending on the specific kernel that is being targeted. Can you move the permutation selection to a separate function similar to getIntrinsicVectorSize?
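
A minimal sketch of what such a helper could look like (the name getTransposePermutation and the SmallVector return type are hypothetical; a real version might also take the target intrinsic as a parameter, the way getIntrinsicVectorSize does):

#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/ErrorHandling.h"

// Hypothetical helper: returns the tile-swizzle transpose permutation for the
// given operand role (0 = A/LHS, 1 = B/RHS, 2 = C/ACC). Returning a
// SmallVector keeps the permutation storage alive in the caller, unlike an
// ArrayRef bound to a temporary initializer list.
static llvm::SmallVector<int64_t> getTransposePermutation(int64_t roleIdx) {
  switch (roleIdx) {
  case 0: // A
  case 1: // B
    return {2, 0, 3, 1};
  case 2: // C
    return {0, 2, 1, 3};
  default:
    llvm_unreachable("unexpected operand role index");
  }
}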

@lialan (author):

Done. So far it does not take extra arguments to specify which kernel, but we can add that later.

@lialan commented Aug 7, 2024

> Nice work! Are you able to add a unit test for the pass?

Not all the components are complete yet, but we can have some lit tests for this particular part. Will add some.

@hanhanW commented Aug 7, 2024

Alan, I think you can test with the MLIR snippet below.

#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
  #hal.descriptor_set.layout<0, bindings = [
    #hal.descriptor_set.binding<0, storage_buffer>,
    #hal.descriptor_set.binding<1, storage_buffer>
  ]>
]>
func.func @set_encoding() {
  %c0 = arith.constant 0 : index
  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<255x513xf32>>
  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>>
  %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [255, 513], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<255x513xf32>> -> tensor<255x513xf32>
  %3 = iree_encoding.set_encoding %2 : tensor<255x513xf32> -> tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>
  flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [255, 513], strides = [1, 1] : tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>> -> !flow.dispatch.tensor<writeonly:tensor<255x513xf32,  #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>>
  return
}

@lialan force-pushed the lialan/gpu-data-tiling-encoding branch 3 times, most recently from 644bd0f to d21157b August 8, 2024 03:15
@lialan marked this pull request as ready for review August 8, 2024 17:07
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch 2 times, most recently from a77b3be to cc4fd02 August 12, 2024 20:01
@hanhanW force-pushed the shared/gpu-data-tiling-materialize-encoding branch from 6b85ee7 to 9cc6549 August 12, 2024 20:36
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from cc4fd02 to 5881b13 August 12, 2024 21:12
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 5881b13 to a97c0ed August 12, 2024 22:52
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from a97c0ed to 4eb9819 August 13, 2024 00:24
@lialan requested review from hanhanW and Max191 August 13, 2024 00:29
@hanhanW (Contributor) left a comment

Thanks, I'll review the details tomorrow. For now, just pointing out one issue in the test.

@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 4eb9819 to c2a08a1 August 13, 2024 01:58
@lialan requested a review from hanhanW August 13, 2024 20:34
Contributor:

I didn't know about the outlining trick when I left my earlier comment. Can you update the lit tests to outline the encodings? It makes the tests much easier to read and update. E.g., 50f18f1
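
For reference, an outlined version of the test input could look roughly like this (a sketch derived from the snippet above; the alias names are arbitrary):

#map = affine_map<(d0, d1, d2) -> (d0, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
#encoding = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>
func.func @set_encoding(%arg0: tensor<255x513xf32>) -> tensor<255x513xf32, #encoding> {
  %0 = iree_encoding.set_encoding %arg0 : tensor<255x513xf32> -> tensor<255x513xf32, #encoding>
  return %0 : tensor<255x513xf32, #encoding>
}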

@hanhanW commented Aug 15, 2024

Hey Alan, I updated the shared/gpu-data-tiling-materialize-encoding branch a few days ago, and I'm seeing some unrelated changes in this PR. Can you rebase your PR on top of it? I think the main reason is that you still have my additional commits in your branch. You should be able to rebase on the remote branch and drop the three local commits from me.

(Perhaps I should not force-push to the remote branch next time. The commit hashes changed because I needed to resolve some conflicts.)
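
Concretely, the rebase could look something like this (a sketch; it assumes the shared branch lives on a remote named upstream and the PR branch is pushed to origin):

# Fetch the rewritten shared branch.
git fetch upstream shared/gpu-data-tiling-materialize-encoding
# Interactively rebase the PR branch onto it; in the todo list, delete the
# three stale commits that were rewritten on the shared branch.
git rebase -i upstream/shared/gpu-data-tiling-materialize-encoding
# Update the PR branch.
git push --force-with-lease origin lialan/gpu-data-tiling-encoding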

@hanhanW (Contributor) left a comment

It looks okay, just a few nits.

loc, expandShapeType, packOp->getResult(), *reassociationMap);

// create linalg.transpose on expandShapeShape
size_t origRank = origRank = encodingOp.getSourceType().getRank();
Contributor:

Suggested change:
-  size_t origRank = origRank = encodingOp.getSourceType().getRank();
+  size_t origRank = encodingOp.getSourceType().getRank();

// CHECK: %[[PACK:.*]] = tensor.pack %2 padding_value(%cst : f32) outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 4] into %[[EMPTY]] : tensor<255x513xf32> -> tensor<33x64x16x4xf32>
// CHECK: %[[EXPAND_LHS:.*]] = tensor.expand_shape %[[PACK]]
// CHECK-SAME: output_shape [33, 64, 16, 1, 4, 1] : tensor<33x64x16x4xf32> into tensor<33x64x16x1x4x1xf32>
// CHECK: %[[TRANSPOSE:.*]] = linalg.transpose ins(%[[EXPAND_LHS]]
Contributor:

It's better to check the transpose permutation as well. It gives us an idea of what the layout looks like after tile swizzling in the lit test.
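
For example (illustrative CHECK lines; the 6-D permutation assumes the inner [2, 0, 3, 1] permutation from the switch above, prefixed with the two untouched outer tile dims):

// CHECK:      %[[TRANSPOSE:.*]] = linalg.transpose ins(%[[EXPAND_LHS]] : tensor<33x64x16x1x4x1xf32>)
// CHECK-SAME:   outs(%{{.+}} : tensor<33x64x4x16x1x1xf32>)
// CHECK-SAME:   permutation = [0, 1, 4, 2, 5, 3]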

// CHECK: %[[EXPAND_RHS:.*]] = tensor.expand_shape %[[PACK_RHS]]
// CHECK-SAME: output_shape [33, 64, 16, 1, 4, 1] : tensor<33x64x16x4xf32> into tensor<33x64x16x1x4x1xf32>
// CHECK: %[[EMPTY_RHS2:.*]] = tensor.empty() : tensor<33x64x4x16x1x1xf32>
// CHECK: %[[TRANSPOSE_RHS:.*]] = linalg.transpose ins(%[[EXPAND_RHS]]
Contributor:

nit: check the permutation and shape.
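
E.g., for the RHS (again assuming the inner [2, 0, 3, 1] permutation from the switch above):

// CHECK: %[[TRANSPOSE_RHS:.*]] = linalg.transpose ins(%[[EXPAND_RHS]] : tensor<33x64x16x1x4x1xf32>)
// CHECK-SAME: outs(%[[EMPTY_RHS2]] : tensor<33x64x4x16x1x1xf32>)
// CHECK-SAME: permutation = [0, 1, 4, 2, 5, 3]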

@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 873bf06 to 5d6cfca August 15, 2024 22:10
@hanhanW merged commit caf1b2a into iree-org:shared/gpu-data-tiling-materialize-encoding Aug 15, 2024
44 of 45 checks passed