Add expand_shape to encoding #18135
Conversation
Force-pushed 5f35f41 to 29a03ef
Nice work! Are you able to add a unit test for the pass?
```cpp
// create linalg.transpose
// LHS/RHS:
// OuterTileX x InnerTileX x OuterTileY x InnerTileY
// -> OuterTileY x OuterTileX x InnerTileY x InnerTileX
// (perm = [2, 0, 3, 1])
//
// ACC:
// OuterTileX x InnerTileX x OuterTileY x InnerTileY
// -> OuterTileX x OuterTileY x InnerTileX x InnerTileY
// (perm = [0, 2, 1, 3])
ArrayRef<int64_t> permutation;
switch (roleIdx) {
case 0: // A
case 1: // B
  permutation = {2, 0, 3, 1};
  break;
case 2: // C
  permutation = {0, 2, 1, 3};
  break;
}
```
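For readers following along, here is a small standalone sketch (the tile sizes below are hypothetical, not taken from the pass) of how these permutations reorder a rank-4 tile shape under linalg.transpose semantics, where output dimension `i` comes from input dimension `permutation[i]`:

```cpp
#include <array>

// Illustrative sketch, not the pass code itself: applying [2, 0, 3, 1] to
// OuterTileX x InnerTileX x OuterTileY x InnerTileY yields
// OuterTileY x OuterTileX x InnerTileY x InnerTileX.
std::array<long, 4> applyPerm(const std::array<long, 4> &shape,
                              const std::array<int, 4> &perm) {
  std::array<long, 4> result;
  for (int i = 0; i < 4; ++i)
    result[i] = shape[perm[i]]; // output dim i reads input dim perm[i]
  return result;
}

// With hypothetical tile sizes OuterTileX=8, InnerTileX=16, OuterTileY=4,
// InnerTileY=2 (shape {8, 16, 4, 2}):
//   applyPerm({8, 16, 4, 2}, {2, 0, 3, 1}) == {4, 8, 2, 16}  // LHS/RHS
//   applyPerm({8, 16, 4, 2}, {0, 2, 1, 3}) == {8, 4, 16, 2}  // ACC
```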
This transpose can be different depending on the specific kernel being targeted. Can you move the permutation selection to a separate function, similar to getIntrinsicVectorSize?
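One possible shape of such a helper, purely as a sketch: the name `getTransposePermutation` and its signature are hypothetical (modeled loosely on getIntrinsicVectorSize), and a real version would presumably grow an argument selecting the target kernel:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: select the tile-swizzle permutation per operand role,
// factored out of the rewrite so kernel-specific variants can be added later.
// roleIdx follows the convention in the snippet above: 0 = A, 1 = B, 2 = C.
std::array<int64_t, 4> getTransposePermutation(int roleIdx) {
  switch (roleIdx) {
  case 0: // A
  case 1: // B
    return {2, 0, 3, 1};
  case 2: // C
    return {0, 2, 1, 3};
  default:
    assert(false && "unexpected operand role");
    return {};
  }
}
```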
Done. So far it does not take extra arguments to specify which kernel, but we can add that later.
So far not all the components are completed, but we can have some lit tests for this particular part. Will add some.
Alan, I think you can test with the MLIR snippet below.

```mlir
#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
  #hal.descriptor_set.layout<0, bindings = [
    #hal.descriptor_set.binding<0, storage_buffer>,
    #hal.descriptor_set.binding<1, storage_buffer>
  ]>
]>
func.func @set_encoding() {
  %c0 = arith.constant 0 : index
  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<255x513xf32>>
  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>>
  %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [255, 513], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<255x513xf32>> -> tensor<255x513xf32>
  %3 = iree_encoding.set_encoding %2 : tensor<255x513xf32> -> tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>
  flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [255, 513], strides = [1, 1] : tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>> -> !flow.dispatch.tensor<writeonly:tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>>
  return
}
```
Force-pushed 644bd0f to d21157b
Force-pushed a77b3be to cc4fd02
Force-pushed 6b85ee7 to 9cc6549
Force-pushed cc4fd02 to 5881b13
Force-pushed 5881b13 to a97c0ed
Force-pushed a97c0ed to 4eb9819
Thanks, I'll review the details tomorrow. Just pointing out one issue in the test.
Force-pushed 4eb9819 to c2a08a1
I did not know the outlining trick when I left the comment. Can you update the lit tests to outline encodings? It makes the tests much easier to read and update. E.g., 50f18f1
Force-pushed c2a08a1 to a564e49
Force-pushed a564e49 to 7512098
Hey Alan, I updated the (Perhaps I should not force-push to the remote branch next time. The commit hashes changed because I needed to resolve some conflicts.)
Signed-off-by: Alan Li <[email protected]>
Force-pushed 7512098 to 873bf06
It looks okay, just a few nits.
```cpp
    loc, expandShapeType, packOp->getResult(), *reassociationMap);

// create linalg.transpose on expandShapeShape
size_t origRank = origRank = encodingOp.getSourceType().getRank();
```
```diff
-size_t origRank = origRank = encodingOp.getSourceType().getRank();
+size_t origRank = encodingOp.getSourceType().getRank();
```
```mlir
// CHECK: %[[PACK:.*]] = tensor.pack %2 padding_value(%cst : f32) outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 4] into %[[EMPTY]] : tensor<255x513xf32> -> tensor<33x64x16x4xf32>
// CHECK: %[[EXPAND_LHS:.*]] = tensor.expand_shape %[[PACK]]
// CHECK-SAME: output_shape [33, 64, 16, 1, 4, 1] : tensor<33x64x16x4xf32> into tensor<33x64x16x1x4x1xf32>
// CHECK: %[[TRANSPOSE:.*]] = linalg.transpose ins(%[[EXPAND_LHS]]
```
It's better to check the transpose permutation as well. It gives us an idea of how the layout looks after tile swizzling in the lit test.
```mlir
// CHECK: %[[EXPAND_RHS:.*]] = tensor.expand_shape %[[PACK_RHS]]
// CHECK-SAME: output_shape [33, 64, 16, 1, 4, 1] : tensor<33x64x16x4xf32> into tensor<33x64x16x1x4x1xf32>
// CHECK: %[[EMPTY_RHS2:.*]] = tensor.empty() : tensor<33x64x4x16x1x1xf32>
// CHECK: %[[TRANSPOSE_RHS:.*]] = linalg.transpose ins(%[[EXPAND_RHS]]
```
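The RHS expand/transpose pair above can be sanity-checked numerically. Assuming the inner-tile permutation [2, 0, 3, 1] from the pass snippet earlier, extended to rank 6 by keeping the two outer dims in place (a 6-D permutation of [0, 1, 4, 2, 5, 3], inferred here rather than quoted from the pass), the expanded shape [33, 64, 16, 1, 4, 1] maps exactly onto the tensor.empty result shape [33, 64, 4, 16, 1, 1]:

```cpp
#include <vector>

// Sketch: apply a full-rank permutation to a shape, following the
// linalg.transpose convention that output dim i takes input dim perm[i].
std::vector<long> permuteShape(const std::vector<long> &shape,
                               const std::vector<int> &perm) {
  std::vector<long> out;
  for (int p : perm)
    out.push_back(shape[p]);
  return out;
}

// permuteShape({33, 64, 16, 1, 4, 1}, {0, 1, 4, 2, 5, 3})
//   == {33, 64, 4, 16, 1, 1}, matching tensor<33x64x4x16x1x1xf32>.
```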
nit: check the permutation and shape.
Signed-off-by: Alan Li <[email protected]>
Force-pushed 873bf06 to 5d6cfca
Merged caf1b2a into iree-org:shared/gpu-data-tiling-materialize-encoding
No description provided.