
Add expand_shape to encoding #18135

Conversation

@lialan (Contributor) commented Aug 7, 2024

No description provided.

@lialan requested review from hanhanW and pashu123 August 7, 2024 03:47
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 5f35f41 to 29a03ef August 7, 2024 13:13
@Max191 (Contributor) left a comment

Nice work! Are you able to add a unit test for the pass?

Comment on lines 206 to 258
// Create linalg.transpose.
// LHS/RHS:
//   OuterTileX x InnerTileX x OuterTileY x InnerTileY
//   -> OuterTileY x OuterTileX x InnerTileY x InnerTileX
//   (perm = [2, 0, 3, 1])
//
// ACC:
//   OuterTileX x InnerTileX x OuterTileY x InnerTileY
//   -> OuterTileX x OuterTileY x InnerTileX x InnerTileY
//   (perm = [0, 2, 1, 3])
ArrayRef<int64_t> permutation;
switch (roleIdx) {
case 0: // A
case 1: // B
  permutation = {2, 0, 3, 1};
  break;
case 2: // C
  permutation = {0, 2, 1, 3};
  break;
}
Contributor:

This transpose can be different depending on the specific kernel that is being targeted. Can you move the permutation selection to a separate function similar to getIntrinsicVectorSize?
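
A minimal sketch of what such a helper could look like (the name getTransposePermutation and the SmallVector return type are hypothetical; a real version might also take the target intrinsic as a parameter, the way getIntrinsicVectorSize does):

#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/ErrorHandling.h"

// Hypothetical helper: returns the tile-swizzle transpose permutation for the
// given operand role (0 = A/LHS, 1 = B/RHS, 2 = C/ACC). Returning a
// SmallVector keeps the permutation storage alive in the caller, unlike an
// ArrayRef bound to a temporary initializer list.
static llvm::SmallVector<int64_t> getTransposePermutation(int64_t roleIdx) {
  switch (roleIdx) {
  case 0: // A
  case 1: // B
    return {2, 0, 3, 1};
  case 2: // C
    return {0, 2, 1, 3};
  default:
    llvm_unreachable("unexpected operand role index");
  }
}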

@lialan (author):

Done. So far it does not take extra arguments to specify which kernel, but we can add that later.

@lialan commented Aug 7, 2024

> Nice work! Are you able to add a unit test for the pass?

Not all the components are complete yet, but we can have some lit tests for this particular part. Will add some.

@hanhanW commented Aug 7, 2024

Alan, I think you can test with the MLIR snippet below.

#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
  #hal.descriptor_set.layout<0, bindings = [
    #hal.descriptor_set.binding<0, storage_buffer>,
    #hal.descriptor_set.binding<1, storage_buffer>
  ]>
]>
func.func @set_encoding() {
  %c0 = arith.constant 0 : index
  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<255x513xf32>>
  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>>
  %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [255, 513], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<255x513xf32>> -> tensor<255x513xf32>
  %3 = iree_encoding.set_encoding %2 : tensor<255x513xf32> -> tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>
  flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [255, 513], strides = [1, 1] : tensor<255x513xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>> -> !flow.dispatch.tensor<writeonly:tensor<255x513xf32,  #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>>>
  return
}

@lialan force-pushed the lialan/gpu-data-tiling-encoding branch 3 times, most recently from 644bd0f to d21157b August 8, 2024 03:15
@lialan marked this pull request as ready for review August 8, 2024 17:07
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch 2 times, most recently from a77b3be to cc4fd02 August 12, 2024 20:01
@hanhanW force-pushed the shared/gpu-data-tiling-materialize-encoding branch from 6b85ee7 to 9cc6549 August 12, 2024 20:36
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from cc4fd02 to 5881b13 August 12, 2024 21:12
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 5881b13 to a97c0ed August 12, 2024 22:52
@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from a97c0ed to 4eb9819 August 13, 2024 00:24
@lialan requested review from hanhanW and Max191 August 13, 2024 00:29
@hanhanW (Contributor) left a comment

Thanks, I'll review the details tomorrow. For now, just pointing out one issue in the test.

@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 4eb9819 to c2a08a1 August 13, 2024 01:58
@lialan requested a review from hanhanW August 13, 2024 20:34
Contributor:

I didn't know about the outlining trick when I left my earlier comment. Can you update the lit tests to outline the encodings? It makes the tests much easier to read and update. E.g., 50f18f1
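
For reference, an outlined version of the test input could look roughly like this (a sketch derived from the snippet above; the alias names are arbitrary):

#map = affine_map<(d0, d1, d2) -> (d0, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
#encoding = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<255x513xf32>, user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>
func.func @set_encoding(%arg0: tensor<255x513xf32>) -> tensor<255x513xf32, #encoding> {
  %0 = iree_encoding.set_encoding %arg0 : tensor<255x513xf32> -> tensor<255x513xf32, #encoding>
  return %0 : tensor<255x513xf32, #encoding>
}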

@hanhanW commented Aug 15, 2024

Hey Alan, I updated the shared/gpu-data-tiling-materialize-encoding branch a few days ago, and I'm seeing some unrelated changes in this PR. Can you rebase your PR on top of it? I think the main reason is that you still have my additional commits in your branch. You should be able to rebase on the remote branch and drop the three local commits from me.

(Perhaps I should not force-push to the remote branch next time. The commit hashes changed because I needed to resolve some conflicts.)
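
Concretely, the rebase could look something like this (a sketch; it assumes the shared branch lives on a remote named upstream and the PR branch is pushed to origin):

# Fetch the rewritten shared branch.
git fetch upstream shared/gpu-data-tiling-materialize-encoding
# Interactively rebase the PR branch onto it; in the todo list, delete the
# three stale commits that were rewritten on the shared branch.
git rebase -i upstream/shared/gpu-data-tiling-materialize-encoding
# Update the PR branch.
git push --force-with-lease origin lialan/gpu-data-tiling-encoding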

@hanhanW (Contributor) left a comment

It looks okay, just a few nits.

loc, expandShapeType, packOp->getResult(), *reassociationMap);

// create linalg.transpose on expandShapeShape
size_t origRank = origRank = encodingOp.getSourceType().getRank();
Contributor:

Suggested change:
-  size_t origRank = origRank = encodingOp.getSourceType().getRank();
+  size_t origRank = encodingOp.getSourceType().getRank();

// CHECK: %[[PACK:.*]] = tensor.pack %2 padding_value(%cst : f32) outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 4] into %[[EMPTY]] : tensor<255x513xf32> -> tensor<33x64x16x4xf32>
// CHECK: %[[EXPAND_LHS:.*]] = tensor.expand_shape %[[PACK]]
// CHECK-SAME: output_shape [33, 64, 16, 1, 4, 1] : tensor<33x64x16x4xf32> into tensor<33x64x16x1x4x1xf32>
// CHECK: %[[TRANSPOSE:.*]] = linalg.transpose ins(%[[EXPAND_LHS]]
Contributor:

It's better to check the transpose permutation as well. It gives us an idea of what the layout looks like after tile swizzling in the lit test.
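
For example (illustrative CHECK lines; the 6-D permutation assumes the inner [2, 0, 3, 1] permutation from the switch above, prefixed with the two untouched outer tile dims):

// CHECK:      %[[TRANSPOSE:.*]] = linalg.transpose ins(%[[EXPAND_LHS]] : tensor<33x64x16x1x4x1xf32>)
// CHECK-SAME:   outs(%{{.+}} : tensor<33x64x4x16x1x1xf32>)
// CHECK-SAME:   permutation = [0, 1, 4, 2, 5, 3]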

// CHECK: %[[EXPAND_RHS:.*]] = tensor.expand_shape %[[PACK_RHS]]
// CHECK-SAME: output_shape [33, 64, 16, 1, 4, 1] : tensor<33x64x16x4xf32> into tensor<33x64x16x1x4x1xf32>
// CHECK: %[[EMPTY_RHS2:.*]] = tensor.empty() : tensor<33x64x4x16x1x1xf32>
// CHECK: %[[TRANSPOSE_RHS:.*]] = linalg.transpose ins(%[[EXPAND_RHS]]
Contributor:

nit: check the permutation and shape.
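
E.g., for the RHS (again assuming the inner [2, 0, 3, 1] permutation from the switch above):

// CHECK: %[[TRANSPOSE_RHS:.*]] = linalg.transpose ins(%[[EXPAND_RHS]] : tensor<33x64x16x1x4x1xf32>)
// CHECK-SAME: outs(%[[EMPTY_RHS2]] : tensor<33x64x4x16x1x1xf32>)
// CHECK-SAME: permutation = [0, 1, 4, 2, 5, 3]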

@lialan force-pushed the lialan/gpu-data-tiling-encoding branch from 873bf06 to 5d6cfca August 15, 2024 22:10
@hanhanW merged commit caf1b2a into iree-org:shared/gpu-data-tiling-materialize-encoding Aug 15, 2024
44 of 45 checks passed