@justinrosner justinrosner commented Oct 6, 2025

Motivation

Currently, CSE is invoked from within MIGraphXToTosaPass, which causes confusion when reading the output of mlir-print-ir-after-all: it appears that CSE is doing the conversion from MIGraphX -> TOSA.

This implements: https://github.com/ROCm/rocMLIR-internal/issues/2012

Technical Details

Within MIGraphXToTosaPass, we used a nested runPipeline() call to invoke CSE after all of the MIGraphX -> TOSA conversion logic had run. With mlir-print-ir-after-all, the pass instrumentation prints the IR after each pass, and the nested runPipeline() call also triggers that printing mechanism, so the IR is dumped after the CSE pass (which runs on the already-converted TOSA IR). Only once the nested runPipeline() call finished did the pass manager recognize that MIGraphXToTosa was complete and print the final IR.

The proper fix is to move CSE out of MIGraphXToTosaPass and run it as a separate pass in the MIGraphX pipeline.
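The ordering effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyPass`, `runToyPipeline`, `dumpOrderWithNestedCse`, and `dumpOrderWithTopLevelCse` are hypothetical names invented for illustration and are not part of MLIR or rocMLIR. The "instrumentation" here simply records one line each time a pass finishes, which is enough to show why a nested pipeline reports CSE before the enclosing conversion pass.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins, NOT the MLIR API: a "pass" is a name plus an optional
// callback, and the "instrumentation" appends one line to a log whenever a
// pass finishes.
struct ToyPass {
  std::string name;
  // The body receives the log so that a pass can launch a nested pipeline.
  std::function<void(std::vector<std::string> &)> body;
};

// Runs the passes in order and records "IR Dump After <name>" once each pass
// body has returned, mimicking print-after-all style instrumentation.
inline void runToyPipeline(const std::vector<ToyPass> &passes,
                           std::vector<std::string> &log) {
  for (const ToyPass &pass : passes) {
    if (pass.body)
      pass.body(log);
    log.push_back("IR Dump After " + pass.name);
  }
}

// Before the change: the conversion pass runs CSE through a nested pipeline,
// so CSE "finishes" (and is reported) before the conversion pass does.
inline std::vector<std::string> dumpOrderWithNestedCse() {
  std::vector<std::string> log;
  ToyPass cse{"CSE", nullptr};
  ToyPass convert{"MIGraphXToTosaPass", [cse](std::vector<std::string> &l) {
                    // ... the conversion work would happen here ...
                    runToyPipeline({cse}, l); // nested run is instrumented too
                  }};
  runToyPipeline({convert}, log);
  return log;
}

// After the change: CSE is scheduled as its own pass after the conversion.
inline std::vector<std::string> dumpOrderWithTopLevelCse() {
  std::vector<std::string> log;
  ToyPass convert{"MIGraphXToTosaPass", nullptr};
  ToyPass cse{"CSE", nullptr};
  runToyPipeline({convert, cse}, log);
  return log;
}
```

In this model, `dumpOrderWithNestedCse()` reports CSE first and the conversion pass second, while `dumpOrderWithTopLevelCse()` reports them in the order they appear in the pipeline, which is the behavior this PR is after.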

After this change, when running -mlir-print-ir-after-all we now get the following:

// -----// IR Dump After MIGraphXToTosaPass (migraphx-to-tosa) //----- //
func.func @matmul(%arg0: tensor<262144xf32>, %arg1: tensor<131072xf32>) -> tensor<524288xf32> attributes {arch = "gfx942", kernel} {
  %0 = tosa.const_shape  {values = dense<[256, 512]> : tensor<2xindex>} : () -> !tosa.shape<2>
  %1 = tosa.reshape %arg1, %0 : (tensor<131072xf32>, !tosa.shape<2>) -> tensor<256x512xf32>
  %2 = tosa.const_shape  {values = dense<[1024, 256]> : tensor<2xindex>} : () -> !tosa.shape<2>
  %3 = tosa.reshape %arg0, %2 : (tensor<262144xf32>, !tosa.shape<2>) -> tensor<1024x256xf32>
  %4 = tosa.const_shape  {values = dense<[1, 1024, 256]> : tensor<3xindex>} : () -> !tosa.shape<3>
  %5 = tosa.reshape %3, %4 : (tensor<1024x256xf32>, !tosa.shape<3>) -> tensor<1x1024x256xf32>
  %6 = tosa.const_shape  {values = dense<[1, 256, 512]> : tensor<3xindex>} : () -> !tosa.shape<3>
  %7 = tosa.reshape %1, %6 : (tensor<256x512xf32>, !tosa.shape<3>) -> tensor<1x256x512xf32>
  %8 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
  %9 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
  %10 = tosa.matmul %5, %7, %8, %9 {acc_type = f32} : (tensor<1x1024x256xf32>, tensor<1x256x512xf32>, tensor<1xf32>, tensor<1xf32>) -> tensor<1x1024x512xf32>
  %11 = tosa.const_shape  {values = dense<[1024, 512]> : tensor<2xindex>} : () -> !tosa.shape<2>
  %12 = tosa.reshape %10, %11 : (tensor<1x1024x512xf32>, !tosa.shape<2>) -> tensor<1024x512xf32>
  %13 = tosa.const_shape  {values = dense<524288> : tensor<1xindex>} : () -> !tosa.shape<1>
  %14 = tosa.reshape %12, %13 : (tensor<1024x512xf32>, !tosa.shape<1>) -> tensor<524288xf32>
  return %14 : tensor<524288xf32>
}

// -----// IR Dump After CSE (cse) //----- //
func.func @matmul(%arg0: tensor<262144xf32>, %arg1: tensor<131072xf32>) -> tensor<524288xf32> attributes {arch = "gfx942", kernel} {
  %0 = tosa.const_shape  {values = dense<[256, 512]> : tensor<2xindex>} : () -> !tosa.shape<2>
  %1 = tosa.reshape %arg1, %0 : (tensor<131072xf32>, !tosa.shape<2>) -> tensor<256x512xf32>
  %2 = tosa.const_shape  {values = dense<[1024, 256]> : tensor<2xindex>} : () -> !tosa.shape<2>
  %3 = tosa.reshape %arg0, %2 : (tensor<262144xf32>, !tosa.shape<2>) -> tensor<1024x256xf32>
  %4 = tosa.const_shape  {values = dense<[1, 1024, 256]> : tensor<3xindex>} : () -> !tosa.shape<3>
  %5 = tosa.reshape %3, %4 : (tensor<1024x256xf32>, !tosa.shape<3>) -> tensor<1x1024x256xf32>
  %6 = tosa.const_shape  {values = dense<[1, 256, 512]> : tensor<3xindex>} : () -> !tosa.shape<3>
  %7 = tosa.reshape %1, %6 : (tensor<256x512xf32>, !tosa.shape<3>) -> tensor<1x256x512xf32>
  %8 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1xf32>}> : () -> tensor<1xf32>
  %9 = tosa.matmul %5, %7, %8, %8 {acc_type = f32} : (tensor<1x1024x256xf32>, tensor<1x256x512xf32>, tensor<1xf32>, tensor<1xf32>) -> tensor<1x1024x512xf32>
  %10 = tosa.const_shape  {values = dense<[1024, 512]> : tensor<2xindex>} : () -> !tosa.shape<2>
  %11 = tosa.reshape %9, %10 : (tensor<1x1024x512xf32>, !tosa.shape<2>) -> tensor<1024x512xf32>
  %12 = tosa.const_shape  {values = dense<524288> : tensor<1xindex>} : () -> !tosa.shape<1>
  %13 = tosa.reshape %11, %12 : (tensor<1024x512xf32>, !tosa.shape<1>) -> tensor<524288xf32>
  return %13 : tensor<524288xf32>
}

Test Plan

  • LIT Tests
  • Nightly CI

Test Result

  • LIT Tests
  • Nightly CI

Submission Checklist

@pabloantoniom pabloantoniom left a comment

Awesome, thanks!

@umangyadav umangyadav requested a review from Copilot October 10, 2025 13:18
@Copilot Copilot AI left a comment

Pull Request Overview

This PR moves the Common Subexpression Elimination (CSE) pass out of the MIGraphXToTosaPass implementation and into the MIGraphX pipeline to improve debugging clarity with mlir-print-ir-after-all.

  • Removes nested runPipeline() call from MIGraphXToTosaPass that was causing confusing IR output
  • Adds CSE as a separate pass in the MIGraphX pipeline after the TOSA conversion
  • Updates test files to include CSE in their pass sequences

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File | Description
--- | ---
mlir/test/rocmlir-driver/pipelines.mlir | Updates expected pipeline output to include CSE pass
mlir/test/Conversion/MIGraphXToTosa/mixr-to-tosa-ops.mlir | Adds CSE pass to test RUN command
mlir/lib/Dialect/MIGraphX/Pipeline/Pipeline.cpp | Adds CSE pass to the high-level MIGraphX pipeline
mlir/lib/Conversion/MIGraphXToTosa/MIGraphXToTosaPass.cpp | Removes nested CSE execution from MIGraphXToTosaPass

@justinrosner

This is passing the nightly CI run here: https://ml-ci-internal.amd.com/job/MLIR/job/mlir/job/PR-2012/8/pipeline-overview/.

The run is not finishing because there are currently no gfx942 nodes available in the CI.

@justinrosner justinrosner merged commit 743ab25 into develop Oct 10, 2025
15 of 16 checks passed
@justinrosner justinrosner deleted the justinr-after-all branch October 10, 2025 14:02