[Reprogram] Modify controlcode-lowering and lower-to-aie for DMA reprogramming #1330

Open

Abhishek-Varma wants to merge 1 commit into main from avarma_cc_lowering_reprogram

Contributor

Abhishek-Varma commented Jul 15, 2025

-- This commit includes modifications to controlcode-lowering and
lower-to-aie passes for DMA reprogramming.

-- This is being added to the AMDAIE dialect to make DMA reprogramming work.

Signed-off-by: Abhishek Varma [email protected]


[Reprogram] Modify controlcode-lowering and lower-to-aie for DMA reprogramming

da50891

-- This commit includes modifications to `controlcode-lowering` and
   `lower-to-aie` passes for DMA reprogramming.

-- This is being added to the AMDAIE dialect to make
   [DMA reprogramming](#1287) work.

Signed-off-by: Abhishek Varma <[email protected]>

Abhishek-Varma requested review from MaheshRavishankar, yzhang93, jtuyls, newling and Yu-Zhewen as code owners

July 15, 2025 11:19

Abhishek-Varma commented

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/Utils/AMDAIEDmaUtils.h

+              /// Utility to fold the provided repetition count, unit dims, linear dims and
+              /// to convert the sizes and strides into static versions and return them.
+              LogicalResult foldDimsAndReturnAsStatic(

Contributor Author

Abhishek-Varma Jul 15, 2025

A few fairly trivial utilities to be pulled out to a common place.
There are a few other utilities which are a bit involved and can be addressed for de-duplication later.
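For context on what "folding unit dims and linear dims" means for a static access pattern, here is a minimal sketch of those two folds on plain `std::vector`s. The `AccessPattern` struct and `foldUnitAndLinearDims` function are hypothetical stand-ins, not the actual `foldDimsAndReturnAsStatic` implementation (which operates on MLIR values/attributes), and the repetition-count fold is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-data stand-in for a static access pattern; the real
// utility works on MLIR values/attributes instead.
struct AccessPattern {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
};

// Sketch: drop unit dimensions and merge adjacent "linear" dimensions.
// Assumes sizes and strides have the same length and all sizes are >= 1.
// The repetition-count fold mentioned in the doc comment is not modeled.
AccessPattern foldUnitAndLinearDims(const AccessPattern &in) {
  AccessPattern out;
  for (std::size_t i = 0; i < in.sizes.size(); ++i) {
    const int64_t size = in.sizes[i];
    const int64_t stride = in.strides[i];
    // A unit dimension only ever takes index 0, so it never changes the address.
    if (size == 1) continue;
    // If the previous kept (outer) dimension advances by exactly one full sweep
    // of this dimension, the two together form a single linear dimension.
    if (!out.sizes.empty() && out.strides.back() == size * stride) {
      out.sizes.back() *= size;
      out.strides.back() = stride;
      continue;
    }
    out.sizes.push_back(size);
    out.strides.push_back(stride);
  }
  // Keep at least one dimension so the pattern still describes one element.
  if (out.sizes.empty()) {
    out.sizes.push_back(1);
    out.strides.push_back(1);
  }
  return out;
}
```

With sizes {1, 4, 8} and strides {64, 8, 1}, the unit dimension is dropped and the remaining two merge into a single dimension of size 32 and stride 1, i.e. one contiguous 32-element transfer. Sizes {2, 3} with strides {10, 1} are left unchanged because 10 != 3 * 1.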

jtuyls requested changes

compiler/plugins/target/AMD-AIE/iree-amd-aie/IR/AMDAIEOps.cpp

Comment on lines +497 to +503

+                if (npuDmaUsers.size() > 1) {
                   return emitOpError() << "only a single `amdaie.npu.circular_dma_cpy_nd` "
                                           "user supported currently, but got: "
                                        << npuDmaUsers.size();
                 }
-                return npuDmaUsers[0];
+                if (npuDmaUsers.size() == 1) return npuDmaUsers[0];
+                return failure();

Collaborator

jtuyls Jul 16, 2025

This change doesn't look needed?

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

+              //               nuances.
+              using BDDimLayoutAndLength = std::pair<AMDAIE::BDDimLayoutArrayAttr, int64_t>;
+              BDDimLayoutAndLength convertSizeStrideToBDDimLayoutArrayAttr(

Collaborator

jtuyls Jul 16, 2025

Isn't this function exactly the same as the one in LowerToAIE? If so, it can be extracted.
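The `BDDimLayoutAndLength` alias pairs a per-dimension BD layout with a transfer length, which is the shape a shared helper would return to both passes. A minimal sketch of that shape, assuming the layout is simply the list of (size, stride) pairs and the length is the product of the sizes; `BdDimLayout` and `convertSizeStrideToBdDims` are hypothetical stand-ins for the MLIR attribute types, and the real function may additionally fold dimensions or special-case contiguous patterns.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry of an `AMDAIE::BDDimLayoutArrayAttr`.
struct BdDimLayout {
  int64_t size;
  int64_t stride;
};

// Mirrors the `std::pair<layout, length>` return shape of the original helper.
using BdDimLayoutAndLength = std::pair<std::vector<BdDimLayout>, int64_t>;

BdDimLayoutAndLength convertSizeStrideToBdDims(const std::vector<int64_t> &sizes,
                                               const std::vector<int64_t> &strides) {
  if (sizes.size() != strides.size())
    throw std::invalid_argument("sizes and strides must have the same rank");
  std::vector<BdDimLayout> dims;
  dims.reserve(sizes.size());
  for (std::size_t i = 0; i < sizes.size(); ++i) dims.push_back({sizes[i], strides[i]});
  // Assumption: the transfer length is the number of elements moved, i.e. the
  // product of all sizes. Seeding with int64_t{1} keeps the product in 64 bits.
  int64_t length = std::accumulate(sizes.begin(), sizes.end(), int64_t{1},
                                   std::multiplies<>());
  return {dims, length};
}
```

For sizes {4, 8} and strides {16, 1}, this yields two dims (4 x 16, 8 x 1) and a length of 32.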

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

+                  std::optional<uint8_t> pktId) {
+                OpBuilder::InsertionGuard g(rewriter);
+                // Create DMA channel.

Collaborator

jtuyls Jul 16, 2025

I know this is copied, but maybe update this comment to "Create DMA start on channel".

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

+              // AIEDeviceBuilder utilities
+              //===----------------------------------------------------------------------===//
+              // TODO(avarma): Copied from LowerToAIE. Templatize it later because of a few
+              //               nuances.

Collaborator

jtuyls Jul 16, 2025

Could you specify the nuances?

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

Comment on lines +452 to +506

+                // Find the last index with a zero stride. All dimensions before and including
+                // this one will be converted into separate DMA ops, while the dimensions
+                // after this will be included in the access pattern within a DMA op. This is
+                // needed because low-level DMA BD configurations currently don't support
+                // zero stride and/or because more dimensions are needed than available.
+                int64_t lastZeroStrideIndex{-1};
+                for (size_t i = 0; i < strides.size(); i++)
+                  if (strides[i] == 0) lastZeroStrideIndex = i;
+                // Convert all dimensions after the last index with zero stride to a
+                // `BDDimLayoutArrayAttr` as these are the inner/intra DMA dimensions.
+                auto [dims, transferLength] = convertSizeStrideToBDDimLayoutArrayAttr(
+                    rewriter, ArrayRef<int64_t>(sizes).drop_front(lastZeroStrideIndex + 1),
+                    ArrayRef<int64_t>(strides).drop_front(lastZeroStrideIndex + 1));
+                SmallVector<size_t> indexRange(lastZeroStrideIndex + 1);
+                std::iota(indexRange.begin(), indexRange.end(), 0);
+                // Compute the total number of iterations of all dimensions up till
+                // `lastZeroStrideIndex`.
+                int64_t numIters =
+                    std::accumulate(sizes.begin(), sizes.begin() + indexRange.size(),
+                                    int64_t{1}, std::multiplies<>());
+                // Compute the divisors to be used to get the indices for every dimension from
+                // the total number of iterations (as if all dimensions are coalesced).
+                SmallVector<int64_t> cartesianDivisors(indexRange.size(), 1);
+                for (int64_t i = indexRange.size() - 2; i >= 0; i--)
+                  cartesianDivisors[i] = cartesianDivisors[i + 1] * sizes[i + 1];
+                // Create blocks with DMA ops.
+                Block *succ = nullptr, *curr = bdBlock;
+                for (size_t blockIndex = 0; blockIndex < bufferOps.size(); ++blockIndex) {
+                  // Iterate through the cartesian product of all dimensions up to the last
+                  // dimension with zero strides to create a DMA chain of `dma_bd` ops.
+                  for (int64_t index = 0; index < numIters; index++) {
+                    SmallVector<int64_t> indices = llvm::map_to_vector(
+                        indexRange,
+                        [&](size_t i) { return (index / cartesianDivisors[i]) % sizes[i]; });
+                    bool isFirst = llvm::all_of(indices, [](int64_t v) { return v == 0; });
+                    bool isLast = llvm::all_of(
+                        indexRange, [&](size_t i) { return indices[i] == (sizes[i] - 1); });
+                    if (blockIndex == bufferOps.size() - 1 && isLast) {
+                      succ = &endBlock;
+                    } else {
+                      succ = rewriter.createBlock(&endBlock);
+                    }
+                    rewriter.setInsertionPointToStart(curr);
+                    int64_t addOffset = 0;
+                    for (size_t i = 0; i < indexRange.size(); i++)
+                      addOffset += (indices[i] * strides[i]);
+                    createDMAOps(succ, bufferOps[blockIndex], dims, isFirst, isLast,
+                                 transferLength, offset + addOffset);
+                    curr = succ;
+                  }
+                }
+                return success();

Collaborator

jtuyls Jul 16, 2025

Unless there is some difference here, I think this could be extracted.
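The loop above splits the access pattern at the last zero-stride dimension: the outer dimensions are unrolled into a chain of BDs, and the inner ones stay in each BD's access pattern. A standalone sketch of just that index arithmetic for a single buffer (the real code additionally iterates over `bufferOps` and creates blocks); `BdSketch` and `unrollOuterDims` are hypothetical names standing in for the generated `dma_bd` ops.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical record for one BD in the chain: its start offset and whether
// it is the first/last BD of the chain.
struct BdSketch {
  int64_t offset;
  bool isFirst;
  bool isLast;
};

// Dimensions up to and including the last zero-stride one are unrolled into
// separate BDs; the remaining inner dimensions stay inside each BD.
// Assumes all sizes are >= 1 and sizes/strides have the same length.
std::vector<BdSketch> unrollOuterDims(const std::vector<int64_t> &sizes,
                                      const std::vector<int64_t> &strides,
                                      int64_t baseOffset) {
  int64_t lastZeroStrideIndex = -1;
  for (std::size_t i = 0; i < strides.size(); ++i)
    if (strides[i] == 0) lastZeroStrideIndex = static_cast<int64_t>(i);
  const std::size_t numOuter = static_cast<std::size_t>(lastZeroStrideIndex + 1);
  // Total number of BDs: product of the outer sizes (computed in 64 bits).
  const int64_t numIters = std::accumulate(
      sizes.begin(), sizes.begin() + static_cast<std::ptrdiff_t>(numOuter),
      int64_t{1}, std::multiplies<>());
  // Row-major divisors to recover each outer index from the flat iteration.
  std::vector<int64_t> divisors(numOuter, 1);
  for (int64_t i = static_cast<int64_t>(numOuter) - 2; i >= 0; --i)
    divisors[i] = divisors[i + 1] * sizes[i + 1];
  std::vector<BdSketch> chain;
  for (int64_t index = 0; index < numIters; ++index) {
    int64_t addOffset = 0;
    bool isFirst = true, isLast = true;
    for (std::size_t i = 0; i < numOuter; ++i) {
      const int64_t idx = (index / divisors[i]) % sizes[i];
      addOffset += idx * strides[i];
      isFirst = isFirst && idx == 0;
      isLast = isLast && idx == sizes[i] - 1;
    }
    chain.push_back({baseOffset + addOffset, isFirst, isLast});
  }
  return chain;
}
```

For sizes {2, 3, 4}, strides {100, 0, 1} and a base offset of 10, the last zero stride is at index 1, so the two outer dimensions are unrolled into 6 BDs with offsets 10, 10, 10, 110, 110, 110; only the first is marked first, only the last is marked last, and each BD transfers the inner 4-element pattern. With no zero stride, nothing is unrolled and a single BD covers the whole pattern.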

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

Comment on lines +748 to +755

+                      if (!bdIdOp) {
+                        if (failed(halfDmaToDmaStartBlocks(rewriter, deviceModel, halfDmaOp,
+                                                           controlCodeOp, connectionIndex,
+                                                           tileToMemOpMap))) {
+                          return WalkResult::interrupt();
+                        }
+                        eraseCandidate(halfDmaOp);
+                      }

Collaborator

jtuyls Jul 16, 2025

Avoid the nesting:

Suggested change

-                    if (!bdIdOp) {
-                      if (failed(halfDmaToDmaStartBlocks(rewriter, deviceModel, halfDmaOp,
-                                                         controlCodeOp, connectionIndex,
-                                                         tileToMemOpMap))) {
-                        return WalkResult::interrupt();
-                      }
-                      eraseCandidate(halfDmaOp);
-                    }
+                    if (bdIdOp) return WalkResult::advance();
+                    if (failed(halfDmaToDmaStartBlocks(rewriter, deviceModel, halfDmaOp,
+                                                       controlCodeOp, connectionIndex,
+                                                       tileToMemOpMap))) {
+                      return WalkResult::interrupt();
+                    }
+                    eraseCandidate(halfDmaOp);

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

+                    });
+                if (res.wasInterrupted()) return failure();
+                for (Operation *op : toBeErased) rewriter.eraseOp(op);

Collaborator

jtuyls Jul 16, 2025

You're going through toBeErased again?

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

+                  return success();
+                };
+                auto processTarget =

Collaborator

jtuyls Jul 16, 2025

Could you move processSource and processTarget into standalone functions?

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIEControlCodeLowering.cpp

+                  return connectionOp.emitOpError()
+                         << "expected target to be a logical objFifo-like op";
+                }
+                // TODO(avarma): Need to set correct insertion point.

Collaborator

jtuyls Jul 16, 2025

Still an issue?

compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIELowerToAIE.h

@@ -127,6 +122,7 @@ class AIEDeviceBuilder {
                     connectionToSourceTargetMemOps;
                 /// Map from connection ops to the flow ops they have been converted into.
                 DenseMap<AMDAIE::ConnectionOp, SmallVector<Operation *>> connectionToFlowOps;
+                bool reprogramDmas;

Collaborator

jtuyls Jul 16, 2025

Could you add some doc?

Reviewers

jtuyls: requested changes

MaheshRavishankar: awaiting requested review (code owner)

yzhang93: awaiting requested review (code owner)

newling: awaiting requested review (code owner)

Yu-Zhewen: awaiting requested review (code owner)

Requested changes must be addressed to merge this pull request.

Labels: None yet