Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NpuSyncOp generation to AIEDmaToNpu #1114

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Add NpuSyncOp generation to AIEDmaToNpu #1114

wants to merge 18 commits into from

Conversation

AndraBisca
Copy link
Collaborator

@AndraBisca AndraBisca commented Mar 8, 2024

Designs that use objectfifos can rely on the ShimDMAAllocationOps generated by the objectfifo lowering to produce the corresponding NpuSyncOps. Other designs will still require that the sync be added explicitly.

This PR works in relation with the MLIR-AIR channel-to-objectfifo lowering to ensure that the NpuSyncOps are added at the MIR-AIE level, after the mapping decisions done by the objectififo lowering.

Copy link
Contributor

github-actions bot commented Mar 8, 2024

Coverage Report

Created: 2024-06-04 22:05

Click here for information about interpreting this report.

FilenameFunction CoverageLine CoverageRegion CoverageBranch Coverage
home/runner/work/mlir-aie/mlir-aie/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp 100.00% 95.86% 90.65% 80.56%
Totals 100.00% 95.86% 90.65% 80.56%
Generated by llvm-cov -- llvm version 14.0.0

AndraBisca and others added 4 commits March 8, 2024 16:25
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
abisca and others added 4 commits June 4, 2024 15:03
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@AndraBisca AndraBisca marked this pull request as ready for review June 4, 2024 22:04
@fifield fifield changed the title Add IpuSyncOp generation to AIEDmaToIpu Add NpuSyncOp generation to AIEDmaToNpu Jun 5, 2024
Comment on lines +441 to +458
for (auto dma : dmas) {
if (auto infoOp =
getAllocOpForSymbol(shimDmaAllocOps, dma.getMetadata())) {
if (infoOp->getChannelDir() == AIE::DMAChannelDir::S2MM) {
// Found dma op copying results to host
OpBuilder builder(dma);
auto col = builder.getI32IntegerAttr(infoOp->getCol());
auto row = builder.getI32IntegerAttr(0);
auto dir = builder.getI32IntegerAttr(0);
auto chan = builder.getI32IntegerAttr(infoOp->getChannelIndex());
auto col_num = builder.getI32IntegerAttr(1);
auto row_num = builder.getI32IntegerAttr(1);
builder.setInsertionPoint(returnOp);
builder.create<AIEX::NpuSyncOp>(dma->getLoc(), col, row, dir, chan,
col_num, row_num);
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to unconditionally add a sync to every outgoing memcpy. For example, what if we are collecting N output tiles at the shim and only need to sync at the end? The N-1 extraneous syncs will have a performance penalty vs. the single (manually inserted) sync at the end.

@@ -59,7 +59,6 @@ def sequence(A, B, C):
npu_dma_memcpy_nd(
metadata="in", bd_id=1, mem=A, sizes=[1, K, M, 1], strides=[1, 1, K]
)
npu_sync(column=0, row=0, direction=0, channel=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that the tests keep the sync explicit, but using aiex.npu.dma_wait instead of aiex.npu.sync

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that the tests keep the sync explicit, but using aiex.npu.dma_wait instead of aiex.npu.sync

Should we close this PR in favor of #1791 ? Or is there still a desire to insert the sync/wait automatically?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants