-
Couldn't load subscription status.
- Fork 29
[AIE2P] Allow more address chaining and add a profitability check heuristic #451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aie-public
Are you sure you want to change the base?
Conversation
…ristic This allows chaining when the base pointer is used in other basic blocks but only when it is considered profitable: - This shouldn't happen in a loop as the resulting copy will be more costly - The cost of chaining is incremented for each offset falling outside the load/store immediate ranges. - An experimental threshold is used to determine if chaining is profitable based on the compute cost (e.g half the number of pointer adds to be chained)
50a42df to
5230bcf
Compare
| return true; | ||
| } else { | ||
| if (MemType.getSizeInBits() <= 32) { | ||
| ImmediateRangeMax = TII->getLoadStorePostIncImmediateRange(MemType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: one query to TII is enough.
| // assume chaining is always profitable. | ||
| if (MemType.isVector()) { | ||
| return true; | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove else, it will reduce the indentation.
|
|
||
| // If the immediate range is not set, the pointers aren't used by any | ||
| // loads and stores, so we return. | ||
| if (!ImmediateRangeSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this dead code? This covered by:
llvm_unreachable(
"unreachable: Unsupported immediate range of scalar size ");
Should we reuse that else to just return false?
| continue; | ||
| } | ||
|
|
||
| const int64_t CurrOffset = OffsetMI->Value.getSExtValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could include some description of the idea (also as a future reference).
| // assume chaining is always profitable. | ||
| if (MemType.isVector()) { | ||
| return true; | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can assert MemType.getSizeInBits() <= 32 and remove the next if as well.
| int64_t ImmediateRangeMax; | ||
| int64_t ImmediateRangeMin; | ||
| }; | ||
| virtual ImmediateRangeBounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply use std::pair?
| "aie-chain-addr-vec-ldst", cl::Hidden, cl::init(true), | ||
| cl::desc("Enable ptradd chaining for vector loads and stores.")); | ||
|
|
||
| cl::opt<int> AddressChainCostLimit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we make this static?
| // If the base reg is used in any of the successive MBBs, would introduce a | ||
| // COPY and increase reg pressure. We only skip chaining in this case if it | ||
| // is considered unprofitable. | ||
| if (isRegUsedInSuccessiveMBBs(&MBB, PtrReg) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move isRegUsedInSuccessiveMBBs inside profitability check?
| ChainedCostLimit = AddressChainCostLimit; | ||
| } | ||
|
|
||
| if (isRegUsedInSuccessiveMBBs(&MBB, PtrReg)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking isRegUsedInSuccessiveMBBs already right?
| assert(Instrs.size() > 1); | ||
|
|
||
| bool InLoop = true; | ||
| MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use AIELoopUtils::isSingleMBBLoop, then no need to include MachineLoopInfo
| return false; | ||
| } | ||
|
|
||
| auto OffsetMI = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: OffsetValue may be.
| # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| # | ||
| # (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates | ||
| # RUN: llc -mtriple aie2p -start-before=aie-cluster-base-address -stop-after=postmisched --issue-limit=6 --aie-chain-addr-scl-ldst=true %s -verify-machineinstrs -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arent --issue-limit=6 --aie-chain-addr-scl-ldst=true the default options?
| Offsets.push_back( | ||
| (I == 0 || (std::holds_alternative<std::string>(Offsets.back()) && | ||
| std::get<std::string>(Offsets.back()) == "Break")) | ||
| ? CurrOffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like CurrOffset is an integer value.
|
|
||
| NewOffset = NextOffset - AccumulatedOffset; | ||
|
|
||
| if (NewOffset < ImmediateRangeMin || NewOffset > ImmediateRangeMax) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check: are we supposed to check AccumulatedOffset with immediate range right?
| return {14, -16}; | ||
| else if (MemType.getSizeInBits() <= 32) | ||
| return {28, -32}; | ||
| else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have memory type for s128 as well right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we gracefully exit, rather than asserting?
| unsigned AIE2InstrInfo::getMaxSupportedLdStIncSize() const { return 512; } | ||
|
|
||
| AIEBaseInstrInfo::ImmediateRangeBounds | ||
| AIE2InstrInfo::getLoadStorePostIncImmediateRange(LLT MemType) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open up a PR, where i have the Immediate Range checks for all the MemorySizes.
You are missing the vector Sizes checks and the checks for different Loads, such as G_ZEXT_LOAD, which are pitiful small.
[AutoBump] Merge with fixes of 0a17bdf (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes)(Needs Torch Bump)
This allows chaining when the base pointer is used in other basic blocks but only when it is considered profitable:
Below QoR results for affected kernels, the rest are unchanged:
Stack Size is unchanged in all kernels