[AIEX] Propagate MMO for loads without PTR #198
Conversation
if (MO.isReg() && MO.getReg().isVirtual()) {
  const Register Reg = MO.getReg();
  if (const MachineInstr *RegDef = MRI.getVRegDef(Reg))
    findGlobalValues(*RegDef, GVSet, VisitedInstrs, 10, MRI);
Perhaps put the 10 in a command line option right away?
Recursion depth is an explosive parameter. The operands you are traversing may branch quite heavily.
With a command-line option we can control this now!
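For reference, a minimal sketch of how such an option could be declared; the flag string and default here are illustrative (later hunks show the patch ended up with an option named LoadMMOSearchLimit):

#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Hypothetical flag string and default value.
static cl::opt<unsigned> LoadMMOSearchLimit(
    "aie-load-mmo-search-limit", cl::Hidden, cl::init(10),
    cl::desc("Max instructions visited when collecting GlobalValues "
             "to attach as load MMOs"));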
  }
}
// Add gattered GlobalValues as MMOs.
nit: gathered
}

// Add gattered GlobalValues as MMOs.
for (auto GV : GVSet) {
I think we should know whether we completed the search, otherwise we may be adding an incomplete set, which can cause over-optimistic alias analysis
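A sketch of that completeness gate (the later hunks indeed thread a Success flag through findGlobalValues; the commit helper below is hypothetical):

bool Complete = true;
for (const MachineOperand &MO : MI.uses())
  if (MO.isReg() && MO.getReg().isVirtual())
    if (const MachineInstr *RegDef = MRI.getVRegDef(MO.getReg()))
      Complete &= findGlobalValues(*RegDef, GVSet, VisitedInstrs, MRI, Limit);
// Only commit the MMOs when the whole use-def tree was covered; an
// incomplete set would make alias analysis overly optimistic.
if (Complete)
  addGlobalValueMMOs(MI, GVSet); // hypothetical commit helper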
@@ -472,6 +539,12 @@ bool AIEPostSelectOptimize::runOnMachineFunction(MachineFunction &MF) {
    }
  }

  // 4. Fix MMO for load instructions that don't use pointers
  // registers (use vector instead, for example).
nit: I think it adds MMOs for loads that don't already have them, irrespective of whether they are pointers or not.
Changed the comment accordingly.
for (auto MO : MI.uses()) {
  if (MO.isGlobal()) {
    GVSet.insert(MO.getGlobal());
  } else if (MO.isReg() && MO.getReg().isVirtual()) {
I think you should recognize the regular address modifiers here and stop recursion. You don't want to descend into integer operands of e.g. GEP.
Hi @martien-de-jong, as we are in MIR now, all GEPs have been translated to ptradd, mul, etc.
GEP, PTRADD, not much different. PTR_ADD only leads to pointers in its first operand, not in the second, so searching the second is wasted.
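In code, the suggestion might look like the sketch below, assuming generic opcodes and the later signature of findGlobalValues (the reply that follows shows a counterexample where this misses a pointer):

// For G_PTR_ADD, operand 1 is the base pointer and operand 2 the
// integer offset; the suggestion is to recurse only into the base.
if (RegDef->getOpcode() == TargetOpcode::G_PTR_ADD) {
  if (const MachineInstr *BaseDef =
          MRI.getVRegDef(RegDef->getOperand(1).getReg()))
    findGlobalValues(*BaseDef, GVSet, VisitedInstrs, MRI, Limit);
}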
But what if the first operand is null (or something like G_INTTOPTR(G_CONSTANT)) and the second is an integer derived from a G_PTRTOINT (or something like G_PTRTOINT(GlobalValue))? Pointer arithmetic can be crazy... like:
body: |
  bb.1.entry:
    %1:_(p0) = G_CONSTANT i20 4
    %4:_(p0) = G_GLOBAL_VALUE @Global
    %3:_(s20) = G_PTRTOINT %4(p0)
    %7:_(s20) = G_CONSTANT i20 2
    %5:_(s20) = G_SHL %3, %7(s20)
    %6:_(p0) = G_PTR_ADD %1, %5(s20)
    $p0 = COPY %6(p0)
    PseudoRET implicit $lr, implicit $p0
...
In this example we can reach the pointer through the second operand.
I wouldn't mind if those programmers were punished...
But seriously, if we descend through any operator, the recursion depth does not nicely model processing effort. I would rather have something linear, like the number of visited nodes. And you already have that available as the size of the visited set.
As discussed offline, the recursion limit was replaced by the number of visited instructions.
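A sketch of that bound, reusing the patch's VisitedInstrs set as the linear budget (the exact early-out shape is illustrative):

// Inside findGlobalValues: bound the walk by visited-instruction count
// instead of recursion depth, so effort stays linear even when the
// use-def tree branches heavily.
if (!VisitedInstrs.insert(&MI).second)
  return true; // Already visited; nothing new to collect.
if (VisitedInstrs.size() > LoadMMOSearchLimit)
  return false; // Budget exhausted: report the search as incomplete.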
I don't think it's important, but I guess that LLVM will also consider the constant table addresses as leaked (!nocapture-like).
Hmm, I am trying to understand how this can affect us...
QoR results:
Just for the future: we've seen cases where the Globals themselves don't have an address space, but they are cast to a pointer of the right AS. I guess the first AS we encounter coming down from the load is the one intended by the programmer. I also think we can trust standard AA to follow that pointer cast down to the Global, so we could also use it as the address for the MMO.
static_cast<const AIEBaseInstrInfo *>(ST.getInstrInfo());

// Skip subvector copy.
if (MI->isCopy()) {
Nit: I guess there could be multiple copies. Isn't there a helper function somewhere to look through copies?
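GlobalISel does ship a helper for plain copy chains, getDefIgnoringCopies from llvm/CodeGen/GlobalISel/Utils.h, though it would not cover the subvector/bitcast cases this pass handles; a usage sketch:

#include "llvm/CodeGen/GlobalISel/Utils.h"

// Walks through a chain of full COPYs to the real defining instruction
// (it does not look through subregister copies or bitcasts).
MachineInstr *Def = getDefIgnoringCopies(MI->getOperand(1).getReg(), MRI);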
if (LoadMMODeepSearch)
  // Look to everything!
  Success &= findGlobalValues(RegDef, GVSet, VisitedInstrs, MRI,
                              LoadMMOSearchLimit);
I think we should avoid committing the findGlobalValues as it is overly optimistic. Maybe keep it on a separate branch for now?
I can't say whether this is optimistic or not, but I can say that it is harder to break if we consider the pattern recognition. It tracks everything (in terms of globals) on the way to the VLDB.4X load, and it only rejects when it reaches function parameters (because they can be pointers as well). From an AA perspective, missing one MMO is a problem. On the other hand, including non-used ones is not a problem, because it only leads to more dependencies (just a performance impact).
My main "issue" with it is that it considers loads safe, while I think they are only safe when we have already found a "vector" of pointers. In that case, it does not matter what we add to the vector, it could be a constant vector, or some loaded quantity.
The new version addresses this situation: we need to find a select(bcast(C1), bcast(C1), mask) somewhere in the upward chain starting from the load.
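A sketch of that gate, reusing names from the patch (AbstractOp, OptionalOp, Operation, parseAbstractOp); the helper itself is hypothetical:

bool isBroadcastSelect(const AbstractOp &Op, const AIEBaseInstrInfo *TII,
                       const MachineRegisterInfo &MRI) {
  if (Op.Type != Operation::SELECT)
    return false;
  // Both select inputs must be broadcasts for the chain to qualify.
  return all_of(Op.VectorSrcRegs, [&](Register SrcReg) {
    OptionalOp Src = TII->parseAbstractOp(*MRI.getVRegDef(SrcReg));
    return Src && Src->Type == Operation::BROADCAST;
  });
}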
MachineRegisterInfo &MRI) {

if (MI->isCopy())
  MI = getDefIgnoringCopiesAndBitcasts(MI->getOperand(1).getReg(), MRI);
Nit: Why not unconditionally call MI = getDefIgnoringCopiesAndBitcasts(MI->getOperand(1).getReg(), MRI)?
Hi @gbossu, now I hope we converged ;-) It is also possible to SWP (pre) Softmax with:
for (MachineInstr &MI : MBB) {
  // We want to analyze just loads without post increment.
  if (!MI.mayLoad() || !MI.memoperands_empty() || MI.getNumOperands() != 2)
This is an attempt to recognise a multi-way load in an architecture-neutral way. Why not add that to the set of abstract operations?
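That suggestion could take the shape of another AIEBaseInstrInfo hook next to parseAbstractOp; the hook name below is hypothetical:

/// Hypothetical hook: true if MI is a multi-way load whose address is
/// carried in a vector register rather than a plain pointer.
virtual bool isAbstractMultiWayLoad(const MachineInstr &MI) const {
  return false; // Targets override this for their vector-pointer loads.
}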
  return false;
}
return UnknownSide /*offset side is found*/ &&
       KnownSide /*broadcasted side is found*/;
CHECK: KnownSide tells us that the operand can be traced wholly to a set of objects. UnknownSide might be anything, but cannot introduce new objects without also introducing undefined behaviour. I have a feeling we only need to check KnownSide. Moreover, I would not use the term 'broadcasted side'. We may later generalize to other ways to construct a vector of object designations; 'object side' rhymes better with 'offset side'.
Note: you couldn't directly assign to KnownSide anymore; instead, do a sticky update when the visit function returns true.
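I.e., something like this sketch, so a later non-matching operand cannot clear an earlier success:

// Sticky update: only ever set KnownSide, never reset it.
if (visitVAdd(*VOp, TII, MRI, GVSet))
  KnownSide = true;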
  KnownSide = visitVAdd(*VOp, TII, MRI, GVSet);
else if (VOp->Type == Operation::SELECT)
  KnownSide = visitVSelect(*VOp, TII, MRI, GVSet);
else
I guess we could also deal with BROADCAST here?
By correctness, yes. But due to architectural restrictions (odd-even memory banks), I believe this cannot represent a real case. A final pointer must be constructed from at least two broadcasts.
Select (add(broadcast, offset), add(broadcast,offset)) would fit?
It can be; however, I prefer to include only already-existing cases here. As discussed before, we can add any new pattern on demand.
// from SELECT or BROADCAST.
if (!VOp)
  return false;
if (VOp->Type == Operation::SELECT)
I guess visitXXX could return false if VOp isn't XXX
As this would represent an implementation mistake, I inserted asserts.
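For instance, a guard of roughly this shape at the top of each visitor (sketch):

// Reaching a visitor with the wrong operation kind is an
// implementation mistake in the dispatcher, hence an assert.
assert(VOp.Type == Operation::SELECT &&
       "visitVSelect called on a non-SELECT operation");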
if (!MI.mayLoad() || !MI.memoperands_empty() || MI.getNumOperands() != 2)
  continue;
SmallPtrSet<const Value *, 4> GVSet;
if (traverseVecPointerChain(MI.getOperand(1).getReg(), GVSet, TII, MRI)) {
nit: Operand(1) may not be a register
Now that we parse the abstract load, we have this guarantee.
@@ -441,6 +441,20 @@ struct AIEBaseInstrInfo : public TargetInstrInfo {
  llvm_unreachable("Target didn't implement isProfitableToSplitType!");
}

/// Abstract vector operation to help the decoding of complex operations.
struct AbstractVecOp {
nit: We could also use this for non-vector ops
Changed accordingly.
if (MO.isGlobal()) {
  GVSet.insert(MO.getGlobal());
  return true;
}
MI could be anything, e.g. the top of a tree of selects. I guess we have to return false whenever MO isn't a global and collect all used globals.
Yes I believe any operand that cannot be tracked as a memory location should make us return false
Done! Now we consider only the case with one src operand that loads a global.
New QoR results (updated):
// Recognize ADD ((ADD | SELECT), Unknown)
bool visitVAdd(const AbstractOp &VAdd, const AIEBaseInstrInfo *TII,
               MachineRegisterInfo &MRI, SmallPtrSet<const Value *, 4> &GVSet) {
Would that function be equivalent to something like:

bool visitAddressVector(const AbstractOp &Op, const AIEBaseInstrInfo *TII,
                        MachineRegisterInfo &MRI, SmallPtrSet<const Value *, 4> &GVSet) {
  if (Op.Type == Operation::VECTOR_SELECT)
    return visitSelect(...);
  if (Op.Type != Operation::VECTOR_ADD)
    return false;
  return any_of(Op.VectorSrcRegs, [&](Register SrcReg) {
    OptionalOp VOp = TII->parseAbstractOp(*MRI.getVRegDef(SrcReg));
    return !VOp || visitAddressVector(*VOp, TII, MRI, GVSet);
  });
}
return any_of(VOp.VectorSrcRegs, [&](Register SrcReg) {
  OptionalOp VOp = TII->parseAbstractOp(*MRI.getVRegDef(SrcReg));
  return !VOp || visitAddressVector(*VOp, TII, MRI, GVSet);
I think I made a mistake when suggesting that code. That should probably be VOp && visitAddressVector(*VOp, TII, MRI, GVSet), i.e. we are fine as long as one of the source registers could be successfully traced to an address vector. What do you think?
If one can be traced back, we are fine! I will update.
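So the corrected body would read roughly (sketch):

// Success iff at least one source register traces to an address
// vector; untraceable sources no longer count as success.
return any_of(VOp.VectorSrcRegs, [&](Register SrcReg) {
  OptionalOp Src = TII->parseAbstractOp(*MRI.getVRegDef(SrcReg));
  return Src && visitAddressVector(*Src, TII, MRI, GVSet);
});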
This can prevent MachinePipeliner from considering some loads as barriers (isDependenceBarrier).
LGTM, thanks for looking into all the comments :)