[AIE2P][AIE2] Split symmetric G_BUILD_VECTOR into two G_BUILD_VECTORs #400
base: aie-public
Conversation
Force-pushed 51b9ad9 to 6d0d5be
@@ -9,6 +9,29 @@
 //===----------------------------------------------------------------------===//
 include "llvm/Target/GlobalISel/Combine.td"

 def aie_all_combines : GICombineGroup<[trivial_combines, vector_ops_combines,
Maybe we should state in a comment why we are doing this, e.g. which combines we are excluding and why.
Could you also post some QoR numbers and say whether the excluded combines affect any kernels?
To be frank, I don't get why we need this change. Also, can we move this to a separate commit?
We explicitly list the combines that we use, rather than getting an opaque package deal from upstream LLVM. That's useful in itself. Any annotation as to why we need a particular combine would add to that value. I hope we can intersperse comments within this TableGen list?
I agree that a separate commit would be good; perhaps even two: one copying the standard list, then one removing the ones we don't want. The commit message can explain why we don't want them.
I think this solution is not scalable or maintainable: what if new generic combines are added? Is there no other way to filter out the combines we don't need? @konstantinschwarz ?
This feature is mentioned on llvm.org, but it is not yet available in LLVM.
def CommonPreLegalizerCombiner: GICombiner<
    "CommonGenPreLegalizerCombiner", [all_combines]> {
  let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
  let Modifiers = [(disable_rule copy_prop)];
}
I think that listing the combines can help to keep the generated code stable. The goal here is to exclude one combine that undoes what we are doing in this new combiner. Also, this can help to avoid surprises in case of an LLVM bump.
Okay, I know the goal; I just wanted to check if we have a better way to do this (like what is mentioned above, but not available yet). I think we should write a comment about the purpose of doing this, and also a TODO to verify the QoR after accommodating new combines (when we bump to the next version).
I don't think that <everything> - <what we explicitly don't want> is any better than <everything> in the first place. <what we don't want> is usually the result of painstakingly finding out what is bothering us, searching through an unknown bunch of combines we had just assumed would be doing the right thing. Maintaining the list by trying newly published ones first and checking their effect on QoR doesn't strike me as non-scalable.
At the point where we rebase on a new LLVM, the less uncontrolled change we have, the better. Once it is up and running, it is natural to play with the new toys, updating the combine list based on the release notes or on differences in the generic list.
; CHECK-NEXT: vst wl2, [p0, #0] // Delay Slot 2
; CHECK-NEXT: nop // Delay Slot 1
entry:
store <16 x i32> <i32 111, i32 111, i32 111, i32 111, i32 111, i32 111, i32 111, i32 111, i32 222, i32 222, i32 222, i32 222, i32 222, i32 222, i32 222, i32 222>, ptr addrspace(6) null, align 32
Could we have a similar test for each size?
// operand in the second half matches SecondOp.
for (unsigned i = 2; i <= HalfNumElts; i++) {
  if (!MI.getOperand(i).isIdenticalTo(FirstOp)) {
    return false;
Could we match cases when it is not identical to FirstOp but is undef?
Or if FirstOp itself is undef but all others in each half are identical...
Yep, that would call for finding the 'representative' FirstOp or SecondOp on the fly. Since we will do that twice, it's probably a good idea to factor it into a function that checks each of the subranges.
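The helper discussed here could look roughly like the following. This is a self-contained toy sketch, not the patch's code: names like matchSplatRange are ours, and plain std::optional<int> values stand in for machine operands, with std::nullopt modelling an undef element.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical sketch: find the "representative" element of Ops[Begin, End).
// All defined elements must be identical; undef elements (nullopt) match
// anything. Returns the representative, or nullopt if the range is not a
// splat (or is entirely undef).
static std::optional<int>
matchSplatRange(const std::vector<std::optional<int>> &Ops, unsigned Begin,
                unsigned End) {
  std::optional<int> Rep;
  for (unsigned I = Begin; I < End; ++I) {
    if (!Ops[I])
      continue; // undef matches any representative
    if (!Rep)
      Rep = Ops[I]; // first defined element becomes the representative
    else if (*Rep != *Ops[I])
      return std::nullopt; // two different defined elements: not a splat
  }
  return Rep;
}
```

Calling this once per half covers both reviewer cases: an undef among identical elements, and an undef FirstOp with identical remaining elements.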
const unsigned NumElts = MI.getNumOperands() - 1;
const unsigned HalfNumElts = NumElts / 2;
const MachineOperand FirstOp = MI.getOperand(1);
const MachineOperand SecondOp = MI.getOperand(HalfNumElts + 1);
I think it would be nice to have a lambda getSrcOperand(unsigned I) that captures MI, gets rid of the +1 everywhere, and renormalizes the for loop below to the standard (unsigned I = 0; I < Num; I++) form.
; CHECK-NEXT: [[AIE_BROADCAST_VECTOR:%[0-9]+]]:_(<64 x s8>) = G_AIE_BROADCAST_VECTOR [[COPY]](s32)
; CHECK-NEXT: [[AIE_UNPAD_VECTOR:%[0-9]+]]:_(<32 x s8>) = G_AIE_UNPAD_VECTOR [[AIE_BROADCAST_VECTOR]](<64 x s8>)
; CHECK-NEXT: [[AIE_BROADCAST_VECTOR1:%[0-9]+]]:_(<64 x s8>) = G_AIE_BROADCAST_VECTOR [[COPY1]](s32)
; CHECK-NEXT: [[AIE_UNPAD_VECTOR1:%[0-9]+]]:_(<32 x s8>) = G_AIE_UNPAD_VECTOR [[AIE_BROADCAST_VECTOR1]](<64 x s8>)
CHECK: There are patterns that absorb the UNPAD_VECTORs (Ultimately we will have 32 x s8 subregs being merged to 64 x s8?)
const LLT DstVecTy = MRI.getType(DstVecReg);
const unsigned DstVecSize = DstVecTy.getSizeInBits();

switch (DstVecSize) {
Suggested change:
-  switch (DstVecSize) {
+  if (DstVecSize < 256)
+    return false;
And pray no one gives us 384-bit vectors?
if (!MI.getOperand(i).isIdenticalTo(FirstOp)) {
  return false;
}
if (!MI.getOperand(HalfNumElts + i).isIdenticalTo(SecondOp)) {
Maybe there is no need for both halves to be splat vectors to apply this combine; just one is enough. It helps to optimize that half. Also, I can see more opportunity with some patterns, for example 3/4 with one element and 1/4 with another: iteratively split into 2 and then into 4, giving us a good optimization. What do you think?
True! If at least one half matches, we already gain on one side.
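The "keep splitting" idea from this exchange can be sketched as a recursion. This is a self-contained toy model under our own assumptions (ints stand in for vreg operands, and countSplatPieces is a name we made up): it only counts the splat pieces a recursive halving would emit, rather than building the actual G_BUILD_VECTOR/G_CONCAT_VECTORS chain.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: recursively halve Elts[Begin, End) and count the
// splat pieces the decomposition would produce. A 3/4-plus-1/4 pattern like
// [a,a,a,a,a,a,b,b] decomposes into one half splat [a x 4] plus two quarter
// splats [a x 2] and [b x 2], i.e. three pieces.
static unsigned countSplatPieces(const std::vector<int> &Elts, unsigned Begin,
                                 unsigned End) {
  bool IsSplat = true;
  for (unsigned I = Begin; I < End; ++I)
    if (Elts[I] != Elts[Begin])
      IsSplat = false;
  if (IsSplat || End - Begin <= 1)
    return 1; // a splat (or a single element) is one piece
  const unsigned Mid = Begin + (End - Begin) / 2;
  return countSplatPieces(Elts, Begin, Mid) + countSplatPieces(Elts, Mid, End);
}
```

In a real combine one would presumably cap the recursion depth, since each extra level trades one build vector for more concats.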
Force-pushed 6d0d5be to 36549ac
Working on a more generic way to add more patterns, like quarters and a single differing splat on parts.
}

// If both halves are the same register, it's effectively a splat, and the
// splat vector combine already handles that case.
If we have determined, at cost, that this is a splat vector, I would say there's no need to match this build vector again. Why don't we remove the splat combine, returning the splat transform lambda here?
In general, building vector patterns from subvector patterns is a great way of software reuse.
splat1 = matchSplat(0, halfsize);
splat2 = matchSplat(halfsize, size);
if (splat1 && splat2) {
  if (*splat1 == *splat2) {
    MatchInfo = Splat(*splat1);
    return true;
  }
  MatchInfo = ConcatSplat(*splat1, *splat2);
  return true;
}
return false;
We now have all the splat logic in one place; we might spot that one case is missing, which is the opportunity to implement half of this buildvector as a splat and the rest with the fallback case.
Force-pushed 36549ac to 1bcbfcd
@@ -1449,7 +1445,8 @@ bool llvm::matchSymmetricBuildVector(MachineInstr &MI, MachineRegisterInfo &MRI,
       DstVecTy.changeElementCount(
           DstVecTy.getElementCount().divideCoefficientBy(2)));
   };
-  return true;
+  return (FirstHalfSrcReg.has_value() || SecondHalfSrcReg.has_value());
Better, but we can do more. We have now reused the matching code in the full splat vector match, but we still match twice: once to find two sub-splats and build two smaller BUILD_VECTORs, and then once more to properly implement the splat vectors.
Instead, we can remove the original splat vector match and implement all splats here, because we have already done the full match. So if both subvectors are splats and identical, try to generate a full-width broadcast; otherwise, if either of them is a splat, generate a concat with the appropriate splat and buildvector.
Please note that creating a concat will hide the original full-splat pattern, and we may be lucky that it is tried first, or we end up painstakingly recognising the full-width splat from the concat pattern.
Please leave this as a follow-up story with a TODO comment in the code.
Splits the symmetric G_BUILD_VECTOR into two half-sized G_BUILD_VECTOR operations and then emits a G_CONCAT_VECTORS to combine them back into the final vector.
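The matching side of this transform can be summarized with a self-contained toy sketch (our own naming; ints stand in for the vreg operands of the G_BUILD_VECTOR, and the real patch of course works on MachineInstr/MachineOperand):

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch: if each half of Elts is a splat, return the two splat
// values. The combine would then build one half-sized G_BUILD_VECTOR (or
// broadcast) per half and join them with G_CONCAT_VECTORS. Identical halves
// mean the whole vector is a splat, which the existing splat combine covers.
static std::optional<std::pair<int, int>>
matchSymmetric(const std::vector<int> &Elts) {
  const unsigned Half = Elts.size() / 2;
  for (unsigned I = 0; I < Half; ++I)
    if (Elts[I] != Elts[0] || Elts[Half + I] != Elts[Half])
      return std::nullopt; // one of the halves is not a splat
  return std::make_pair(Elts[0], Elts[Half]);
}
```

For example, the <16 x i32> store test above (eight 111s followed by eight 222s) is exactly the shape this match accepts.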