
[AIE2P][AIE2] Split symmetric G_BUILD_VECTOR into two G_BUILD_VECTORs #400

Open

abhinay-anubola wants to merge 2 commits into aie-public from sanubola.symmetric.build.vector
Conversation

abhinay-anubola (Collaborator):

Splits the symmetric G_BUILD_VECTOR into two half-sized G_BUILD_VECTOR operations and then emits a G_CONCAT_VECTORS to combine them back into the final vector.
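For context, a minimal sketch of what the rewrite step could look like, assuming the standard MachineIRBuilder helpers buildBuildVector and buildConcatVectors; the function name and surrounding setup are illustrative, not the PR's actual code:

// Illustrative sketch: emit one half-sized G_BUILD_VECTOR per half of the
// source operands, then a G_CONCAT_VECTORS producing the original vector.
void applySymmetricBuildVector(MachineInstr &MI, MachineRegisterInfo &MRI,
                               MachineIRBuilder &B) {
  const Register DstReg = MI.getOperand(0).getReg();
  const LLT DstTy = MRI.getType(DstReg);
  const LLT HalfTy = DstTy.changeElementCount(
      DstTy.getElementCount().divideCoefficientBy(2));
  const unsigned NumElts = MI.getNumOperands() - 1;
  const unsigned HalfNumElts = NumElts / 2;

  SmallVector<Register, 16> LoOps, HiOps;
  for (unsigned I = 0; I < HalfNumElts; ++I) {
    LoOps.push_back(MI.getOperand(1 + I).getReg());
    HiOps.push_back(MI.getOperand(1 + HalfNumElts + I).getReg());
  }

  auto Lo = B.buildBuildVector(HalfTy, LoOps);
  auto Hi = B.buildBuildVector(HalfTy, HiOps);
  B.buildConcatVectors(DstReg, {Lo.getReg(0), Hi.getReg(0)});
  MI.eraseFromParent();
}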

@@ -9,6 +9,29 @@
//===----------------------------------------------------------------------===//
include "llvm/Target/GlobalISel/Combine.td"

def aie_all_combines : GICombineGroup<[trivial_combines, vector_ops_combines,
Collaborator:

Maybe we should state in a comment why we are doing this, like which combine we are excluding and why.

Collaborator:

Could you also post some QoR numbers, and note whether the excluded combines affect any kernels?

Collaborator:

To be frank, I don't get why we need this change. Also, can we move this to a separate commit?

Collaborator:

We explicitly list the combines that we use, rather than getting an opaque package deal from upstream LLVM. That's useful in itself. Any annotation as to why we need a particular combine would add to that value. I hope we can intersperse comments within this TableGen list?
I agree that a separate commit would be good; perhaps even two: one copying the standard list, then one removing the combines we don't want. The commit message can explain why we don't want them.

Collaborator:

I think this solution is not scalable/maintainable: what if new generic combines are added? Is there no other way to filter out the combines we don't need? @konstantinschwarz?

abhinay-anubola (Collaborator, Author), Mar 25, 2025:

This feature is mentioned on llvm.org, but it is not yet available in LLVM.

def CommonPreLegalizerCombiner: GICombiner<
    "CommonGenPreLegalizerCombiner", [all_combines]> {
  let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
  let Modifiers = [(disable_rule copy_prop)];
}

Collaborator:

I think that listing the combines can help keep the generated code stable. The goal here is to exclude one combine that undoes what we are doing in this new combiner. It can also help avoid surprises in case of an LLVM bump.

Collaborator:

Okay, I know the goal; I just wanted to check whether we have a better way to do this (like what is mentioned above, but not available yet). I think we should write a comment about the purpose of doing this, and also a TODO to verify the QoR after accommodating new combines (when we bump to the next version).

martien-de-jong (Collaborator), Mar 28, 2025:

I don't think that <everything> - <what we explicitly don't want> is any better than <everything> in the first place. <what we don't want> is usually the result of painstakingly finding out what is bothering us, searching through an unknown bunch of combines we had just assumed would be doing the right thing. Maintaining the list by trying newly published combines first and checking their effect on QoR doesn't strike me as non-scalable.
At the point where we rebase on a new LLVM, the less uncontrolled change we have, the better. Once it is up and running, it is natural to play with the new toys, updating the combine list based on the release notes or on differences in the generic list.

; CHECK-NEXT: vst wl2, [p0, #0] // Delay Slot 2
; CHECK-NEXT: nop // Delay Slot 1
entry:
store <16 x i32> <i32 111, i32 111, i32 111, i32 111, i32 111, i32 111, i32 111, i32 111, i32 222, i32 222, i32 222, i32 222, i32 222, i32 222, i32 222, i32 222>, ptr addrspace(6) null, align 32
Collaborator:

Could we have a similar test for each size?

// operand in the second half matches SecondOp.
for (unsigned i = 2; i <= HalfNumElts; i++) {
if (!MI.getOperand(i).isIdenticalTo(FirstOp)) {
return false;
Collaborator:

Could we match cases where an operand is not identical to FirstOp but is undef?

Collaborator:

Or if FirstOp itself is undef but all others in each half are identical...

Collaborator:

Yep, that would call for finding the 'representative' FirstOp or SecondOp on the fly. Since we will do that twice, it's probably a good idea to factor it into a function that checks each of the subranges.
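A rough sketch of such a helper (matchSplatSubRange is an illustrative name, not the PR's code; the undef check uses llvm::getOpcodeDef from GlobalISel/Utils.h):

// Scan source operands [Begin, End) of the G_BUILD_VECTOR. Undef elements
// (registers defined by G_IMPLICIT_DEF) are skipped; all remaining operands
// must be identical. Rep stays unset if the whole range is undef.
static bool matchSplatSubRange(const MachineInstr &MI,
                               const MachineRegisterInfo &MRI, unsigned Begin,
                               unsigned End,
                               std::optional<MachineOperand> &Rep) {
  for (unsigned I = Begin; I < End; ++I) {
    const MachineOperand &Op = MI.getOperand(I);
    if (getOpcodeDef(TargetOpcode::G_IMPLICIT_DEF, Op.getReg(), MRI))
      continue; // undef element, compatible with any representative
    if (!Rep)
      Rep = Op;
    else if (!Op.isIdenticalTo(*Rep))
      return false;
  }
  return true;
}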

const unsigned NumElts = MI.getNumOperands() - 1;
const unsigned HalfNumElts = NumElts / 2;
const MachineOperand FirstOp = MI.getOperand(1);
const MachineOperand SecondOp = MI.getOperand(HalfNumElts + 1);
Collaborator:

I think it would be nice to have a lambda getSrcOperand(unsigned I) that captures MI, gets rid of the +1 everywhere and renormalizes the for loop below to the standard (unsigned I = 0; I < Num; I++) form.
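For instance, a sketch of that lambda inside the match function (illustrative, not the PR's code):

// Operand 0 of G_BUILD_VECTOR is the destination; hide the +1 bias once.
auto getSrcOperand = [&MI](unsigned I) -> const MachineOperand & {
  return MI.getOperand(I + 1);
};

const MachineOperand FirstOp = getSrcOperand(0);
const MachineOperand SecondOp = getSrcOperand(HalfNumElts);
for (unsigned I = 0; I < HalfNumElts; I++) {
  if (!getSrcOperand(I).isIdenticalTo(FirstOp) ||
      !getSrcOperand(HalfNumElts + I).isIdenticalTo(SecondOp))
    return false;
}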

; CHECK-NEXT: [[AIE_BROADCAST_VECTOR:%[0-9]+]]:_(<64 x s8>) = G_AIE_BROADCAST_VECTOR [[COPY]](s32)
; CHECK-NEXT: [[AIE_UNPAD_VECTOR:%[0-9]+]]:_(<32 x s8>) = G_AIE_UNPAD_VECTOR [[AIE_BROADCAST_VECTOR]](<64 x s8>)
; CHECK-NEXT: [[AIE_BROADCAST_VECTOR1:%[0-9]+]]:_(<64 x s8>) = G_AIE_BROADCAST_VECTOR [[COPY1]](s32)
; CHECK-NEXT: [[AIE_UNPAD_VECTOR1:%[0-9]+]]:_(<32 x s8>) = G_AIE_UNPAD_VECTOR [[AIE_BROADCAST_VECTOR1]](<64 x s8>)
martien-de-jong (Collaborator), Mar 24, 2025:

CHECK: There are patterns that absorb the UNPAD_VECTORs (Ultimately we will have 32 x s8 subregs being merged to 64 x s8?)

const LLT DstVecTy = MRI.getType(DstVecReg);
const unsigned DstVecSize = DstVecTy.getSizeInBits();

switch (DstVecSize) {
Collaborator:

Suggested change:
-  switch (DstVecSize) {
+  if (DstVecSize < 256)
+    return false;

Collaborator:

And pray no one gives us 384-bit vectors?
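A defensive variant, as a sketch (assuming the combine should only fire for power-of-two widths of at least 256 bits; isPowerOf2_32 is from llvm/Support/MathExtras.h):

// Reject widths the split can't handle, e.g. 384-bit vectors, so both
// halves stay legal vector types.
if (DstVecSize < 256 || !isPowerOf2_32(DstVecSize))
  return false;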

if (!MI.getOperand(i).isIdenticalTo(FirstOp)) {
return false;
}
if (!MI.getOperand(HalfNumElts + i).isIdenticalTo(SecondOp)) {
Collaborator:

Maybe we don't need both halves to be splat vectors to apply this combine; just one is enough. That already helps optimize half of the vector. Also, I can see more opportunity with some patterns, for example 3/4 of the vector with one element and 1/4 with another: literally split into 2 and then into 4, giving us a good optimization. What do you think?

Collaborator:

True! If at least one half satisfies the condition, we already gain on one side.
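A rough sketch of the relaxed match (matchSplatHalf is a hypothetical helper; the optional names mirror the later diff):

// Match each half independently; fire the combine if at least one half is
// a splat, leaving the other half as a plain half-sized G_BUILD_VECTOR.
std::optional<Register> FirstHalfSrcReg =
    matchSplatHalf(MI, MRI, /*Begin=*/1, /*End=*/HalfNumElts + 1);
std::optional<Register> SecondHalfSrcReg =
    matchSplatHalf(MI, MRI, /*Begin=*/HalfNumElts + 1, /*End=*/NumElts + 1);
return FirstHalfSrcReg.has_value() || SecondHalfSrcReg.has_value();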

abhinay-anubola force-pushed the sanubola.symmetric.build.vector branch from 6d0d5be to 36549ac on March 26, 2025 05:55
abhinay-anubola (Collaborator, Author):

Working on a more generic way to add more patterns, like quarters and a single differing splat on parts.

}

// If both halves are the same register, it's effectively a splat, and the
// splat vector combine already handles that case.
martien-de-jong (Collaborator), Apr 1, 2025:

If we have determined, at cost, that this is a splat vector, I would say there's no need to match this build vector again. Why don't we remove the splat combine, returning the splat transform lambda here?
In general, building vector patterns from subvector patterns is a great way of software reuse.

splat1 = matchSplat(0, halfsize);
splat2 = matchSplat(halfsize, size);
if (splat1 && splat2) {
  if (*splat1 == *splat2) {
    MatchInfo = Splat(*splat1);
    return true;
  } 
  MatchInfo = ConcatSplat(*splat1, *splat2);
  return true;
}
return false;  

We now have all the splat logic in one place; we might spot that one case is missing, which is the opportunity to implement half of this buildvector as a splat and the rest with the fallback case.

abhinay-anubola force-pushed the sanubola.symmetric.build.vector branch from 36549ac to 1bcbfcd on April 3, 2025 06:39
@@ -1449,7 +1445,8 @@ bool llvm::matchSymmetricBuildVector(MachineInstr &MI, MachineRegisterInfo &MRI,
DstVecTy.changeElementCount(
DstVecTy.getElementCount().divideCoefficientBy(2)));
};
return true;

return (FirstHalfSrcReg.has_value() || SecondHalfSrcReg.has_value());
martien-de-jong (Collaborator), Apr 3, 2025:

Better, but we can do more. We have now reused the matching code in the full splat vector match, but we still match twice: once to find two sub-splats and build two smaller BUILD_VECTORs, and then once more to properly implement the splat vectors.
Instead, we can remove the original splat vector match and implement all splats here, because we have already done the full match. So if both subvectors are splats and identical, try to generate a full-width broadcast; otherwise, if either of them is a splat, generate a concat with the appropriate splat and buildvector.
Please note that creating a concat will hide the original full splat pattern, and we may be lucky that it is tried first, or we end up painstakingly recognising the full-width splat from the concat pattern.

Please leave this as a follow-up story with a TODO comment in the code.
