[AIEX] Shift G_CONCAT_VECTORS closer to the user #234

Merged (1 commit) on Nov 28, 2024
17 changes: 16 additions & 1 deletion llvm/lib/Target/AIE/AIECombinerHelper.cpp
@@ -1367,10 +1367,25 @@ bool llvm::matchUpdToConcat(MachineInstr &MI, MachineRegisterInfo &MRI,
   return true;
 }

+/// Find a use of \p MI in the same block where it can be moved
+MachineInstr &findClosestToUseInsertPoint(MachineInstr &MI,
+                                          MachineRegisterInfo &MRI) {
+
+  for (auto &User : MRI.use_instructions(MI.getOperand(0).getReg())) {
Collaborator

Nit: Maybe add a top-level comment, e.g. "Find a use of \p MI in the same block where it can be moved".

+    if (User.isPHI())
+      continue;
+    if (User.getParent() == MI.getParent() && canDelayMemOp(MI, User, MRI))
Collaborator

Super-nit: I think we should really rename canDelayMemOp to just canDelayOp.

Collaborator Author

I agree, because it sounds strange to use it for non-memory instructions. On the other hand, this function checks specific side effects of crossing memory operations, unlike canAdvanceOp, where the instruction of interest is assumed not to be a load/store. What do you think?

Collaborator

I think we can leave it like this; it might just be a bit too conservative when encountering a store. We will need to revamp it at some point anyway if we want the combiners to move past certain intrinsics with side effects.

+      return User;
+  }
+
+  return MI;
+}
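
The search above can be sketched outside LLVM as a toy model. This is only an illustration under stated assumptions: ToyInstr, canDelayToy and findClosestToUseToy are hypothetical names, and canDelayToy is a deliberately conservative stand-in for canDelayMemOp (it refuses to move past anything with side effects), not a description of what the real helper checks. The toy also scans in program order, whereas MRI.use_instructions walks the register's use list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a MachineInstr. Every name here is hypothetical.
struct ToyInstr {
  int Block;             // id of the parent basic block
  bool IsPHI;            // PHIs are never accepted as insert points
  bool HasSideEffects;   // e.g. a store; the toy refuses to move past it
  std::vector<int> Uses; // virtual registers read by this instruction
  int Def;               // virtual register written, or -1 for none
};

// ASSUMPTION: a conservative stand-in for canDelayMemOp. Moving Prog[From]
// down to just before Prog[To] is allowed only if no instruction strictly
// in between has side effects.
bool canDelayToy(const std::vector<ToyInstr> &Prog, std::size_t From,
                 std::size_t To) {
  if (To <= From)
    return false;
  for (std::size_t I = From + 1; I < To; ++I)
    if (Prog[I].HasSideEffects)
      return false;
  return true;
}

// Returns the index of the first non-PHI user of Prog[MI]'s result that sits
// in the same block and can legally become the new insert point. Falls back
// to MI itself, mirroring the `return MI;` in the patch.
std::size_t findClosestToUseToy(const std::vector<ToyInstr> &Prog,
                                std::size_t MI) {
  for (std::size_t I = 0; I < Prog.size(); ++I) {
    const ToyInstr &User = Prog[I];
    bool ReadsDef = std::find(User.Uses.begin(), User.Uses.end(),
                              Prog[MI].Def) != User.Uses.end();
    if (!ReadsDef || User.IsPHI)
      continue;
    if (User.Block == Prog[MI].Block && canDelayToy(Prog, MI, I))
      return I;
  }
  return MI;
}
```

In this model, a side-effecting instruction between the definition and its only user makes the search fall back to the original position, which is the "too conservative when encountering a store" behaviour mentioned in the thread.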

 void llvm::applyUpdToConcat(MachineInstr &MI, MachineRegisterInfo &MRI,
                             MachineIRBuilder &B,
                             std::map<unsigned, Register> &IndexRegMap) {
-  B.setInstrAndDebugLoc(MI);
+  B.setDebugLoc(MI.getDebugLoc());
Collaborator

Why aren't the debug location and the insertion point for the MachineInstr the same?

Collaborator Author

Hi @F-Stuckmann, in this case the goal is to retain the same debug information (that of the instruction we are replacing) while building the new instruction at a different (shifted) position.
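
The separation described in this reply can be illustrated with a small toy builder. This is a sketch under assumptions, not MachineIRBuilder itself: ToyInstr and ToyBuilder are hypothetical, debug locations are modelled as plain line numbers, and the three setters only mimic the split between "where to insert" and "which debug location to attach".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: a name plus a source line as its debug info.
struct ToyInstr {
  std::string Name;
  int DebugLine;
};

// Hypothetical builder that keeps the insertion point and the debug location
// as two independent pieces of state.
class ToyBuilder {
  std::vector<ToyInstr> &Block;
  std::size_t InsertIdx = 0;
  int DebugLine = 0;

public:
  explicit ToyBuilder(std::vector<ToyInstr> &B) : Block(B) {}

  // Changes only the position: new instructions go before Block[Idx].
  void setInstr(std::size_t Idx) { InsertIdx = Idx; }

  // Changes only the debug location attached to new instructions.
  void setDebugLoc(int Line) { DebugLine = Line; }

  // Convenience form that ties both to the same instruction.
  void setInstrAndDebugLoc(std::size_t Idx) {
    setInstr(Idx);
    setDebugLoc(Block[Idx].DebugLine);
  }

  // Inserts a new instruction before the insertion point and returns its
  // index. The insertion point keeps referring to the same old instruction.
  std::size_t build(const std::string &Name) {
    Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(InsertIdx),
                 ToyInstr{Name, DebugLine});
    return InsertIdx++;
  }
};
```

Calling setDebugLoc with the replaced instruction's line and then setInstr with the chosen user's index yields a new instruction that sits next to its user but still reports the original source line, which is the effect the patch is after.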

+  B.setInstr(findClosestToUseInsertPoint(MI, MRI));

   SmallVector<Register, 4> SrcRegs;
   for (unsigned Op = 0; Op < IndexRegMap.size(); Op++) {
33 changes: 33 additions & 0 deletions llvm/test/CodeGen/AIE/GlobalISel/combine-upd-concat.mir
@@ -178,3 +178,36 @@ body: |
     $wh0 = COPY %254:_(<8 x s32>)
     $wh1 = COPY %255:_(<8 x s32>)
 ...
+
+---
+name: upd_I512.I256_shift_insert_point
+body: |
+  bb.0:
+    liveins: $p0, $wl2, $wl3
+    ; CHECK-LABEL: name: upd_I512.I256_shift_insert_point
+    ; CHECK: liveins: $p0, $wl2, $wl3
+    ; CHECK-NEXT: {{ $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(<32 x s8>) = COPY $wl2
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(<32 x s8>) = COPY $wl3
+    ; CHECK-NEXT: [[INT:%[0-9]+]]:_(<64 x s8>) = G_INTRINSIC intrinsic(@llvm.aie2.v64int8)
+    ; CHECK-NEXT: [[BITCAST:%[0-9]+]]:_(<16 x s32>) = G_BITCAST [[INT]](<64 x s8>)
+    ; CHECK-NEXT: [[BITCAST1:%[0-9]+]]:_(<8 x s32>) = G_BITCAST [[COPY]](<32 x s8>)
+    ; CHECK-NEXT: [[BITCAST2:%[0-9]+]]:_(<8 x s32>) = G_BITCAST [[COPY1]](<32 x s8>)
+    ; CHECK-NEXT: $x1 = COPY [[BITCAST]](<16 x s32>)
+    ; CHECK-NEXT: $x2 = COPY [[BITCAST]](<16 x s32>)
+    ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s32>) = G_CONCAT_VECTORS [[BITCAST1]](<8 x s32>), [[BITCAST2]](<8 x s32>)
+    ; CHECK-NEXT: $x0 = COPY [[CONCAT_VECTORS]](<16 x s32>)
+    %95:_(<32 x s8>) = COPY $wl2
+    %98:_(<32 x s8>) = COPY $wl3
+    %8:_(<64 x s8>) = G_INTRINSIC intrinsic(@llvm.aie2.v64int8)
+    %9:_(<16 x s32>) = G_BITCAST %8(<64 x s8>)
+    %21:_(s32) = G_CONSTANT i32 0
+    %51:_(s32) = G_CONSTANT i32 1
+    %96:_(<8 x s32>) = G_BITCAST %95(<32 x s8>)
+    %97:_(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.aie2.upd.I512.I256), %9(<16 x s32>), %96(<8 x s32>), %21(s32)
+    %99:_(<8 x s32>) = G_BITCAST %98(<32 x s8>)
+    %100:_(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.aie2.upd.I512.I256), %97(<16 x s32>), %99(<8 x s32>), %51(s32)
+    $x1 = COPY %9:_(<16 x s32>)
+    $x2 = COPY %9:_(<16 x s32>)
+    $x0 = COPY %100:_(<16 x s32>)
+...
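
The rewrite checked by this test (two chained upd.I512.I256 calls with indices 0 and 1 becoming one G_CONCAT_VECTORS placed right before the `$x0` copy) can be sanity-checked with a toy model. The semantics used here are an assumption inferred from the test, namely that the intrinsic returns the 512-bit base with the 256-bit half selected by the index replaced; updToy and concatToy are hypothetical helpers, not AIE APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec256 = std::array<int, 8>;  // models <8 x s32>
using Vec512 = std::array<int, 16>; // models <16 x s32>

// ASSUMED semantics of upd.I512.I256: a copy of Base in which the half
// selected by Idx (0 = low, 1 = high) is overwritten with Sub.
Vec512 updToy(Vec512 Base, const Vec256 &Sub, unsigned Idx) {
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Base[Idx * Sub.size() + I] = Sub[I];
  return Base;
}

// Models G_CONCAT_VECTORS on two halves: Lo fills lanes 0..7, Hi lanes 8..15.
Vec512 concatToy(const Vec256 &Lo, const Vec256 &Hi) {
  Vec512 Out{};
  for (std::size_t I = 0; I < Lo.size(); ++I) {
    Out[I] = Lo[I];
    Out[Lo.size() + I] = Hi[I];
  }
  return Out;
}
```

Under that assumption, updating both halves makes the result independent of the base vector, so `updToy(updToy(Base, A, 0), B, 1)` equals `concatToy(A, B)` for any `Base`. That is why the base can stay where it is for its other users (`$x1`, `$x2`) while the concat is free to move down to its own use.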