Skip to content

[RISCV] Rewrite vrgather.vx undef, (vmv.s.x), 0, v0 as vmv.v.x #136010

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 17, 2025

Conversation

preames
Copy link
Collaborator

@preames preames commented Apr 16, 2025

This extends the DAG combine introduced in 336b290 to handle the case where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If the vrgather splats the single source element, and has no passthru we can replace it with a vmv.v.x - which will in turn usually get folded into a vmerge if a select follows.

This extends the DAG combine introduced in 336b290 to handle the case
where the prior value is defined by a vmv.s.x instead of a vmv.v.x.
If the vrgather splats the single source element, and has no passthru
we can replace it with a vmv.v.x - which will in turn usually get
folded into a vmerge if a select follows.
@llvmbot
Copy link
Member

llvmbot commented Apr 16, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

This extends the DAG combine introduced in 336b290 to handle the case where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If the vrgather splats the single source element, and has no passthru we can replace it with a vmv.v.x - which will in turn usually get folded into a vmerge if a select follows.


Full diff: https://github.com/llvm/llvm-project/pull/136010.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+29-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-fp.ll (+6-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll (+6-11)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f24752b8721f5..4dd237b5415dd 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -5573,7 +5573,6 @@ static SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG,
     const int Lane = SVN->getSplatIndex();
     if (Lane >= 0) {
       MVT SVT = VT.getVectorElementType();
-
       // Turn splatted vector load into a strided load with an X0 stride.
       SDValue V = V1;
       // Peek through CONCAT_VECTORS as VectorCombine can concat a vector
@@ -19710,20 +19709,47 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
       return V;
     break;
   case RISCVISD::VRGATHER_VX_VL: {
-    // Drop a redundant vrgather_vx.
+    using namespace llvm::SDPatternMatch;
     // Note this assumes that out of bounds indices produce poison
     // and can thus be replaced without having to prove them inbounds..
+    EVT VT = N->getValueType(0);
     SDValue Src = N->getOperand(0);
+    SDValue Idx = N->getOperand(1);
     SDValue Passthru = N->getOperand(2);
     SDValue VL = N->getOperand(4);
+
+    // Warning: Unlike most cases we strip an insert_subvector, this one
+    // does not require the first operand to be undef.
+    if (Src.getOpcode() == ISD::INSERT_SUBVECTOR &&
+        sd_match(Src.getOperand(2), m_Zero()))
+      Src = Src.getOperand(1);
+
     switch (Src.getOpcode()) {
     default:
       break;
     case RISCVISD::VMV_V_X_VL:
     case RISCVISD::VFMV_V_F_VL:
-      if (Passthru.isUndef() && VL == Src.getOperand(2))
+      // Drop a redundant vrgather_vx.
+      // TODO: Remove the type restriction if we find a motivating
+      // test case?
+      if (Passthru.isUndef() && VL == Src.getOperand(2) &&
+          Src.getValueType() == VT)
         return Src;
       break;
+    case RISCVISD::VMV_S_X_VL:
+    case RISCVISD::VFMV_S_F_VL:
+      // If this use only demands lane zero from the source vmv.s.x, and
+      // doesn't have a passthru, then this vrgather.vi/vx is equivalent to
+      // a vmv.v.x.  Note that there can be other uses of the original
+      // vmv.s.x and thus we can't eliminate it.  (vfmv.s.f is analogous)
+      if (sd_match(Idx, m_Zero()) && Passthru.isUndef() &&
+          VL == Src.getOperand(2)) {
+        unsigned Opc =
+            VT.isFloatingPoint() ? RISCVISD::VFMV_V_F_VL : RISCVISD::VMV_V_X_VL;
+        return DAG.getNode(Opc, DL, VT, DAG.getUNDEF(VT), Src.getOperand(1),
+                           VL);
+      }
+      break;
     }
     break;
   }
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-fp.ll
index 5aac2687122ae..f580b1b993395 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-fp.ll
@@ -96,13 +96,11 @@ define <8 x float> @vmerge_vxm(<8 x float> %v, float %s) {
 ; CHECK-LABEL: vmerge_vxm:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
-; CHECK-NEXT:    vfmv.s.f v8, fa0
+; CHECK-NEXT:    vsetivli zero, 1, e32, m4, tu, ma
 ; CHECK-NEXT:    vmv.s.x v0, a0
-; CHECK-NEXT:    vmv2r.v v10, v8
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vrgather.vi v10, v8, 0, v0.t
-; CHECK-NEXT:    vmv.v.v v8, v10
+; CHECK-NEXT:    vfmv.s.f v8, fa0
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vfmerge.vfm v8, v8, fa0, v0
 ; CHECK-NEXT:    ret
   %ins = insertelement <8 x float> %v, float %s, i32 0
   %shuf = shufflevector <8 x float> %ins, <8 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 0, i32 5, i32 6, i32 7>
@@ -112,15 +110,10 @@ define <8 x float> @vmerge_vxm(<8 x float> %v, float %s) {
 define <8 x float> @vmerge_vxm2(<8 x float> %v, float %s) {
 ; CHECK-LABEL: vmerge_vxm2:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 1, e32, m4, tu, ma
-; CHECK-NEXT:    vmv1r.v v12, v8
-; CHECK-NEXT:    vmv2r.v v10, v8
 ; CHECK-NEXT:    li a0, 25
-; CHECK-NEXT:    vfmv.s.f v12, fa0
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
 ; CHECK-NEXT:    vmv.s.x v0, a0
-; CHECK-NEXT:    vmv1r.v v10, v12
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, mu
-; CHECK-NEXT:    vrgather.vi v8, v10, 0, v0.t
+; CHECK-NEXT:    vfmerge.vfm v8, v8, fa0, v0
 ; CHECK-NEXT:    ret
   %ins = insertelement <8 x float> %v, float %s, i32 0
   %shuf = shufflevector <8 x float> %v, <8 x float> %ins, <8 x i32> <i32 8, i32 1, i32 2, i32 8, i32 8, i32 5, i32 6, i32 7>
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll
index 5c4ef29d7d5b7..8676803e20e3b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-int.ll
@@ -1448,13 +1448,11 @@ define <8 x i8> @vmerge_vxm(<8 x i8> %v, i8 %s) {
 ; CHECK-LABEL: vmerge_vxm:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetivli zero, 8, e8, m1, tu, ma
-; CHECK-NEXT:    vmv.s.x v8, a0
+; CHECK-NEXT:    vsetivli zero, 1, e8, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v0, a1
-; CHECK-NEXT:    vmv1r.v v9, v8
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, mu
-; CHECK-NEXT:    vrgather.vi v9, v8, 0, v0.t
-; CHECK-NEXT:    vmv1r.v v8, v9
+; CHECK-NEXT:    vmv.s.x v8, a0
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT:    vmerge.vxm v8, v8, a0, v0
 ; CHECK-NEXT:    ret
   %ins = insertelement <8 x i8> %v, i8 %s, i32 0
   %shuf = shufflevector <8 x i8> %ins, <8 x i8> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 0, i32 0, i32 5, i32 6, i32 7>
@@ -1465,12 +1463,9 @@ define <8 x i8> @vmerge_vxm2(<8 x i8> %v, i8 %s) {
 ; CHECK-LABEL: vmerge_vxm2:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a1, 25
-; CHECK-NEXT:    vsetivli zero, 1, e8, m1, tu, ma
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.s.x v0, a1
-; CHECK-NEXT:    vmv1r.v v9, v8
-; CHECK-NEXT:    vmv.s.x v9, a0
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, mu
-; CHECK-NEXT:    vrgather.vi v8, v9, 0, v0.t
+; CHECK-NEXT:    vmerge.vxm v8, v8, a0, v0
 ; CHECK-NEXT:    ret
   %ins = insertelement <8 x i8> %v, i8 %s, i32 0
   %shuf = shufflevector <8 x i8> %v, <8 x i8> %ins, <8 x i32> <i32 8, i32 1, i32 2, i32 8, i32 8, i32 5, i32 6, i32 7>

@@ -96,13 +96,11 @@ define <8 x float> @vmerge_vxm(<8 x float> %v, float %s) {
; CHECK-LABEL: vmerge_vxm:
; CHECK: # %bb.0:
; CHECK-NEXT: li a0, 25
; CHECK-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: vsetivli zero, 1, e32, m4, tu, ma
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of m4 here is very odd, but not really related to this change. This is happening in InsertVSETVLI because the LMUL isn't really demanded, but starts at m1, the original SEW of the vmv.s.x is e8, and we decide to adjust the input to preserve the SEW/LMUL ratio. I don't believe this actually matters, it just creates an odd looking diff.

; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, mu
; CHECK-NEXT: vrgather.vi v10, v8, 0, v0.t
; CHECK-NEXT: vmv.v.v v8, v10
; CHECK-NEXT: vfmv.s.f v8, fa0
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this vfmv.s.f could be eliminated if we rewrote the mask on the vmerge. I don't plan to do this, just noting it's vaguely possible. I'm mildly of the opinion that this approach (the post lowering DAG) has been pushed as far as we should, and that if we want to further improve, we should instead starting canonicalizing shuffles before lowering. I may change my mind based on what future cases I stumble into. :)

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@preames preames merged commit 7866fc2 into llvm:main Apr 17, 2025
6 of 10 checks passed
@preames preames deleted the pr-riscv-vrgather_vx_vmv.s.x-combine branch April 17, 2025 17:06
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…136010)

This extends the DAG combine introduced in 336b290 to handle the case
where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If
the vrgather splats the single source element, and has no passthru we
can replace it with a vmv.v.x - which will in turn usually get folded
into a vmerge if a select follows.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…136010)

This extends the DAG combine introduced in 336b290 to handle the case
where the prior value is defined by a vmv.s.x instead of a vmv.v.x. If
the vrgather splats the single source element, and has no passthru we
can replace it with a vmv.v.x - which will in turn usually get folded
into a vmerge if a select follows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants