[LV] Always include middle block cost in isOutsideLoopWorkProfitable. #171102

fhahn · 2025-12-08T09:55:21Z

Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from #168949 and removes the temporary restriction.

isOutsideLoopWorkProfitable already scales the cost outside loops according to the expected trip counts.

In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly.
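
To make the effect on the threshold concrete, here is a rough sketch of a break-even calculation of this kind. It is not the code in LoopVectorize.cpp: `LoopCosts`, `minProfitableTripCount`, and the plain integer costs are hypothetical stand-ins for InstructionCost and ElementCount, and the real logic applies additional scaling and bounds.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified inputs (assumption: plain integers instead of
// LLVM's InstructionCost / ElementCount).
struct LoopCosts {
  uint64_t ScalarIterCost;  // cost of one scalar loop iteration
  uint64_t VectorIterCost;  // cost of one vector iteration (covers VF elements)
  uint64_t OutsideLoopCost; // one-off work: runtime checks, exit blocks, middle block
  uint64_t VF;              // vectorization factor
};

// Break-even trip count, rounded up: the smallest TC with
//   ScalarIterCost * TC >= OutsideLoopCost + VectorIterCost * TC / VF
// (treating TC / VF as a fraction). Returns std::nullopt when a vector
// iteration saves nothing over VF scalar iterations, which would otherwise
// mean dividing by zero.
std::optional<uint64_t> minProfitableTripCount(const LoopCosts &C) {
  uint64_t ScalarPerVectorIter = C.ScalarIterCost * C.VF;
  if (ScalarPerVectorIter <= C.VectorIterCost)
    return std::nullopt;
  uint64_t SavingPerVectorIter = ScalarPerVectorIter - C.VectorIterCost;
  // Ceiling division of OutsideLoopCost * VF by the per-iteration saving.
  return (C.OutsideLoopCost * C.VF + SavingPerVectorIter - 1) /
         SavingPerVectorIter;
}
```

With a scalar cost of 4, a vector cost of 6 at VF 4, and an outside-loop cost of 4, the sketch yields 2. Adding a hypothetical middle-block cost of 3 to the outside-loop cost raises it to 3, which is the kind of small threshold increase described above.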


llvmbot · 2025-12-08T09:55:52Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Always include the cost of the middle block in isOutsideLoopWorkProfitable. This addresses the TODO from #168949 and removes the temporary restriction.

isOutsideLoopWorkProfitable already scales the cost outside loops according to the expected trip counts.

In practice this increases the minimum iteration threshold in a few cases. On a large IR corpus based on C/C++ workloads, ~50 out of 179450 vector loops have their thresholds increased slightly.

Full diff: https://github.com/llvm/llvm-project/pull/171102.diff

6 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-7)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/early_exit_costs.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll (+7-7)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+2-1)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 4edc004f161a1..c07663ad9670c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9329,13 +9329,7 @@ static bool isOutsideLoopWorkProfitable(GeneratedRTChecks &Checks,
   // one exists.
   TotalCost += calculateEarlyExitCost(CostCtx, Plan, VF.Width);
 
-  // If the expected trip count is less than the VF, the vector loop will only
-  // execute a single iteration. Then the middle block is executed the same
-  // number of times as the vector region.
-  // TODO: Extend logic to always account for the cost of the middle block.
-  auto ExpectedTC = getSmallBestKnownTC(PSE, L);
-  if (ExpectedTC && ElementCount::isKnownLE(*ExpectedTC, VF.Width))
-    TotalCost += Plan.getMiddleBlock()->cost(VF.Width, CostCtx);
+  TotalCost += Plan.getMiddleBlock()->cost(VF.Width, CostCtx);
 
   // When interleaving only scalar and vector cost will be equal, which in turn
   // would lead to a divide by 0. Fall back to hard threshold.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 519a104b9484f..502b312f0b7d9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1104,6 +1104,10 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
     return Ctx.TTI.getIntrinsicInstrCost(Attrs, Ctx.CostKind);
   }
   case VPInstruction::ExtractLastLane: {
+    // TODO: ExtractLastLane for scalar VF is a no-op. Remove before ::execute.
+    if (VF.isScalar())
+      return 0;
+
     // Add on the cost of extracting the element.
     auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
     return Ctx.TTI.getIndexedVectorInstrCostFromEnd(Instruction::ExtractElement,
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/early_exit_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/early_exit_costs.ll
index 7ae50a5e4a075..de5870e269b67 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/early_exit_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/early_exit_costs.ll
@@ -96,7 +96,7 @@ define i64 @vectorization_not_profitable_due_to_trunc(ptr dereferenceable(800) %
 ; CHECK-NEXT: Calculating cost of work in exit block vector.early.exit:
 ; CHECK-NEXT: Cost of 1 for VF 1: EMIT vp<%first.active.lane> = first-active-lane ir<%t>
 ; CHECK-NEXT: Cost of 0 for VF 1: EMIT vp<%early.exit.value> = extract-lane vp<%first.active.lane>, ir<%l>
-; CHECK-NEXT: LV: Vectorization is possible but not beneficial.
+; CHECK: LV: Vectorization is possible but not beneficial.
 entry:
   br label %loop.header
 
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll
index 7b42e565e127d..40db6a53b49e4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll
@@ -94,7 +94,7 @@ define i64 @pointer_induction_only(ptr %start, ptr %end) {
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub i64 [[END1]], [[START2]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = lshr i64 [[TMP0]], 2
 ; CHECK-NEXT:    [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
-; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 8
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
index 611b980999bfe..df1c639911cb0 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll
@@ -8,7 +8,7 @@ define void @no_outer_loop(ptr nocapture noundef %a, ptr nocapture noundef reado
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK-NOT:  We expect runtime memory checks to be hoisted out of the outer loop.
 ; CHECK:      Total cost of runtime checks: 4
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:16
 entry:
   br label %inner.loop
 
@@ -34,7 +34,7 @@ define void @outer_no_tc(ptr nocapture noundef %a, ptr nocapture noundef readonl
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK:      We expect runtime memory checks to be hoisted out of the outer loop. Cost reduced from 6 to 3
 ; CHECK:      Total cost of runtime checks: 3
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:16
 entry:
   br label %outer.loop
 
@@ -71,7 +71,7 @@ define void @outer_known_tc3(ptr nocapture noundef %a, ptr nocapture noundef rea
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK:      We expect runtime memory checks to be hoisted out of the outer loop. Cost reduced from 6 to 2
 ; CHECK:      Total cost of runtime checks: 2
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:16
 entry:
   br label %outer.loop
 
@@ -108,7 +108,7 @@ define void @outer_known_tc64(ptr nocapture noundef %a, ptr nocapture noundef re
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK:      We expect runtime memory checks to be hoisted out of the outer loop. Cost reduced from 6 to 1
 ; CHECK:      Total cost of runtime checks: 1
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:16
 entry:
   br label %outer.loop
 
@@ -145,7 +145,7 @@ define void @outer_pgo_3(ptr nocapture noundef %a, ptr nocapture noundef readonl
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK:      We expect runtime memory checks to be hoisted out of the outer loop. Cost reduced from 6 to 2
 ; CHECK:      Total cost of runtime checks: 2
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:16
 entry:
   br label %outer.loop
 
@@ -182,7 +182,7 @@ define void @outer_pgo_minus1(ptr nocapture noundef %a, ptr nocapture noundef re
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK:      We expect runtime memory checks to be hoisted out of the outer loop. Cost reduced from 6 to 1
 ; CHECK:      Total cost of runtime checks: 1
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:16
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:16
 entry:
   br label %outer.loop
 
@@ -219,7 +219,7 @@ define void @outer_known_tc3_full_range_checks(ptr nocapture noundef %dst, ptr n
 ; CHECK:      Calculating cost of runtime checks:
 ; CHECK:      We expect runtime memory checks to be hoisted out of the outer loop. Cost reduced from 6 to 2
 ; CHECK:      Total cost of runtime checks: 2
-; CHECK-NEXT: LV: Minimum required TC for runtime checks to be profitable:4
+; CHECK:      LV: Minimum required TC for runtime checks to be profitable:4
 entry:
   br label %outer.loop
 
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
index e338b828d2520..dd6f0fe5f1292 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
@@ -16,7 +16,8 @@ define void @test_no_scalarization(ptr %a, ptr noalias %b, i32 %idx, i32 %n) #0
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 [[SMAX]], [[IDX]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.vscale.i32()
 ; CHECK-NEXT:    [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 1
-; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], [[TMP3]]
+; CHECK-NEXT:    [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[TMP3]], i32 6)
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], [[UMAX]]
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i32 @llvm.vscale.i32()
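
The updated `MIN_ITERS_CHECK` lines in induction-costs.ll (`4` becomes `8`) and scalable-avoid-scalarization.ll (a `umax` of the runtime step and `6`) suggest the guard compares the trip count against the larger of the vector step and the minimum profitable trip count. A minimal sketch of that comparison, assuming plain integers and a hypothetical helper name `takesScalarPath`:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not an LLVM API. Returns true when the guard should
// branch to the scalar loop. Step is the number of elements handled per
// vector iteration (for scalable vectors this is a runtime value, such as
// vscale * 2); MinProfitableTC is the threshold derived from the cost model.
bool takesScalarPath(uint64_t TripCount, uint64_t Step,
                     uint64_t MinProfitableTC) {
  return TripCount < std::max(Step, MinProfitableTC);
}
```

For example, with a step of 4, raising the threshold from 4 to 8 sends a loop with 5 iterations to the scalar path instead of the vector path.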

lukel97 · 2025-12-08T10:24:51Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

  case VPInstruction::ExtractLastLane: {
+    // TODO: ExtractLastLane for scalar VF is a no-op. Remove before ::execute.
+    if (VF.isScalar())
+      return 0;


Is this needed as a part of this PR?

I imagine this instruction appears in the middle block for some outside uses of loop variables and @fhahn is trying to avoid regressions in cases where VF=1, IC>1.

@fhahn Is it worth committing this change separately, as I think it makes sense on its own? That way, if this PR needs reverting due to some post-commit regression, at least we don't have to revert this bit.
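
For readers following the VF=1 case, the early return under discussion amounts to the following. This is only a sketch: `SimpleVF`, `extractLastLaneCost`, and the `ExtractCost` parameter are hypothetical stand-ins for ElementCount and the TTI query, not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical stand-in for ElementCount, fixed-width only.
struct SimpleVF {
  unsigned Width;
  bool isScalar() const { return Width == 1; }
};

// Cost of taking the last lane of a value. ExtractCost stands in for whatever
// the target reports for an extractelement at the given vector type.
uint64_t extractLastLaneCost(SimpleVF VF, uint64_t ExtractCost) {
  // With a scalar VF there is only one value, so no extract is emitted.
  if (VF.isScalar())
    return 0;
  return ExtractCost;
}
```

Without the early return, a scalar plan that is only interleaved would be charged for an extract that is never emitted, which matters once the middle block is always counted.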

fhahn requested review from aniragil, ayalz, david-arm, lukel97 and rengolin December 8, 2025 09:55

llvmbot added vectorizers llvm:transforms labels Dec 8, 2025

fhahn mentioned this pull request Dec 8, 2025

[LV] Count cost of middle block if TC <= VF. #168949

Merged

lukel97 reviewed Dec 8, 2025
