[SelectionDAG] Fix condition used for unsigned subtraction overflow #170896
base: main
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-transforms

Author: None (aabhinavg1)

Changes

Transform

Patch is 21.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170896.diff

7 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 743c4f574e131..3bd7eb855b147 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -865,6 +865,19 @@ InstCombinerImpl::foldIntrinsicWithOverflowCommon(IntrinsicInst *II) {
WO->getRHS(), *WO, OperationResult, OverflowResult))
return createOverflowTuple(WO, OperationResult, OverflowResult);
+ // Transform: usub.with.overflow(X, Y) -> {X - Y, X u< Y}
+ if (WO->getBinaryOp() == Instruction::Sub && !WO->isSigned()) {
+ IRBuilder<> Builder(WO);
+ Value *Sub = Builder.CreateSub(WO->getLHS(), WO->getRHS());
+ Value *Overflow = Builder.CreateICmpULT(WO->getLHS(), WO->getRHS());
+
+ Value *ResultStruct = UndefValue::get(WO->getType());
+ ResultStruct = Builder.CreateInsertValue(ResultStruct, Sub, 0);
+ ResultStruct = Builder.CreateInsertValue(ResultStruct, Overflow, 1);
+
+ return replaceInstUsesWith(*WO, ResultStruct);
+ }
+
// See whether we can optimize the overflow check with assumption information.
for (User *U : WO->users()) {
if (!match(U, m_ExtractValue<1>(m_Value())))
diff --git a/llvm/test/Transforms/InstCombine/known-bits.ll b/llvm/test/Transforms/InstCombine/known-bits.ll
index da2123a5dfe74..fc73ce5503ffe 100644
--- a/llvm/test/Transforms/InstCombine/known-bits.ll
+++ b/llvm/test/Transforms/InstCombine/known-bits.ll
@@ -1068,12 +1068,12 @@ define i1 @extract_value_usub(i8 %x, i8 %zz) {
; CHECK-LABEL: @extract_value_usub(
; CHECK-NEXT: [[Z:%.*]] = add nuw i8 [[ZZ:%.*]], 1
; CHECK-NEXT: [[Y:%.*]] = add i8 [[X:%.*]], [[Z]]
-; CHECK-NEXT: [[SUB_UOV:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X]], i8 [[Y]])
-; CHECK-NEXT: [[SUB:%.*]] = extractvalue { i8, i1 } [[SUB_UOV]], 0
-; CHECK-NEXT: [[UOV:%.*]] = extractvalue { i8, i1 } [[SUB_UOV]], 1
+; CHECK-NEXT: [[SUB:%.*]] = xor i8 [[ZZ]], -1
+; CHECK-NEXT: [[UOV:%.*]] = icmp ult i8 [[X]], [[Y]]
; CHECK-NEXT: call void @use.i1(i1 [[UOV]])
; CHECK-NEXT: call void @use.i8(i8 [[SUB]])
-; CHECK-NEXT: ret i1 false
+; CHECK-NEXT: [[R:%.*]] = icmp eq i8 [[ZZ]], -1
+; CHECK-NEXT: ret i1 [[R]]
;
%z = add nuw i8 %zz, 1
%y = add i8 %x, %z
@@ -1090,12 +1090,11 @@ define i1 @extract_value_usub(i8 %x, i8 %zz) {
define i1 @extract_value_usub_fail(i8 %x, i8 %z) {
; CHECK-LABEL: @extract_value_usub_fail(
; CHECK-NEXT: [[Y:%.*]] = add i8 [[X:%.*]], [[Z:%.*]]
-; CHECK-NEXT: [[SUB_UOV:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X]], i8 [[Y]])
-; CHECK-NEXT: [[SUB:%.*]] = extractvalue { i8, i1 } [[SUB_UOV]], 0
-; CHECK-NEXT: [[UOV:%.*]] = extractvalue { i8, i1 } [[SUB_UOV]], 1
+; CHECK-NEXT: [[SUB:%.*]] = sub i8 0, [[Z]]
+; CHECK-NEXT: [[UOV:%.*]] = icmp ult i8 [[X]], [[Y]]
; CHECK-NEXT: call void @use.i1(i1 [[UOV]])
; CHECK-NEXT: call void @use.i8(i8 [[SUB]])
-; CHECK-NEXT: [[R:%.*]] = icmp eq i8 [[SUB]], 0
+; CHECK-NEXT: [[R:%.*]] = icmp eq i8 [[Z]], 0
; CHECK-NEXT: ret i1 [[R]]
;
%y = add i8 %x, %z
diff --git a/llvm/test/Transforms/InstCombine/pr170634.ll b/llvm/test/Transforms/InstCombine/pr170634.ll
new file mode 100644
index 0000000000000..62a332e14b04a
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/pr170634.ll
@@ -0,0 +1,33 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=instcombine -S < %s | FileCheck %s
+define dso_local i64 @func(i64 noundef %x, i64 noundef %y) local_unnamed_addr {
+; CHECK-LABEL: @func(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i64 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: br i1 [[TMP0]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
+; CHECK: if.then:
+; CHECK-NEXT: br label [[RETURN:%.*]]
+; CHECK: if.end:
+; CHECK-NEXT: [[TMP1:%.*]] = sub nuw i64 [[X]], [[Y]]
+; CHECK-NEXT: br label [[RETURN]]
+; CHECK: return:
+; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i64 [ 291, [[IF_THEN]] ], [ [[TMP1]], [[IF_END]] ]
+; CHECK-NEXT: ret i64 [[RETVAL_0]]
+;
+entry:
+ %0 = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %x, i64 %y)
+ %1 = extractvalue { i64, i1 } %0, 1
+ %2 = extractvalue { i64, i1 } %0, 0
+ br i1 %1, label %if.then, label %if.end
+
+if.then: ; preds = %entry
+ br label %return
+
+if.end: ; preds = %entry
+ br label %return
+
+return: ; preds = %if.end, %if.then
+ %retval.0 = phi i64 [ 291, %if.then ], [ %2, %if.end ]
+ ret i64 %retval.0
+}
+
diff --git a/llvm/test/Transforms/InstCombine/result-of-usub-is-non-zero-and-no-overflow.ll b/llvm/test/Transforms/InstCombine/result-of-usub-is-non-zero-and-no-overflow.ll
index 30a5072c7edc8..46b8a853e6cf5 100644
--- a/llvm/test/Transforms/InstCombine/result-of-usub-is-non-zero-and-no-overflow.ll
+++ b/llvm/test/Transforms/InstCombine/result-of-usub-is-non-zero-and-no-overflow.ll
@@ -141,16 +141,16 @@ define i1 @t1_strict_logical(i8 %base, i8 %offset) {
define i1 @t2(i8 %base, i8 %offset) {
; CHECK-LABEL: @t2(
-; CHECK-NEXT: [[AGG:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[BASE:%.*]], i8 [[OFFSET:%.*]])
+; CHECK-NEXT: [[ADJUSTED:%.*]] = sub i8 [[BASE:%.*]], [[OFFSET:%.*]]
+; CHECK-NEXT: [[UNDERFLOW:%.*]] = icmp ult i8 [[BASE]], [[OFFSET]]
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i8, i1 } undef, i8 [[ADJUSTED]], 0
+; CHECK-NEXT: [[AGG:%.*]] = insertvalue { i8, i1 } [[TMP3]], i1 [[UNDERFLOW]], 1
; CHECK-NEXT: call void @useagg({ i8, i1 } [[AGG]])
-; CHECK-NEXT: [[ADJUSTED:%.*]] = extractvalue { i8, i1 } [[AGG]], 0
; CHECK-NEXT: call void @use8(i8 [[ADJUSTED]])
-; CHECK-NEXT: [[UNDERFLOW:%.*]] = extractvalue { i8, i1 } [[AGG]], 1
; CHECK-NEXT: call void @use1(i1 [[UNDERFLOW]])
; CHECK-NEXT: [[NO_UNDERFLOW:%.*]] = xor i1 [[UNDERFLOW]], true
; CHECK-NEXT: call void @use1(i1 [[NO_UNDERFLOW]])
-; CHECK-NEXT: [[NOT_NULL:%.*]] = icmp ne i8 [[ADJUSTED]], 0
-; CHECK-NEXT: [[R:%.*]] = and i1 [[NOT_NULL]], [[NO_UNDERFLOW]]
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i8 [[BASE]], [[OFFSET]]
; CHECK-NEXT: ret i1 [[R]]
;
%agg = call {i8, i1} @llvm.usub.with.overflow(i8 %base, i8 %offset)
@@ -168,16 +168,16 @@ define i1 @t2(i8 %base, i8 %offset) {
define i1 @t2_logical(i8 %base, i8 %offset) {
; CHECK-LABEL: @t2_logical(
-; CHECK-NEXT: [[AGG:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[BASE:%.*]], i8 [[OFFSET:%.*]])
+; CHECK-NEXT: [[ADJUSTED:%.*]] = sub i8 [[BASE:%.*]], [[OFFSET:%.*]]
+; CHECK-NEXT: [[UNDERFLOW:%.*]] = icmp ult i8 [[BASE]], [[OFFSET]]
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i8, i1 } undef, i8 [[ADJUSTED]], 0
+; CHECK-NEXT: [[AGG:%.*]] = insertvalue { i8, i1 } [[TMP3]], i1 [[UNDERFLOW]], 1
; CHECK-NEXT: call void @useagg({ i8, i1 } [[AGG]])
-; CHECK-NEXT: [[ADJUSTED:%.*]] = extractvalue { i8, i1 } [[AGG]], 0
; CHECK-NEXT: call void @use8(i8 [[ADJUSTED]])
-; CHECK-NEXT: [[UNDERFLOW:%.*]] = extractvalue { i8, i1 } [[AGG]], 1
; CHECK-NEXT: call void @use1(i1 [[UNDERFLOW]])
; CHECK-NEXT: [[NO_UNDERFLOW:%.*]] = xor i1 [[UNDERFLOW]], true
; CHECK-NEXT: call void @use1(i1 [[NO_UNDERFLOW]])
-; CHECK-NEXT: [[NOT_NULL:%.*]] = icmp ne i8 [[ADJUSTED]], 0
-; CHECK-NEXT: [[R:%.*]] = and i1 [[NOT_NULL]], [[NO_UNDERFLOW]]
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i8 [[BASE]], [[OFFSET]]
; CHECK-NEXT: ret i1 [[R]]
;
%agg = call {i8, i1} @llvm.usub.with.overflow(i8 %base, i8 %offset)
@@ -321,16 +321,16 @@ define i1 @t5_commutability2_logical(i8 %base, i8 %offset) {
define i1 @t6_commutability(i8 %base, i8 %offset) {
; CHECK-LABEL: @t6_commutability(
-; CHECK-NEXT: [[AGG:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[BASE:%.*]], i8 [[OFFSET:%.*]])
+; CHECK-NEXT: [[ADJUSTED:%.*]] = sub i8 [[BASE:%.*]], [[OFFSET:%.*]]
+; CHECK-NEXT: [[UNDERFLOW:%.*]] = icmp ult i8 [[BASE]], [[OFFSET]]
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i8, i1 } undef, i8 [[ADJUSTED]], 0
+; CHECK-NEXT: [[AGG:%.*]] = insertvalue { i8, i1 } [[TMP3]], i1 [[UNDERFLOW]], 1
; CHECK-NEXT: call void @useagg({ i8, i1 } [[AGG]])
-; CHECK-NEXT: [[ADJUSTED:%.*]] = extractvalue { i8, i1 } [[AGG]], 0
; CHECK-NEXT: call void @use8(i8 [[ADJUSTED]])
-; CHECK-NEXT: [[UNDERFLOW:%.*]] = extractvalue { i8, i1 } [[AGG]], 1
; CHECK-NEXT: call void @use1(i1 [[UNDERFLOW]])
; CHECK-NEXT: [[NO_UNDERFLOW:%.*]] = xor i1 [[UNDERFLOW]], true
; CHECK-NEXT: call void @use1(i1 [[NO_UNDERFLOW]])
-; CHECK-NEXT: [[NOT_NULL:%.*]] = icmp ne i8 [[ADJUSTED]], 0
-; CHECK-NEXT: [[R:%.*]] = and i1 [[NOT_NULL]], [[NO_UNDERFLOW]]
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i8 [[BASE]], [[OFFSET]]
; CHECK-NEXT: ret i1 [[R]]
;
%agg = call {i8, i1} @llvm.usub.with.overflow(i8 %base, i8 %offset)
@@ -348,16 +348,16 @@ define i1 @t6_commutability(i8 %base, i8 %offset) {
define i1 @t6_commutability_logical(i8 %base, i8 %offset) {
; CHECK-LABEL: @t6_commutability_logical(
-; CHECK-NEXT: [[AGG:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[BASE:%.*]], i8 [[OFFSET:%.*]])
+; CHECK-NEXT: [[ADJUSTED:%.*]] = sub i8 [[BASE:%.*]], [[OFFSET:%.*]]
+; CHECK-NEXT: [[UNDERFLOW:%.*]] = icmp ult i8 [[BASE]], [[OFFSET]]
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i8, i1 } undef, i8 [[ADJUSTED]], 0
+; CHECK-NEXT: [[AGG:%.*]] = insertvalue { i8, i1 } [[TMP3]], i1 [[UNDERFLOW]], 1
; CHECK-NEXT: call void @useagg({ i8, i1 } [[AGG]])
-; CHECK-NEXT: [[ADJUSTED:%.*]] = extractvalue { i8, i1 } [[AGG]], 0
; CHECK-NEXT: call void @use8(i8 [[ADJUSTED]])
-; CHECK-NEXT: [[UNDERFLOW:%.*]] = extractvalue { i8, i1 } [[AGG]], 1
; CHECK-NEXT: call void @use1(i1 [[UNDERFLOW]])
; CHECK-NEXT: [[NO_UNDERFLOW:%.*]] = xor i1 [[UNDERFLOW]], true
; CHECK-NEXT: call void @use1(i1 [[NO_UNDERFLOW]])
-; CHECK-NEXT: [[NOT_NULL:%.*]] = icmp ne i8 [[ADJUSTED]], 0
-; CHECK-NEXT: [[R:%.*]] = and i1 [[NOT_NULL]], [[NO_UNDERFLOW]]
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i8 [[BASE]], [[OFFSET]]
; CHECK-NEXT: ret i1 [[R]]
;
%agg = call {i8, i1} @llvm.usub.with.overflow(i8 %base, i8 %offset)
@@ -459,14 +459,14 @@ define i1 @t7_nonstrict_logical(i8 %base, i8 %offset) {
define i1 @t8(i8 %base, i8 %offset) {
; CHECK-LABEL: @t8(
-; CHECK-NEXT: [[AGG:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[BASE:%.*]], i8 [[OFFSET:%.*]])
+; CHECK-NEXT: [[ADJUSTED:%.*]] = sub i8 [[BASE:%.*]], [[OFFSET:%.*]]
+; CHECK-NEXT: [[UNDERFLOW:%.*]] = icmp ult i8 [[BASE]], [[OFFSET]]
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i8, i1 } undef, i8 [[ADJUSTED]], 0
+; CHECK-NEXT: [[AGG:%.*]] = insertvalue { i8, i1 } [[TMP3]], i1 [[UNDERFLOW]], 1
; CHECK-NEXT: call void @useagg({ i8, i1 } [[AGG]])
-; CHECK-NEXT: [[ADJUSTED:%.*]] = extractvalue { i8, i1 } [[AGG]], 0
; CHECK-NEXT: call void @use8(i8 [[ADJUSTED]])
-; CHECK-NEXT: [[UNDERFLOW:%.*]] = extractvalue { i8, i1 } [[AGG]], 1
; CHECK-NEXT: call void @use1(i1 [[UNDERFLOW]])
-; CHECK-NEXT: [[NULL:%.*]] = icmp eq i8 [[ADJUSTED]], 0
-; CHECK-NEXT: [[R:%.*]] = or i1 [[NULL]], [[UNDERFLOW]]
+; CHECK-NEXT: [[R:%.*]] = icmp ule i8 [[BASE]], [[OFFSET]]
; CHECK-NEXT: ret i1 [[R]]
;
%agg = call {i8, i1} @llvm.usub.with.overflow(i8 %base, i8 %offset)
@@ -482,14 +482,14 @@ define i1 @t8(i8 %base, i8 %offset) {
define i1 @t8_logical(i8 %base, i8 %offset) {
; CHECK-LABEL: @t8_logical(
-; CHECK-NEXT: [[AGG:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[BASE:%.*]], i8 [[OFFSET:%.*]])
+; CHECK-NEXT: [[ADJUSTED:%.*]] = sub i8 [[BASE:%.*]], [[OFFSET:%.*]]
+; CHECK-NEXT: [[UNDERFLOW:%.*]] = icmp ult i8 [[BASE]], [[OFFSET]]
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i8, i1 } undef, i8 [[ADJUSTED]], 0
+; CHECK-NEXT: [[AGG:%.*]] = insertvalue { i8, i1 } [[TMP3]], i1 [[UNDERFLOW]], 1
; CHECK-NEXT: call void @useagg({ i8, i1 } [[AGG]])
-; CHECK-NEXT: [[ADJUSTED:%.*]] = extractvalue { i8, i1 } [[AGG]], 0
; CHECK-NEXT: call void @use8(i8 [[ADJUSTED]])
-; CHECK-NEXT: [[UNDERFLOW:%.*]] = extractvalue { i8, i1 } [[AGG]], 1
; CHECK-NEXT: call void @use1(i1 [[UNDERFLOW]])
-; CHECK-NEXT: [[NULL:%.*]] = icmp eq i8 [[ADJUSTED]], 0
-; CHECK-NEXT: [[R:%.*]] = or i1 [[NULL]], [[UNDERFLOW]]
+; CHECK-NEXT: [[R:%.*]] = icmp ule i8 [[BASE]], [[OFFSET]]
; CHECK-NEXT: ret i1 [[R]]
;
%agg = call {i8, i1} @llvm.usub.with.overflow(i8 %base, i8 %offset)
diff --git a/llvm/test/Transforms/InstCombine/usub-overflow-known-by-implied-cond.ll b/llvm/test/Transforms/InstCombine/usub-overflow-known-by-implied-cond.ll
index 90ca39a70a0bb..c9030e5ab0321 100644
--- a/llvm/test/Transforms/InstCombine/usub-overflow-known-by-implied-cond.ll
+++ b/llvm/test/Transforms/InstCombine/usub-overflow-known-by-implied-cond.ll
@@ -175,11 +175,10 @@ define i32 @test7(i32 %a, i32 %b) {
; CHECK-NEXT: [[COND:%.*]] = icmp slt i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: br i1 [[COND]], label [[BB1:%.*]], label [[BB3:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -205,11 +204,10 @@ define i32 @test8(i32 %a, i32 %b) {
; CHECK-NEXT: [[COND_NOT:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: br i1 [[COND_NOT]], label [[BB3:%.*]], label [[BB1:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -296,11 +294,10 @@ define i32 @test10(i32 %a, i32 %b, i1 %cond2) {
; CHECK-NEXT: [[AND:%.*]] = and i1 [[COND]], [[COND2:%.*]]
; CHECK-NEXT: br i1 [[AND]], label [[BB3:%.*]], label [[BB1:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -328,11 +325,10 @@ define i32 @test10_logical(i32 %a, i32 %b, i1 %cond2) {
; CHECK-NEXT: [[AND:%.*]] = select i1 [[COND]], i1 [[COND2:%.*]], i1 false
; CHECK-NEXT: br i1 [[AND]], label [[BB3:%.*]], label [[BB1:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -360,11 +356,10 @@ define i32 @test11(i32 %a, i32 %b, i1 %cond2) {
; CHECK-NEXT: [[OR:%.*]] = or i1 [[COND]], [[COND2:%.*]]
; CHECK-NEXT: br i1 [[OR]], label [[BB1:%.*]], label [[BB3:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -392,11 +387,10 @@ define i32 @test11_logical(i32 %a, i32 %b, i1 %cond2) {
; CHECK-NEXT: [[OR:%.*]] = select i1 [[COND]], i1 true, i1 [[COND2:%.*]]
; CHECK-NEXT: br i1 [[OR]], label [[BB1:%.*]], label [[BB3:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -424,11 +418,10 @@ define i32 @test12(i32 %a, i32 %b, i1 %cond2) {
; CHECK-NEXT: [[OR:%.*]] = or i1 [[COND]], [[COND2:%.*]]
; CHECK-NEXT: br i1 [[OR]], label [[BB3:%.*]], label [[BB1:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
@@ -456,11 +449,10 @@ define i32 @test12_logical(i32 %a, i32 %b, i1 %cond2) {
; CHECK-NEXT: [[OR:%.*]] = select i1 [[COND]], i1 true, i1 [[COND2:%.*]]
; CHECK-NEXT: br i1 [[OR]], label [[BB3:%.*]], label [[BB1:%.*]]
; CHECK: bb1:
-; CHECK-NEXT: [[SUB1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[A]], i32 [[B]])
-; CHECK-NEXT: [[C1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 1
+; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[A]], [[B]]
; CHECK-NEXT: br i1 [[C1]], label [[BB3]], label [[BB2:%.*]]
; CHECK: bb2:
-; CHECK-NEXT: [[R1:%.*]] = extractvalue { i32, i1 } [[SUB1]], 0
+; CHECK-NEXT: [[R1:%.*]] = sub nuw i32 [[A]], [[B]]
; CHECK-NEXT: ret i32 [[R1]]
; CHECK: bb3:
; CHECK-NEXT: ret i32 0
diff --git a/llvm/test/Transforms/InstCombine/usubo.ll b/llvm/test/Transforms/InstCombine/usubo.ll
index 2074190a2cd45..e4b9c0e08ba22 100644
--- a/llvm/test/Transforms/InstCombine/usubo.ll
+++ b/llvm/test/Transforms/InstCombine/usubo.ll
@@ -130,10 +130,9 @@ define i1 @sub_ne0(i8 %x, i8 %y, i1 %b) {
define i1 @sub_eq1(i8 %x, i8 %y, i1 %b) {
; CHECK-LABEL: @sub_eq1(
-; CHECK-NEXT: [[SS:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X:%.*]], i8 [[Y:%.*]])
-; CHECK-NEXT: [[OV:%.*]] = extractvalue { i8, i1 } [[SS]], 1
+; CHECK-NEXT: [[SUB:%.*]] = sub i8 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[OV:%.*]] = icmp ult i8 [[X]], [[Y]]
; CHECK-NEXT: call void @use(i1 [[OV]])
-; CHECK-NEXT: [[SUB:%.*]] = extractvalue { i8, i1 } [[SS]], 0
; CHECK-NEXT: [[EQ1:%.*]] = icmp eq i8 [[SUB]], 1
; CHECK-NEXT: ret i1 [[EQ1]]
;
@@ -149,10 +148,9 @@ define i1 @sub_eq1(i8 %x, i8 %y, i1 %b) {
define i1 @sub_sgt0(i8 %x, i8 %y, i1 %b) {
; CHECK-LABEL: @sub_sgt0(
-; CHECK-NEXT: [[SS:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X:%.*]], i8 [[Y:%.*]])
-; CHECK-NEXT: [[OV:%.*]] = extractvalue { i8, i1 } [[SS]], 1
+; CHECK-NEXT: [[SUB:%.*]] = sub i8 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT: [[OV:%.*]] = icmp ult i8 [[X]], [[Y]]
; CHECK-NEXT: call void @use(i1 [[OV]])
-; CHECK-NEXT: [[SUB:%.*]] = extractvalue { i8, i1 } [[SS]], 0
; CHECK-NEXT: [[SGT0:%.*]] = icmp sgt i8 [[SUB]], 0
; CHECK-NEXT: ret i1 [[SGT0]]
;
diff --git a/llvm/test/Transforms/InstC...
[truncated]
✅ With the latest revision this PR passed the undef deprecator.
🐧 Linux x64 Test Results
Failed Tests
LLVM
• LLVM.CodeGen/AArch64/active_lane_mask.ll
• LLVM.CodeGen/AArch64/vec_uaddo.ll
• LLVM.CodeGen/AMDGPU/carryout-selection.ll
• LLVM.CodeGen/AMDGPU/uaddo.ll
• LLVM.CodeGen/ARM/addsubo-legalization.ll
• LLVM.CodeGen/PowerPC/sat-add.ll
• LLVM.CodeGen/RISCV/addcarry.ll
• LLVM.CodeGen/RISCV/arith-with-overflow.ll
• LLVM.CodeGen/RISCV/overflow-intrinsics.ll
• LLVM.CodeGen/RISCV/uadd_sat.ll
• LLVM.CodeGen/RISCV/uadd_sat_plus.ll
• LLVM.CodeGen/RISCV/umulo-128-legalisation-lowering.ll
• LLVM.CodeGen/RISCV/xaluo.ll
• LLVM.CodeGen/RISCV/xqcia.ll
• LLVM.CodeGen/SPARC/umulo-128-legalisation-lowering.ll
• LLVM.CodeGen/Thumb2/mve-saturating-arith.ll
• LLVM.CodeGen/WebAssembly/umulo-128-legalisation-lowering.ll
• LLVM.CodeGen/X86/expand-vp-int-intrinsics.ll
• LLVM.CodeGen/X86/sat-add.ll
• LLVM.CodeGen/X86/uadd_sat.ll
• LLVM.CodeGen/X86/uadd_sat_vec.ll
• LLVM.CodeGen/X86/vec_uaddo.ll
If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the
nikic
left a comment
We do consider usub.with.overflow non-canonical, so in principle this is fine (if we're willing to overlook the multi-use undef). I'm a bit concerned that we may not always recover this in the backend, which is more problematic when we originally started from an overflow builtin.
}

Instruction *
InstCombinerImpl::foldIntrinsicWithOverflowCommon(IntrinsicInst *II) {
This code should not be inside foldIntrinsicWithOverflowCommon(), as this is not actually a common transform, it applies to only a single intrinsic.
I understand your concern about the backend possibly not recovering this transformation when starting from an overflow builtin. Do you have a suggestion for how we could handle this safely, or a better place to implement this transform so it only affects usub.with.overflow without introducing issues in the backend?
We try to recover this pattern into usubo in CodeGenPrepare. So usubo should be canonical in the middle-end.
llvm-project/llvm/lib/CodeGen/CodeGenPrepare.cpp
Lines 1732 to 1736 in 6bb7863
The root cause is the inefficient lowering of usubo. Currently we lower the overflow bit of usubo into
llvm-project/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
Lines 11467 to 11471 in ddd770d
If it causes some regressions after we change it into
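The permalinked snippets are not reproduced in this thread, so the following is only a sketch of the kind of choice being discussed: the borrow bit of an unsigned subtraction can be derived either from the wrapped result or directly from the operands. The function names are hypothetical and the code is a plain C++ model on 8-bit values, not the SelectionDAG lowering itself.

```cpp
#include <cassert>
#include <cstdint>

// Form 1 (assumed for illustration): compare the wrapped result with LHS.
bool borrowFromResult(uint8_t x, uint8_t y) {
  uint8_t diff = static_cast<uint8_t>(x - y); // wraps modulo 256
  return diff > x;
}

// Form 2 (assumed for illustration): compare the operands directly.
bool borrowFromOperands(uint8_t x, uint8_t y) { return x < y; }

// Exhaustively compare the two forms over every pair of i8 values.
bool formsAgree() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t a = static_cast<uint8_t>(x);
      uint8_t b = static_cast<uint8_t>(y);
      if (borrowFromResult(a, b) != borrowFromOperands(a, b))
        return false;
    }
  return true;
}

// Hypothetical self-check helper for this sketch.
void checkBorrowForms() { assert(formsAgree()); }
```

Both forms give the same bit for every input, so choosing between them is a question of which one a given target lowers more efficiently, not of correctness.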
@@ -0,0 +1,34 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
Unused test. Please also add a RISC-V codegen test demonstrating the issue.
Please also update the PR title/description.
; AVX-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm2
; AVX-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
; AVX-NEXT: vmovaps %xmm1, %xmm0
; AVX-NEXT: retq
what's going on here?
The first two instructions (vpcmpeqd + vblendvps) were doing a compare and conditional blend of vectors, while now it’s replaced by vmovaps, which just moves the vector directly. So effectively, the result is the same, but the code is simpler and shorter.
can you test that in alive please? I'm not convinced that the IR for uaddo(not(x),1) should simplify to always overflow
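As a quick sanity check alongside an Alive run, the claim can be modelled in plain C++ on 8-bit values. This is a sketch with hypothetical helper names, not a substitute for Alive: it treats uaddo as "the true sum does not fit in 8 bits" and counts the inputs x for which uaddo(not(x), 1) overflows.

```cpp
#include <cassert>
#include <cstdint>

// Model of unsigned add-with-overflow on i8: the carry-out is set exactly
// when the true sum does not fit in 8 bits.
bool uaddOverflows(uint8_t a, uint8_t b) {
  return static_cast<unsigned>(a) + static_cast<unsigned>(b) > 0xFFu;
}

// Count the i8 inputs x for which uaddo(~x, 1) reports overflow.
int countOverflowingInputs() {
  int count = 0;
  for (unsigned x = 0; x < 256; ++x) {
    uint8_t notX = static_cast<uint8_t>(~x); // keep only the low 8 bits
    if (uaddOverflows(notX, 1))
      ++count;
  }
  return count;
}

// Hypothetical self-check helper for this sketch.
void checkNotPlusOne() {
  assert(uaddOverflows(static_cast<uint8_t>(~0u), 1));  // x = 0: 255 + 1 wraps
  assert(!uaddOverflows(static_cast<uint8_t>(~1u), 1)); // x = 1: 254 + 1 fits
  assert(countOverflowingInputs() == 1);
}
```

Under this model the overflow bit is set only for x = 0, so a fold to "always overflow" would not be justified by the arithmetic alone.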
; CHECK-NEXT: flat_store_dwordx2 v[4:5], v[0:1]
; CHECK-NEXT: v_add_co_u32_e32 v2, vcc, -1, v0
; CHECK-NEXT: v_addc_co_u32_e32 v3, vcc, -1, v1, vcc
; CHECK-NEXT: v_cmp_lt_u64_e32 vcc, 1, v[0:1]
Although the sub → add substitution is functionally identical, the additional v_cmp_lt_u64_e32 instruction should not be emitted, see #155255
Thanks for pointing this out.
I'll investigate why this extra v_cmp_lt_u64_e32 is emitted for the sub → add substitution, see whether we can avoid it (or add a backend fix), and report back.
Co-authored-by: Jay Foad <[email protected]>
When handling usub.with.overflow, we were checking LHS > RHS (SETUGT), which is the opposite of the actual borrow condition (LHS < RHS). This could produce an incorrect overflow flag.

This patch picks CompareLHS/CompareRHS based on IsAdd so that we now:
• use (Result < LHS) for uadd
• use (LHS < RHS) for usub
So the logic now matches unsigned overflow rules for both cases.
fix #170634
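
The two rules listed in the description can be illustrated with a small C++ model on 8-bit values. This is a sketch only: `OverflowResult`, `uaddo`, `usubo` and `matchesWideReference` are hypothetical names for this example and do not correspond to the CompareLHS/CompareRHS code in the patch. Each rule is compared against a reference computed in wider arithmetic.

```cpp
#include <cassert>
#include <cstdint>

struct OverflowResult {
  uint8_t value;  // wrapped 8-bit result
  bool overflow;  // carry (uadd) or borrow (usub)
};

// uadd: the wrapped sum is smaller than LHS exactly when a carry occurred.
OverflowResult uaddo(uint8_t lhs, uint8_t rhs) {
  uint8_t result = static_cast<uint8_t>(lhs + rhs);
  return {result, result < lhs};
}

// usub: a borrow occurs exactly when LHS < RHS.
OverflowResult usubo(uint8_t lhs, uint8_t rhs) {
  uint8_t result = static_cast<uint8_t>(lhs - rhs);
  return {result, lhs < rhs};
}

// Compare both rules with a reference computed in int, which cannot wrap here.
bool matchesWideReference() {
  for (int l = 0; l < 256; ++l)
    for (int r = 0; r < 256; ++r) {
      uint8_t a = static_cast<uint8_t>(l);
      uint8_t b = static_cast<uint8_t>(r);
      if (uaddo(a, b).overflow != (l + r > 255)) return false;
      if (usubo(a, b).overflow != (l - r < 0)) return false;
    }
  return true;
}

// Hypothetical self-check helper for this sketch.
void checkOverflowRules() { assert(matchesWideReference()); }
```

For example, `usubo(3, 5)` wraps to 254 and reports a borrow, while `usubo(5, 3)` yields 2 with no borrow; a rule based on LHS > RHS would report exactly the opposite for these two inputs.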