[WebAssembly] Optimize convert_iKxN_u into convert_iKxN_s #149609
Conversation
@llvm/pr-subscribers-backend-webassembly

Author: Arseny Kapoulkine (zeux)

Changes

convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is known to have sign bit 0. This results in emitting Wasm opcodes that, on some targets (like x86_64), are dramatically slower than the signed versions on major engines.

Similarly to the X86 backend, we now fix this up in isel when the instruction carries the nonneg flag from canonicalization, or when we know the source has a zero sign bit.

Fixes #149457.

Full diff: https://github.com/llvm/llvm-project/pull/149609.diff

3 Files Affected:
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index bf2e04caa0a61..aa68f6f647918 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -2934,6 +2934,25 @@ performVectorExtendToFPCombine(SDNode *N,
return DAG.getNode(N->getOpcode(), SDLoc(N), ResVT, Conv);
}
+static SDValue
+performVectorNonNegToFPCombine(SDNode *N,
+ TargetLowering::DAGCombinerInfo &DCI) {
+ auto &DAG = DCI.DAG;
+
+ SDNodeFlags Flags = N->getFlags();
+ SDValue Op0 = N->getOperand(0);
+ EVT VT = N->getValueType(0);
+
+ // Optimize uitofp to sitofp when the sign bit is known to be zero.
+ // Depending on the target (runtime) backend, this might be performance
+ // neutral (e.g. AArch64) or a significant improvement (e.g. x86_64).
+ if (Flags.hasNonNeg() || DAG.SignBitIsZero(Op0)) {
+ return DAG.getNode(ISD::SINT_TO_FP, SDLoc(N), VT, Op0);
+ }
+
+ return SDValue();
+}
+
static SDValue
performVectorExtendCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
auto &DAG = DCI.DAG;
@@ -3515,6 +3534,9 @@ WebAssemblyTargetLowering::PerformDAGCombine(SDNode *N,
case ISD::ZERO_EXTEND:
return performVectorExtendCombine(N, DCI);
case ISD::UINT_TO_FP:
+ if (auto ExtCombine = performVectorExtendToFPCombine(N, DCI))
+ return ExtCombine;
+ return performVectorNonNegToFPCombine(N, DCI);
case ISD::SINT_TO_FP:
return performVectorExtendToFPCombine(N, DCI);
case ISD::FP_TO_SINT_SAT:
diff --git a/llvm/test/CodeGen/WebAssembly/simd-conversions.ll b/llvm/test/CodeGen/WebAssembly/simd-conversions.ll
index 8459ec8101ff2..b355a0d60317b 100644
--- a/llvm/test/CodeGen/WebAssembly/simd-conversions.ll
+++ b/llvm/test/CodeGen/WebAssembly/simd-conversions.ll
@@ -441,3 +441,31 @@ define <2 x double> @promote_mixed_v2f64(<4 x float> %x, <4 x float> %y) {
%a = fpext <2 x float> %v to <2 x double>
ret <2 x double> %a
}
+
+define <4 x float> @convert_u_v4f32_maybeneg(<4 x i32> %x) {
+; CHECK-LABEL: convert_u_v4f32_maybeneg:
+; CHECK: .functype convert_u_v4f32_maybeneg (v128) -> (v128)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: i32.const 1
+; CHECK-NEXT: i32x4.shr_s
+; CHECK-NEXT: f32x4.convert_i32x4_u
+; CHECK-NEXT: # fallthrough-return
+ %a = ashr <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
+ %b = uitofp <4 x i32> %a to <4 x float>
+ ret <4 x float> %b
+}
+
+define <4 x float> @convert_u_v4f32_nonneg(<4 x i32> %x) {
+; CHECK-LABEL: convert_u_v4f32_nonneg:
+; CHECK: .functype convert_u_v4f32_nonneg (v128) -> (v128)
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: local.get 0
+; CHECK-NEXT: i32.const 1
+; CHECK-NEXT: i32x4.shr_u
+; CHECK-NEXT: f32x4.convert_i32x4_s
+; CHECK-NEXT: # fallthrough-return
+ %a = lshr <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
+ %b = uitofp <4 x i32> %a to <4 x float>
+ ret <4 x float> %b
+}
diff --git a/llvm/test/CodeGen/WebAssembly/simd-extending-convert.ll b/llvm/test/CodeGen/WebAssembly/simd-extending-convert.ll
index c93b8aa7fb42e..eb39f90e68701 100644
--- a/llvm/test/CodeGen/WebAssembly/simd-extending-convert.ll
+++ b/llvm/test/CodeGen/WebAssembly/simd-extending-convert.ll
@@ -12,7 +12,7 @@ define <4 x float> @extend_to_float_low_i16x8_u(<8 x i16> %x) {
; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.extend_low_i16x8_u
-; CHECK-NEXT: f32x4.convert_i32x4_u
+; CHECK-NEXT: f32x4.convert_i32x4_s
; CHECK-NEXT: # fallthrough-return
%low = shufflevector <8 x i16> %x, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%extended = uitofp <4 x i16> %low to <4 x float>
@@ -25,7 +25,7 @@ define <4 x float> @extend_to_float_high_i16x8_u(<8 x i16> %x) {
; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.extend_high_i16x8_u
-; CHECK-NEXT: f32x4.convert_i32x4_u
+; CHECK-NEXT: f32x4.convert_i32x4_s
; CHECK-NEXT: # fallthrough-return
%high = shufflevector <8 x i16> %x, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%extended = uitofp <4 x i16> %high to <4 x float>
@@ -39,7 +39,7 @@ define <4 x float> @extend_to_float_low_i8x16_u(<8 x i8> %x) {
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.extend_low_i8x16_u
; CHECK-NEXT: i32x4.extend_low_i16x8_u
-; CHECK-NEXT: f32x4.convert_i32x4_u
+; CHECK-NEXT: f32x4.convert_i32x4_s
; CHECK-NEXT: # fallthrough-return
%low = shufflevector <8 x i8> %x, <8 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%extended = uitofp <4 x i8> %low to <4 x float>
@@ -55,7 +55,7 @@ define <4 x float> @extend_to_float_high_i8x16_u(<8 x i8> %x) {
; CHECK-NEXT: i8x16.shuffle 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
; CHECK-NEXT: i16x8.extend_low_i8x16_u
; CHECK-NEXT: i32x4.extend_low_i16x8_u
-; CHECK-NEXT: f32x4.convert_i32x4_u
+; CHECK-NEXT: f32x4.convert_i32x4_s
; CHECK-NEXT: # fallthrough-return
%high = shufflevector <8 x i8> %x, <8 x i8> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%extended = uitofp <4 x i8> %high to <4 x float>
@@ -136,7 +136,7 @@ define <2 x double> @extend_to_double_low_i16x4_u(<4 x i16> %x) {
; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.extend_low_i16x8_u
-; CHECK-NEXT: f64x2.convert_low_i32x4_u
+; CHECK-NEXT: f64x2.convert_low_i32x4_s
; CHECK-NEXT: # fallthrough-return
%low = shufflevector <4 x i16> %x, <4 x i16> undef, <2 x i32> <i32 0, i32 1>
%extended = uitofp <2 x i16> %low to <2 x double>
Force-pushed from 403d9a2 to bb563ac.
convert_iKxN_s is canonicalized into convert_iKxN_u when the argument is known to have sign bit 0. This results in emitting Wasm opcodes that, on some targets (like x86_64), are dramatically slower than the signed versions on major engines.
Similarly to the X86 backend, we now fix this up in isel when the instruction carries the nonneg flag from canonicalization, or when we know the source has a zero sign bit.
Fixes #149457.
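
For reference, a minimal IR sketch of the pattern this change targets, modeled on the convert_u_v4f32_nonneg test added in the diff; the function name and the llc invocation are illustrative, not taken from the patch:

; Build with something like: llc -mtriple=wasm32-unknown-unknown -mattr=+simd128
define <4 x float> @uitofp_known_nonneg(<4 x i32> %x) {
  ; The logical shift right clears the sign bit of every lane, so
  ; SignBitIsZero holds for %a even though the conversion below is unsigned.
  %a = lshr <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
  ; Previously this selected f32x4.convert_i32x4_u; with the new combine the
  ; backend emits f32x4.convert_i32x4_s, which major engines lower much more
  ; cheaply on x86_64 hosts.
  %b = uitofp <4 x i32> %a to <4 x float>
  ret <4 x float> %b
}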