Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Implement GCS ACLE intrinsics #96903

Merged
merged 3 commits into from
Jul 11, 2024

Conversation

john-brawn-arm
Copy link
Collaborator

This adds intrinsics defined in ARM-software/acle#260

Doing this requires some changes to the GCS instruction definitions, as these intrinsics make use of how some instructions don't modify the input register when GCS is disabled, and they need to be correctly marked with mayLoad/mayStore/hasSideEffects for instruction selection to work.

This adds intrinsics defined in ARM-software/acle#260

Doing this requires some changes to the GCS instruction definitions,
as these intrinsics make use of how some instructions don't modify the
input register when GCS is disabled, and they need to be correctly
marked with mayLoad/mayStore/hasSideEffects for instruction selection
to work.
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AArch64 backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics llvm:ir labels Jun 27, 2024
@llvmbot
Copy link

llvmbot commented Jun 27, 2024

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-clang

Author: John Brawn (john-brawn-arm)

Changes

This adds intrinsics defined in ARM-software/acle#260

Doing this requires some changes to the GCS instruction definitions, as these intrinsics make use of how some instructions don't modify the input register when GCS is disabled, and they need to be correctly marked with mayLoad/mayStore/hasSideEffects for instruction selection to work.


Full diff: https://github.com/llvm/llvm-project/pull/96903.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsAArch64.def (+6)
  • (modified) clang/lib/Headers/arm_acle.h (+27)
  • (added) clang/test/CodeGen/aarch64-gcs.c (+57)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+17)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+17-5)
  • (added) llvm/test/CodeGen/AArch64/gcs-intrinsics.ll (+49)
diff --git a/clang/include/clang/Basic/BuiltinsAArch64.def b/clang/include/clang/Basic/BuiltinsAArch64.def
index 5fb199b1b2b03..8c48437e86315 100644
--- a/clang/include/clang/Basic/BuiltinsAArch64.def
+++ b/clang/include/clang/Basic/BuiltinsAArch64.def
@@ -49,6 +49,7 @@ BUILTIN(__builtin_arm_wfe, "v", "")
 BUILTIN(__builtin_arm_wfi, "v", "")
 BUILTIN(__builtin_arm_sev, "v", "")
 BUILTIN(__builtin_arm_sevl, "v", "")
+BUILTIN(__builtin_arm_chkfeat, "WUiWUi", "")
 
 // Like __builtin_trap but provide an 16-bit immediate reason code (which goes into `brk #N`).
 BUILTIN(__builtin_arm_trap, "vUIs", "nr")
@@ -136,6 +137,11 @@ TARGET_BUILTIN(__builtin_arm_st64b, "vv*WUiC*", "n", "ls64")
 TARGET_BUILTIN(__builtin_arm_st64bv, "WUiv*WUiC*", "n", "ls64")
 TARGET_BUILTIN(__builtin_arm_st64bv0, "WUiv*WUiC*", "n", "ls64")
 
+// Armv9.3-A Guarded Control Stack
+TARGET_BUILTIN(__builtin_arm_gcspopm, "WUiWUi", "n", "gcs")
+TARGET_BUILTIN(__builtin_arm_gcsss1, "vvC*", "n", "gcs")
+TARGET_BUILTIN(__builtin_arm_gcsss2, "vC*vC*", "n", "gcs")
+
 TARGET_HEADER_BUILTIN(_BitScanForward, "UcUNi*UNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_BitScanReverse, "UcUNi*UNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_BitScanForward64, "UcUNi*ULLi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
diff --git a/clang/lib/Headers/arm_acle.h b/clang/lib/Headers/arm_acle.h
index 5785954c9171a..4d3e8a30013cd 100644
--- a/clang/lib/Headers/arm_acle.h
+++ b/clang/lib/Headers/arm_acle.h
@@ -75,6 +75,14 @@ static __inline__ void __attribute__((__always_inline__, __nodebug__)) __yield(v
 #define __dbg(t) __builtin_arm_dbg(t)
 #endif
 
+#if defined(__ARM_64BIT_STATE) && __ARM_64BIT_STATE
+#define _CHKFEAT_GCS 1
+static __inline__ uint64_t __attribute__((__always_inline__, __nodebug__))
+__chkfeat(uint64_t __features) {
+  return __builtin_arm_chkfeat(__features) ^ __features;
+}
+#endif
+
 /* 7.5 Swap */
 static __inline__ uint32_t __attribute__((__always_inline__, __nodebug__))
 __swp(uint32_t __x, volatile uint32_t *__p) {
@@ -855,6 +863,25 @@ __rndrrs(uint64_t *__p) {
 }
 #endif
 
+/* 11.2 Guarded Control Stack intrinsics */
+#if defined(__ARM_64BIT_STATE) && __ARM_64BIT_STATE
+static __inline__ void * __attribute__((__always_inline__, __nodebug__))
+__gcspr() {
+  return (void *)__builtin_arm_rsr64("gcspr_el0");
+}
+
+static __inline__ uint64_t __attribute__((__always_inline__, __nodebug__, target("gcs")))
+__gcspopm() {
+  return __builtin_arm_gcspopm(0);
+}
+
+static __inline__ const void * __attribute__((__always_inline__, __nodebug__, target("gcs")))
+__gcsss(const void *__stack) {
+  __builtin_arm_gcsss1(__stack);
+  return __builtin_arm_gcsss2(0);
+}
+#endif
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/clang/test/CodeGen/aarch64-gcs.c b/clang/test/CodeGen/aarch64-gcs.c
new file mode 100644
index 0000000000000..e19946cf72f7f
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-gcs.c
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 2
+// RUN: %clang_cc1 -triple aarch64-eabi -target-feature +gcs -emit-llvm %s -o - | FileCheck %s
+
+#include <arm_acle.h>
+
+// CHECK-LABEL: define dso_local i64 @test_chkfeat
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[__FEATURES_ADDR_I:%.*]] = alloca i64, align 8
+// CHECK-NEXT:    store i64 1, ptr [[__FEATURES_ADDR_I]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr [[__FEATURES_ADDR_I]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.aarch64.chkfeat(i64 [[TMP0]])
+// CHECK-NEXT:    [[TMP2:%.*]] = load i64, ptr [[__FEATURES_ADDR_I]], align 8
+// CHECK-NEXT:    [[XOR_I:%.*]] = xor i64 [[TMP1]], [[TMP2]]
+// CHECK-NEXT:    ret i64 [[XOR_I]]
+//
+uint64_t test_chkfeat() {
+  return __chkfeat(_CHKFEAT_GCS);
+}
+
+// CHECK-LABEL: define dso_local ptr @test_gcspr
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.read_volatile_register.i64(metadata [[META2:![0-9]+]])
+// CHECK-NEXT:    [[TMP1:%.*]] = inttoptr i64 [[TMP0]] to ptr
+// CHECK-NEXT:    ret ptr [[TMP1]]
+//
+void *test_gcspr() {
+  return __gcspr();
+}
+
+// CHECK-LABEL: define dso_local i64 @test_gcspopm
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.aarch64.gcspopm(i64 0)
+// CHECK-NEXT:    ret i64 [[TMP0]]
+//
+uint64_t test_gcspopm() {
+  return __gcspopm();
+}
+
+// CHECK-LABEL: define dso_local ptr @test_gcsss
+// CHECK-SAME: (ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[__STACK_ADDR_I:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[TMP0]], ptr [[__STACK_ADDR_I]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__STACK_ADDR_I]], align 8
+// CHECK-NEXT:    call void @llvm.aarch64.gcsss1(ptr [[TMP1]])
+// CHECK-NEXT:    [[TMP2:%.*]] = call ptr @llvm.aarch64.gcsss2(ptr null)
+// CHECK-NEXT:    ret ptr [[TMP2]]
+//
+const void *test_gcsss(const void *p) {
+  return __gcsss(p);
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 38d71b17b476d..5c5c864141c6c 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -89,6 +89,23 @@ def int_aarch64_isb : ClangBuiltin<"__builtin_arm_isb">, MSBuiltin<"__isb">,
 // ordering during ISel.
 def int_aarch64_space : DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i32_ty, llvm_i64_ty], []>;
 
+//===----------------------------------------------------------------------===//
+// Guarded Control Stack
+
+def int_aarch64_chkfeat : ClangBuiltin<"__builtin_arm_chkfeat">,
+                          DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty],
+                                                [IntrNoMem]>;
+
+def int_aarch64_gcspopm : ClangBuiltin<"__builtin_arm_gcspopm">,
+                          DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty],
+                                                [IntrReadMem, IntrHasSideEffects]>;
+
+def int_aarch64_gcsss1 : ClangBuiltin<"__builtin_arm_gcsss1">,
+                         DefaultAttrsIntrinsic<[], [llvm_ptr_ty], []>;
+
+def int_aarch64_gcsss2 : ClangBuiltin<"__builtin_arm_gcsss2">,
+                         DefaultAttrsIntrinsic<[llvm_ptr_ty], [llvm_ptr_ty], []>;
+
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index f3aac3b46d173..e0de418625e10 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1267,23 +1267,34 @@ class GCSRtIn<bits<3> op1, bits<3> op2, string mnemonic,
   let Inst{15-8} = 0b01110111;
   let Inst{7-5} = op2;
   let Predicates = [HasGCS];
+  let hasSideEffects = 1;
 }
 
-def GCSSS1   : GCSRtIn<0b011, 0b010, "gcsss1">;
+let mayStore = 1, mayLoad = 1 in
+def GCSSS1   : GCSRtIn<0b011, 0b010, "gcsss1", [(int_aarch64_gcsss1 (i64 GPR64:$Rt))]>;
+let mayStore = 1 in
 def GCSPUSHM : GCSRtIn<0b011, 0b000, "gcspushm">;
 
 class GCSRtOut<bits<3> op1, bits<3> op2, string mnemonic,
             list<dag> pattern = []>
-    : RtSystemI<1, (outs GPR64:$Rt), (ins), mnemonic, "\t$Rt", pattern> {
+    : RtSystemI<1, (outs GPR64:$Rt), (ins GPR64:$src), mnemonic, "\t$Rt", pattern> {
   let Inst{20-19} = 0b01;
   let Inst{18-16} = op1;
   let Inst{15-8} = 0b01110111;
   let Inst{7-5} = op2;
   let Predicates = [HasGCS];
+  let hasSideEffects = 1;
+  // The input register is unchanged when GCS is disabled, so we need it as
+  // both an input and output operand.
+  let Constraints = "$src = $Rt";
 }
 
-def GCSSS2  : GCSRtOut<0b011, 0b011, "gcsss2">;
-def GCSPOPM : GCSRtOut<0b011, 0b001, "gcspopm">;
+let mayStore = 1, mayLoad = 1 in
+def GCSSS2  : GCSRtOut<0b011, 0b011, "gcsss2",
+                       [(set GPR64:$Rt, (int_aarch64_gcsss2 GPR64:$src))]>;
+let mayLoad = 1 in
+def GCSPOPM : GCSRtOut<0b011, 0b001, "gcspopm",
+                       [(set GPR64:$Rt, (int_aarch64_gcspopm GPR64:$src))]>;
 def GCSPOPM_NoOp : InstAlias<"gcspopm", (GCSPOPM XZR)>, Requires<[HasGCS]>; // Rt defaults to XZR if absent
 
 def GCSB_DSYNC_disable : InstAlias<"gcsb\tdsync", (HINT 19), 0>;
@@ -1292,7 +1303,8 @@ def GCSB_DSYNC         : InstAlias<"gcsb\tdsync", (HINT 19), 1>, Requires<[HasGC
 def : TokenAlias<"DSYNC", "dsync">;
 
 let Uses = [X16], Defs = [X16], CRm = 0b0101 in {
-  def CHKFEAT   : SystemNoOperands<0b000, "hint\t#40">;
+  def CHKFEAT   : SystemNoOperands<0b000, "hint\t#40",
+                                   [(set X16, (int_aarch64_chkfeat X16))]>;
 }
 def : InstAlias<"chkfeat\tx16", (CHKFEAT), 0>;
 def : InstAlias<"chkfeat\tx16", (CHKFEAT), 1>, Requires<[HasCHK]>;
diff --git a/llvm/test/CodeGen/AArch64/gcs-intrinsics.ll b/llvm/test/CodeGen/AArch64/gcs-intrinsics.ll
new file mode 100644
index 0000000000000..93c95569723f5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/gcs-intrinsics.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64 -mattr=+gcs -verify-machineinstrs -o - %s | FileCheck %s
+
+define i64 @test_chkfeat(i64 %arg) {
+; CHECK-LABEL: test_chkfeat:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov x16, x0
+; CHECK-NEXT:    chkfeat x16
+; CHECK-NEXT:    mov x0, x16
+; CHECK-NEXT:    ret
+entry:
+  %0 = call i64 @llvm.aarch64.chkfeat(i64 %arg)
+  ret i64 %0
+}
+
+define i64 @test_gcspopm(i64 %arg) {
+; CHECK-LABEL: test_gcspopm:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    gcspopm x0
+; CHECK-NEXT:    ret
+entry:
+  %0 = call i64 @llvm.aarch64.gcspopm(i64 %arg)
+  ret i64 %0
+}
+
+define void @test_gcsss1(ptr %p) {
+; CHECK-LABEL: test_gcsss1:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    gcsss1 x0
+; CHECK-NEXT:    ret
+entry:
+  call void @llvm.aarch64.gcsss1(ptr %p)
+  ret void
+}
+
+define ptr @test_gcsss2(ptr %p) {
+; CHECK-LABEL: test_gcsss2:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    gcsss2 x0
+; CHECK-NEXT:    ret
+entry:
+  %0 = call ptr @llvm.aarch64.gcsss2(ptr %p)
+  ret ptr %0
+}
+
+declare i64 @llvm.aarch64.chkfeat(i64)
+declare i64 @llvm.aarch64.gcspopm(i64)
+declare void @llvm.aarch64.gcsss1(ptr)
+declare ptr @llvm.aarch64.gcsss2(ptr)

@llvmbot
Copy link

llvmbot commented Jun 27, 2024

@llvm/pr-subscribers-backend-aarch64

Author: John Brawn (john-brawn-arm)

Changes

This adds intrinsics defined in ARM-software/acle#260

Doing this requires some changes to the GCS instruction definitions, as these intrinsics make use of how some instructions don't modify the input register when GCS is disabled, and they need to be correctly marked with mayLoad/mayStore/hasSideEffects for instruction selection to work.


Full diff: https://github.com/llvm/llvm-project/pull/96903.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsAArch64.def (+6)
  • (modified) clang/lib/Headers/arm_acle.h (+27)
  • (added) clang/test/CodeGen/aarch64-gcs.c (+57)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+17)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+17-5)
  • (added) llvm/test/CodeGen/AArch64/gcs-intrinsics.ll (+49)
diff --git a/clang/include/clang/Basic/BuiltinsAArch64.def b/clang/include/clang/Basic/BuiltinsAArch64.def
index 5fb199b1b2b03..8c48437e86315 100644
--- a/clang/include/clang/Basic/BuiltinsAArch64.def
+++ b/clang/include/clang/Basic/BuiltinsAArch64.def
@@ -49,6 +49,7 @@ BUILTIN(__builtin_arm_wfe, "v", "")
 BUILTIN(__builtin_arm_wfi, "v", "")
 BUILTIN(__builtin_arm_sev, "v", "")
 BUILTIN(__builtin_arm_sevl, "v", "")
+BUILTIN(__builtin_arm_chkfeat, "WUiWUi", "")
 
 // Like __builtin_trap but provide an 16-bit immediate reason code (which goes into `brk #N`).
 BUILTIN(__builtin_arm_trap, "vUIs", "nr")
@@ -136,6 +137,11 @@ TARGET_BUILTIN(__builtin_arm_st64b, "vv*WUiC*", "n", "ls64")
 TARGET_BUILTIN(__builtin_arm_st64bv, "WUiv*WUiC*", "n", "ls64")
 TARGET_BUILTIN(__builtin_arm_st64bv0, "WUiv*WUiC*", "n", "ls64")
 
+// Armv9.3-A Guarded Control Stack
+TARGET_BUILTIN(__builtin_arm_gcspopm, "WUiWUi", "n", "gcs")
+TARGET_BUILTIN(__builtin_arm_gcsss1, "vvC*", "n", "gcs")
+TARGET_BUILTIN(__builtin_arm_gcsss2, "vC*vC*", "n", "gcs")
+
 TARGET_HEADER_BUILTIN(_BitScanForward, "UcUNi*UNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_BitScanReverse, "UcUNi*UNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_BitScanForward64, "UcUNi*ULLi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
diff --git a/clang/lib/Headers/arm_acle.h b/clang/lib/Headers/arm_acle.h
index 5785954c9171a..4d3e8a30013cd 100644
--- a/clang/lib/Headers/arm_acle.h
+++ b/clang/lib/Headers/arm_acle.h
@@ -75,6 +75,14 @@ static __inline__ void __attribute__((__always_inline__, __nodebug__)) __yield(v
 #define __dbg(t) __builtin_arm_dbg(t)
 #endif
 
+#if defined(__ARM_64BIT_STATE) && __ARM_64BIT_STATE
+#define _CHKFEAT_GCS 1
+static __inline__ uint64_t __attribute__((__always_inline__, __nodebug__))
+__chkfeat(uint64_t __features) {
+  return __builtin_arm_chkfeat(__features) ^ __features;
+}
+#endif
+
 /* 7.5 Swap */
 static __inline__ uint32_t __attribute__((__always_inline__, __nodebug__))
 __swp(uint32_t __x, volatile uint32_t *__p) {
@@ -855,6 +863,25 @@ __rndrrs(uint64_t *__p) {
 }
 #endif
 
+/* 11.2 Guarded Control Stack intrinsics */
+#if defined(__ARM_64BIT_STATE) && __ARM_64BIT_STATE
+static __inline__ void * __attribute__((__always_inline__, __nodebug__))
+__gcspr() {
+  return (void *)__builtin_arm_rsr64("gcspr_el0");
+}
+
+static __inline__ uint64_t __attribute__((__always_inline__, __nodebug__, target("gcs")))
+__gcspopm() {
+  return __builtin_arm_gcspopm(0);
+}
+
+static __inline__ const void * __attribute__((__always_inline__, __nodebug__, target("gcs")))
+__gcsss(const void *__stack) {
+  __builtin_arm_gcsss1(__stack);
+  return __builtin_arm_gcsss2(0);
+}
+#endif
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/clang/test/CodeGen/aarch64-gcs.c b/clang/test/CodeGen/aarch64-gcs.c
new file mode 100644
index 0000000000000..e19946cf72f7f
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-gcs.c
@@ -0,0 +1,57 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 2
+// RUN: %clang_cc1 -triple aarch64-eabi -target-feature +gcs -emit-llvm %s -o - | FileCheck %s
+
+#include <arm_acle.h>
+
+// CHECK-LABEL: define dso_local i64 @test_chkfeat
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[__FEATURES_ADDR_I:%.*]] = alloca i64, align 8
+// CHECK-NEXT:    store i64 1, ptr [[__FEATURES_ADDR_I]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr [[__FEATURES_ADDR_I]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.aarch64.chkfeat(i64 [[TMP0]])
+// CHECK-NEXT:    [[TMP2:%.*]] = load i64, ptr [[__FEATURES_ADDR_I]], align 8
+// CHECK-NEXT:    [[XOR_I:%.*]] = xor i64 [[TMP1]], [[TMP2]]
+// CHECK-NEXT:    ret i64 [[XOR_I]]
+//
+uint64_t test_chkfeat() {
+  return __chkfeat(_CHKFEAT_GCS);
+}
+
+// CHECK-LABEL: define dso_local ptr @test_gcspr
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.read_volatile_register.i64(metadata [[META2:![0-9]+]])
+// CHECK-NEXT:    [[TMP1:%.*]] = inttoptr i64 [[TMP0]] to ptr
+// CHECK-NEXT:    ret ptr [[TMP1]]
+//
+void *test_gcspr() {
+  return __gcspr();
+}
+
+// CHECK-LABEL: define dso_local i64 @test_gcspopm
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.aarch64.gcspopm(i64 0)
+// CHECK-NEXT:    ret i64 [[TMP0]]
+//
+uint64_t test_gcspopm() {
+  return __gcspopm();
+}
+
+// CHECK-LABEL: define dso_local ptr @test_gcsss
+// CHECK-SAME: (ptr noundef [[P:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[__STACK_ADDR_I:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT:    store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT:    store ptr [[TMP0]], ptr [[__STACK_ADDR_I]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__STACK_ADDR_I]], align 8
+// CHECK-NEXT:    call void @llvm.aarch64.gcsss1(ptr [[TMP1]])
+// CHECK-NEXT:    [[TMP2:%.*]] = call ptr @llvm.aarch64.gcsss2(ptr null)
+// CHECK-NEXT:    ret ptr [[TMP2]]
+//
+const void *test_gcsss(const void *p) {
+  return __gcsss(p);
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 38d71b17b476d..5c5c864141c6c 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -89,6 +89,23 @@ def int_aarch64_isb : ClangBuiltin<"__builtin_arm_isb">, MSBuiltin<"__isb">,
 // ordering during ISel.
 def int_aarch64_space : DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i32_ty, llvm_i64_ty], []>;
 
+//===----------------------------------------------------------------------===//
+// Guarded Control Stack
+
+def int_aarch64_chkfeat : ClangBuiltin<"__builtin_arm_chkfeat">,
+                          DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty],
+                                                [IntrNoMem]>;
+
+def int_aarch64_gcspopm : ClangBuiltin<"__builtin_arm_gcspopm">,
+                          DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty],
+                                                [IntrReadMem, IntrHasSideEffects]>;
+
+def int_aarch64_gcsss1 : ClangBuiltin<"__builtin_arm_gcsss1">,
+                         DefaultAttrsIntrinsic<[], [llvm_ptr_ty], []>;
+
+def int_aarch64_gcsss2 : ClangBuiltin<"__builtin_arm_gcsss2">,
+                         DefaultAttrsIntrinsic<[llvm_ptr_ty], [llvm_ptr_ty], []>;
+
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index f3aac3b46d173..e0de418625e10 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -1267,23 +1267,34 @@ class GCSRtIn<bits<3> op1, bits<3> op2, string mnemonic,
   let Inst{15-8} = 0b01110111;
   let Inst{7-5} = op2;
   let Predicates = [HasGCS];
+  let hasSideEffects = 1;
 }
 
-def GCSSS1   : GCSRtIn<0b011, 0b010, "gcsss1">;
+let mayStore = 1, mayLoad = 1 in
+def GCSSS1   : GCSRtIn<0b011, 0b010, "gcsss1", [(int_aarch64_gcsss1 (i64 GPR64:$Rt))]>;
+let mayStore = 1 in
 def GCSPUSHM : GCSRtIn<0b011, 0b000, "gcspushm">;
 
 class GCSRtOut<bits<3> op1, bits<3> op2, string mnemonic,
             list<dag> pattern = []>
-    : RtSystemI<1, (outs GPR64:$Rt), (ins), mnemonic, "\t$Rt", pattern> {
+    : RtSystemI<1, (outs GPR64:$Rt), (ins GPR64:$src), mnemonic, "\t$Rt", pattern> {
   let Inst{20-19} = 0b01;
   let Inst{18-16} = op1;
   let Inst{15-8} = 0b01110111;
   let Inst{7-5} = op2;
   let Predicates = [HasGCS];
+  let hasSideEffects = 1;
+  // The input register is unchanged when GCS is disabled, so we need it as
+  // both an input and output operand.
+  let Constraints = "$src = $Rt";
 }
 
-def GCSSS2  : GCSRtOut<0b011, 0b011, "gcsss2">;
-def GCSPOPM : GCSRtOut<0b011, 0b001, "gcspopm">;
+let mayStore = 1, mayLoad = 1 in
+def GCSSS2  : GCSRtOut<0b011, 0b011, "gcsss2",
+                       [(set GPR64:$Rt, (int_aarch64_gcsss2 GPR64:$src))]>;
+let mayLoad = 1 in
+def GCSPOPM : GCSRtOut<0b011, 0b001, "gcspopm",
+                       [(set GPR64:$Rt, (int_aarch64_gcspopm GPR64:$src))]>;
 def GCSPOPM_NoOp : InstAlias<"gcspopm", (GCSPOPM XZR)>, Requires<[HasGCS]>; // Rt defaults to XZR if absent
 
 def GCSB_DSYNC_disable : InstAlias<"gcsb\tdsync", (HINT 19), 0>;
@@ -1292,7 +1303,8 @@ def GCSB_DSYNC         : InstAlias<"gcsb\tdsync", (HINT 19), 1>, Requires<[HasGC
 def : TokenAlias<"DSYNC", "dsync">;
 
 let Uses = [X16], Defs = [X16], CRm = 0b0101 in {
-  def CHKFEAT   : SystemNoOperands<0b000, "hint\t#40">;
+  def CHKFEAT   : SystemNoOperands<0b000, "hint\t#40",
+                                   [(set X16, (int_aarch64_chkfeat X16))]>;
 }
 def : InstAlias<"chkfeat\tx16", (CHKFEAT), 0>;
 def : InstAlias<"chkfeat\tx16", (CHKFEAT), 1>, Requires<[HasCHK]>;
diff --git a/llvm/test/CodeGen/AArch64/gcs-intrinsics.ll b/llvm/test/CodeGen/AArch64/gcs-intrinsics.ll
new file mode 100644
index 0000000000000..93c95569723f5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/gcs-intrinsics.ll
@@ -0,0 +1,49 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64 -mattr=+gcs -verify-machineinstrs -o - %s | FileCheck %s
+
+define i64 @test_chkfeat(i64 %arg) {
+; CHECK-LABEL: test_chkfeat:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov x16, x0
+; CHECK-NEXT:    chkfeat x16
+; CHECK-NEXT:    mov x0, x16
+; CHECK-NEXT:    ret
+entry:
+  %0 = call i64 @llvm.aarch64.chkfeat(i64 %arg)
+  ret i64 %0
+}
+
+define i64 @test_gcspopm(i64 %arg) {
+; CHECK-LABEL: test_gcspopm:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    gcspopm x0
+; CHECK-NEXT:    ret
+entry:
+  %0 = call i64 @llvm.aarch64.gcspopm(i64 %arg)
+  ret i64 %0
+}
+
+define void @test_gcsss1(ptr %p) {
+; CHECK-LABEL: test_gcsss1:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    gcsss1 x0
+; CHECK-NEXT:    ret
+entry:
+  call void @llvm.aarch64.gcsss1(ptr %p)
+  ret void
+}
+
+define ptr @test_gcsss2(ptr %p) {
+; CHECK-LABEL: test_gcsss2:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    gcsss2 x0
+; CHECK-NEXT:    ret
+entry:
+  %0 = call ptr @llvm.aarch64.gcsss2(ptr %p)
+  ret ptr %0
+}
+
+declare i64 @llvm.aarch64.chkfeat(i64)
+declare i64 @llvm.aarch64.gcspopm(i64)
+declare void @llvm.aarch64.gcsss1(ptr)
+declare ptr @llvm.aarch64.gcsss2(ptr)

Comment on lines 879 to 882
__gcsss(const void *__stack) {
__builtin_arm_gcsss1(__stack);
return __builtin_arm_gcsss2(0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever expect these builtins to be called separately? If not, what's the rationale for having two of them, rather than one that does both operations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having builtins that directly match the instructions is more convenient in terms of implementation. Tablegen gives the error "Cannot handle instructions producing instructions with temporaries yet!" if you try to have a pattern with two output instructions when the output of one isn't the input of another, so this would require custom lowering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concerns are:

  • builtins eventually become a defacto compiler interface, with users using them directly outside of compiler-only headers (and potentially expecting them to be compatible between gcc and clang). Maybe this is fine, as we can reserve the right to break them (but doing so is still not easy)? The other obvious option is to match the builtin to the ACLE intrinsic, and write C++, as you say, which is not as easy but also not necessarily any better as an implementation.
  • What happens if the compiler ends up separating these two operations? I guess maybe we can leave this until it is a reported problem. Even if we had one operation until isel, we might still have to make sure that reasonable optimisations were prevented post-isel. I realise that the sideEffects = 1 on the instructions should prevent a lot of these, but I'm not clear on things that should be side-effect safe but might interact with GCS such as Machine Outliner.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out to be not that difficult to have a single builtin, so I've done that.

Copy link
Contributor

@hstk30-hw hstk30-hw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, don't known why the CI failed :>

Copy link

github-actions bot commented Jul 9, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff dfd2711f8f70fca45d2ddbca0eede7ad957ec307 eed6d78980fb78f79a259b3ac2239542da9b9a93 --extensions cpp,h,c -- clang/test/CodeGen/aarch64-gcs.c clang/lib/Headers/arm_acle.h llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Headers/arm_acle.h b/clang/lib/Headers/arm_acle.h
index 1518b0c4c8..9ff433627c 100644
--- a/clang/lib/Headers/arm_acle.h
+++ b/clang/lib/Headers/arm_acle.h
@@ -865,17 +865,19 @@ __rndrrs(uint64_t *__p) {
 
 /* 11.2 Guarded Control Stack intrinsics */
 #if defined(__ARM_64BIT_STATE) && __ARM_64BIT_STATE
-static __inline__ void * __attribute__((__always_inline__, __nodebug__))
+static __inline__ void *__attribute__((__always_inline__, __nodebug__))
 __gcspr() {
   return (void *)__builtin_arm_rsr64("gcspr_el0");
 }
 
-static __inline__ uint64_t __attribute__((__always_inline__, __nodebug__, target("gcs")))
-__gcspopm() {
+static __inline__ uint64_t
+    __attribute__((__always_inline__, __nodebug__, target("gcs")))
+    __gcspopm() {
   return __builtin_arm_gcspopm(0);
 }
 
-static __inline__ const void * __attribute__((__always_inline__, __nodebug__, target("gcs")))
+static __inline__ const void *__attribute__((__always_inline__, __nodebug__,
+                                             target("gcs")))
 __gcsss(const void *__stack) {
   return __builtin_arm_gcsss(__stack);
 }

Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

As a follow-up, can you check that the instructions generated from this builtin do inhibit the machine outliner? Maybe all GCS-modifying functions have to inhibit the machine outliner, I'm not 100% sure.

@john-brawn-arm
Copy link
Collaborator Author

I've added a new change to work around a problem in intrinsic handling that's detailed in https://discourse.llvm.org/t/intrinsic-with-sideeffect-is-optimized-out/66053 (I previously had this workaround in, but removed it because it looked like it was no longer needed, but actually I just hadn't tested it enough).

@john-brawn-arm
Copy link
Collaborator Author

Thanks, LGTM.

As a follow-up, can you check that the instructions generated from this builtin do inhibit the machine outliner? Maybe all GCS-modifying functions have to inhibit the machine outliner, I'm not 100% sure.

The GCS instructions don't inhibit the machine outliner, but I don't think the machine outliner is a problem. From the perspective of what instructions get executed the only difference when things get outlined is that there's an extra (non-bl) branch instruction, and that won't do anything to the gcs stack.

Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@john-brawn-arm john-brawn-arm merged commit 3a14ffb into llvm:main Jul 11, 2024
3 of 6 checks passed
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
This adds intrinsics defined in ARM-software/acle#260

Doing this requires some changes to the GCS instruction definitions, as
these intrinsics make use of how some instructions don't modify the
input register when GCS is disabled, and they need to be correctly
marked with mayLoad/mayStore/hasSideEffects for instruction selection to
work.
@john-brawn-arm john-brawn-arm deleted the gcs_acle branch July 16, 2024 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants