[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836

zahiraam · 2024-10-23T15:39:51Z

Add support for options -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt.
These options allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. To communicate this choice to the device code, the front end needs to generate IR that reflects it.

When correctly rounded results are requested, we can simply generate the fdiv instruction and the llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should instead generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set: 2.5 (ULP) for single-precision fdiv and 3.0 (ULP) for single-precision sqrt.
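A minimal sketch of the lowering rule described above, written as plain C++ so the decision itself is easy to see. The names FP32Op, Lowering and lowerFP32Op are hypothetical and are not part of Clang's CodeGen; the real front end builds IR rather than returning strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the front-end lowering decision for FP32 div/sqrt.
// FP32Op, Lowering and lowerFP32Op are illustrative names, not Clang APIs.
enum class FP32Op { Div, Sqrt };

struct Lowering {
  std::string callee;   // IR instruction or intrinsic the front end emits
  std::string maxError; // "fpbuiltin-max-error" value; empty when not needed
};

Lowering lowerFP32Op(FP32Op op, bool correctlyRounded) {
  if (correctlyRounded) {
    // fdiv and llvm.sqrt are correctly rounded by default in LLVM IR,
    // so no accuracy attribute is required.
    return op == FP32Op::Div ? Lowering{"fdiv", ""}
                             : Lowering{"llvm.sqrt", ""};
  }
  // Relaxed accuracy: call the fpbuiltin intrinsic and attach the
  // maximum allowed error (in ULP) for single precision.
  return op == FP32Op::Div ? Lowering{"llvm.fpbuiltin.fdiv", "2.5"}
                           : Lowering{"llvm.fpbuiltin.sqrt", "3.0"};
}
```

For example, lowerFP32Op(FP32Op::Div, false) selects llvm.fpbuiltin.fdiv with a max error of 2.5, while the correctly rounded path carries no attribute at all.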

If the -ffp-accuracy option is used, we should issue a warning when its setting conflicts with an explicitly specified -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.

mdtoguchi · 2024-10-29T18:03:31Z

clang/lib/Driver/ToolChains/Clang.cpp

+      if (!strcmp(A->getValue(), "fast")) {
+        CmdArgs.push_back("-fno-offload-fp32-prec-div");
+        CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
+      }


Should we allow users to override this with -foffload-fp32-prec-div|sqrt?

Suggested change

-      if (!strcmp(A->getValue(), "fast")) {
-        CmdArgs.push_back("-fno-offload-fp32-prec-div");
-        CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
-      }
+      if (!strcmp(A->getValue(), "fast")) {
+        if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
+                          options::OPT_fno_offload_fp32_prec_div, false))
+          CmdArgs.push_back("-fno-offload-fp32-prec-div");
+        if (!Args.hasFlag(options::OPT_foffload_fp32_prec_sqrt,
+                          options::OPT_fno_offload_fp32_prec_sqrt, false))
+          CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
+      }

Not sure. I would think that users could choose to compile with:
clang -fsycl -ffp-model=fast -foffload-fp32-prec-sqrt hello.cpp
or:
clang -fsycl -foffload-fp32-prec-sqrt -ffp-model=fast hello.cpp
These shouldn't give the same result. In the first one, the sqrt results are correctly rounded. In the second one, they are not required to be correctly rounded.

I think that's just following the "last option wins" rule, in which case we need a more complicated process here to determine the order in which the options interact with one another.
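The ordering dependence in the two command lines above can be modeled as a single left-to-right pass in which each relevant option overwrites the current setting. This is a hypothetical sketch, not the driver's code: it assumes that -ffp-model=fast implies -fno-offload-fp32-prec-sqrt (as in the patch hunk above) and that correctly rounded sqrt is the default; resolvePrecSqrt is an illustrative name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical "last option wins" resolution for the sqrt setting.
// Assumes -ffp-model=fast implies -fno-offload-fp32-prec-sqrt and that
// correctly rounded sqrt is the default. Other arguments are ignored.
bool resolvePrecSqrt(const std::vector<std::string> &args) {
  bool precSqrt = true; // default: correctly rounded
  for (const std::string &arg : args) {
    if (arg == "-ffp-model=fast" || arg == "-fno-offload-fp32-prec-sqrt")
      precSqrt = false; // relaxed sqrt requested (directly or via fast)
    else if (arg == "-foffload-fp32-prec-sqrt")
      precSqrt = true; // correctly rounded sqrt requested explicitly
  }
  return precSqrt;
}
```

Under this model the first command line resolves to correctly rounded sqrt and the second does not, which is why the handling has to see all FP options in order rather than checking each one in isolation.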

Hmm... If that's the case, we may want to integrate the logic into the place where all of the other FP model options are manipulated, in the larger for loop here:

llvm/clang/lib/Driver/ToolChains/Clang.cpp, line 2994 in bdf78d7:

static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,

and only add the -cc1 option under the IsDeviceOffloading condition.

Okay, and that would work for OpenMP too!

mdtoguchi · 2024-10-29T18:08:06Z

clang/lib/Driver/ToolChains/Clang.cpp

+    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
+      CmdArgs.push_back("-fno-offload-fp32-prec-div");
+    else
+      CmdArgs.push_back("-foffload-fp32-prec-div");


Suggested change

-    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
-      CmdArgs.push_back("-fno-offload-fp32-prec-div");
-    else
-      CmdArgs.push_back("-foffload-fp32-prec-div");
+    if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
+                      options::OPT_fno_offload_fp32_prec_div, true))
+      CmdArgs.push_back("-fno-offload-fp32-prec-div");

Since -foffload-fp32-prec-div is the default, only the negative form needs to be passed.
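To illustrate why passing true as the default is enough, here is a simplified, string-based stand-in for llvm::opt::ArgList::hasFlag(Pos, Neg, Default): the last occurrence of either spelling decides, and Default applies only when neither appears. The helper is illustrative; the real method works on parsed option IDs, not raw strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for llvm::opt::ArgList::hasFlag(Pos, Neg, Default).
// Scans from the end so the last occurrence of pos or neg wins; falls back
// to defaultValue when neither spelling is present.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool defaultValue) {
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    if (*it == pos)
      return true;
    if (*it == neg)
      return false;
  }
  return defaultValue;
}
```

With the default set to true, the suggested code pushes -fno-offload-fp32-prec-div only when the negative flag is the last of the pair on the command line.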

mdtoguchi · 2024-10-29T18:08:26Z

clang/lib/Driver/ToolChains/Clang.cpp

+    if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
+      CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
+    else
+      CmdArgs.push_back("-foffload-fp32-prec-sqrt");


Similar comment to the one above.

elizabethandrews

@premanandrao, can you review this, please?

premanandrao · 2024-11-07T20:42:04Z

clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp

+// DEFINE: -fsycl-is-device -emit-llvm -triple spirv-unknown-unknown
+
+// DEFINE: %{common_opts_spir64} = -internal-isystem %S/Inputs \
+// DEFINE: -fsycl-is-device -emit-llvm -triple spirv64-unknown-unknown


common_opts_spir64 seems identical to common_opts_spirv64.

MrSidims

SPV_INTEL_fp_max_error related changes LGTM

mdtoguchi · 2024-11-12T15:29:18Z

clang/lib/Driver/ToolChains/Clang.cpp

+    }
+  };
+
+  auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {


Suggested change

-  auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
+  auto parseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {

Function names should start with a lowercase letter.

MrSidims

I have to withdraw my review, as I have two questions.

MrSidims · 2024-11-15T16:43:58Z

clang/lib/CodeGen/CGCall.cpp

+          (FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
+           IsFloat32Type);
+      bool isFP32FdivFunction =
+          (FuncName == "fdiv" && !getLangOpts().OffloadFP32PrecDiv &&


I actually thought the request was to replace the fdiv instruction with the intrinsic, not an fdiv function. Do we know whether users actually call such a function? I don't see it mentioned in the SYCL or OpenCL specifications.

@gmlueck, could you please comment on that?

The intent of -foffload-fp32-prec-div is to affect the native divide operation (i.e. /). There is no SYCL function named fdiv. Is there a standard C/C++ function with that name?

AFAIK there is no standard function for floating-point division. There is std::div, but it works only on integers.

There is no C/C++ fdiv function.
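A short illustration of the point above: std::div has only integer overloads and returns the quotient and remainder together, while floating-point division is simply the built-in / operator. The helper names integerDivide and floatDivide are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// std::div exists only for integer types and returns quotient and
// remainder together (truncating toward zero).
std::div_t integerDivide(int numerator, int denominator) {
  return std::div(numerator, denominator);
}

// Floating-point division uses the built-in operator; Clang lowers this
// to an LLVM fdiv instruction by default. This operator is what
// -foffload-fp32-prec-div is meant to affect in device code.
float floatDivide(float numerator, float denominator) {
  return numerator / denominator;
}
```

For instance, integerDivide(7, 2) gives quotient 3 and remainder 1, whereas floatDivide(7.0f, 2.0f) gives 3.5f.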

MrSidims · 2024-11-15T16:44:52Z

clang/lib/CodeGen/CGCall.cpp

+      bool hasFPAccuracyFuncMap = hasAccuracyRequirement(FuncName);
+      bool hasFPAccuracyVal = !getLangOpts().FPAccuracyVal.empty();
+      bool isFp32SqrtFunction =
+          (FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&


Why do we compare against the unmangled name sqrt?

FuncName is the output of FD->getName(), which returns a simple identifier: https://github.com/intel/llvm/blob/sycl/clang/include/clang/AST/Decl.h#L280

So clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp would still pass even with extern "C" removed from the sqrt function declaration?

What if the user has a function in their own namespace that happens to be named "sqrt"?
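The concern can be illustrated with a hypothetical stand-in for a FunctionDecl that records both the simple identifier (what getName() returns) and the fully qualified name. Matching on the identifier alone also accepts a user's mylib::sqrt. The struct and both predicates are illustrative; in real Clang a stricter check might draw on NamedDecl::getQualifiedNameAsString() or FunctionDecl::getBuiltinID(), but which check is appropriate is an assumption here, not something the thread settles.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a FunctionDecl: the simple identifier and the
// fully qualified name of the called function.
struct FunctionInfo {
  std::string simpleName;    // e.g. "sqrt"
  std::string qualifiedName; // e.g. "mylib::sqrt"
};

// Matching on the simple identifier only, like comparing FuncName == "sqrt".
bool matchesBySimpleName(const FunctionInfo &fn) {
  return fn.simpleName == "sqrt";
}

// An illustrative stricter check that accepts only ::sqrt or std::sqrt.
bool matchesLibrarySqrt(const FunctionInfo &fn) {
  return fn.qualifiedName == "sqrt" || fn.qualifiedName == "std::sqrt";
}
```

A user function mylib::sqrt passes the first predicate but not the second, which is exactly the false positive raised above.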

mdtoguchi · 2024-11-15T21:51:51Z

clang/lib/Driver/ToolChains/Clang.cpp

+  bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
+                                  !JA.isDeviceOffloading(Action::OFK_Cuda) &&
+                                  !JA.isOffloading(Action::OFK_HIP);


As discussed offline, something like:

Suggested change

-  bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
-                                  !JA.isDeviceOffloading(Action::OFK_Cuda) &&
-                                  !JA.isOffloading(Action::OFK_HIP);
+  bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
+                                  TC.getTriple().isSPIROrSPIRV();

zahiraam · 2024-11-18T20:53:32Z

@MrSidims and @gmlueck, I have removed the restriction for CUDA/HIP. My understanding is that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function. Is that the case?
I have also changed the code so that the precision set by the options applies to the / operator instead of the fdiv function.
Please let me know if these are the changes you expected.

MrSidims · 2024-11-19T19:27:06Z

that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function

In the email thread I replied that I'm planning to take care of propagating the precise options to the CUDA and HIP drivers. I can take a look at what should be done to implement the non-precise intrinsics.

zahiraam added 2 commits October 23, 2024 08:38

Add support for -ftarget-prec-div/sqrt options.

f8caf83

Added fast-math run lines to LIT tests.

00ffb5a

zahiraam requested a review from mdtoguchi October 23, 2024 19:11

zahiraam requested review from jcranmer-intel and gmlueck October 23, 2024 19:12

Renamed the options accordingly.

795dd38

zahiraam changed the title ~~Add support for -ftarget-prec-div/sqrt options.~~ Add support for -foffload-fp32-prec-div/sqrt options. Oct 24, 2024

Fix format.

78a9005

Changed the place where the options are added in order for the options to be applied to OpenMP too.

50e71c0

zahiraam marked this pull request as ready for review October 28, 2024 17:25

zahiraam requested review from a team as code owners October 28, 2024 17:25

Fix format.

54f2409

zahiraam changed the title ~~Add support for -foffload-fp32-prec-div/sqrt options.~~ [SYCL] Add support for -foffload-fp32-prec-div/sqrt options. Oct 29, 2024

mdtoguchi reviewed Oct 29, 2024

clang/lib/Driver/ToolChains/Clang.cpp (outdated, resolved)

mdtoguchi reviewed Oct 29, 2024

Addressed review comments.

bdf78d7

elizabethandrews reviewed Oct 29, 2024

Put the code to handle the options in RenderFloatingPointOptions function instead of adding a JobAction to handle it.

8cd6d8b

premanandrao reviewed Nov 7, 2024

Addressed review comments.

24711fd

MrSidims mentioned this pull request Nov 12, 2024

[SYCL] Add -foffload-fp32-prec-[div/sqrt] FE option handling in JIT oneapi-src/unified-runtime#2315 (Open)

MrSidims approved these changes Nov 12, 2024

mdtoguchi reviewed Nov 12, 2024

Renamed function.

56314b7

Addressed review comments.

e643027

Changed SplitFPAccuracyVal to be a static function instead of a lambda.

b25e5ac

mdtoguchi approved these changes Nov 13, 2024

premanandrao approved these changes Nov 14, 2024

MrSidims requested changes Nov 15, 2024

Restricting the use of the options to sycl only.

ce00296

mdtoguchi reviewed Nov 15, 2024

MrSidims mentioned this pull request Nov 18, 2024

[SYCL] Pass foffload-fp32-prec-[div/sqrt] options to device's BE #16107 (Draft)

Remove restriction on Cuda/Hip and changed the code so that the div instruction gets the precision set instead of the fdiv function.

bc01759
