[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836
base: sycl
to be applied to OpenMP too.
if (!strcmp(A->getValue(), "fast")) {
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
}
Should we allow users to override this with -foffload-fp32-prec-div|sqrt?
Suggested change, replacing:
if (!strcmp(A->getValue(), "fast")) {
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
}
with:
if (!strcmp(A->getValue(), "fast")) {
  if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
                    options::OPT_fno_offload_fp32_prec_div, false))
    CmdArgs.push_back("-fno-offload-fp32-prec-div");
  if (!Args.hasFlag(options::OPT_foffload_fp32_prec_sqrt,
                    options::OPT_fno_offload_fp32_prec_sqrt, false))
    CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
}
Not sure. I would think that users could choose to compile with:
clang -fsycl -ffp-model=fast -foffload-fp32-prec-sqrt hello.cpp
or:
clang -fsycl -foffload-fp32-prec-sqrt -ffp-model=fast hello.cpp
These shouldn't give the same result. In the first one, the sqrt results are precise. In the second one, they are not correctly rounded.
I think that's just following the last-option-wins rule, in which case we need a more complicated process here to determine the order in which the options interact with one another.
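To make the ordering question concrete, here is a small standalone model of a last-option-wins rule for fp32 sqrt precision. It is a sketch, not Clang driver code: resolvePreciseSqrt is a hypothetical name, options are plain strings rather than llvm::opt::Arg objects, only -ffp-model=fast is assumed to turn precise sqrt off, and precise sqrt is assumed to be the default.

```cpp
#include <string>
#include <vector>

// Hypothetical model, not the real driver logic: scan the command line left
// to right so that the last option affecting fp32 sqrt precision wins.
// Assumptions: precise sqrt is the default, and -ffp-model=fast is the only
// -ffp-model value considered (it disables precise sqrt).
bool resolvePreciseSqrt(const std::vector<std::string> &args) {
  bool precise = true; // assumed default: -foffload-fp32-prec-sqrt
  for (const std::string &arg : args) {
    if (arg == "-ffp-model=fast")
      precise = false;
    else if (arg == "-foffload-fp32-prec-sqrt")
      precise = true;
    else if (arg == "-fno-offload-fp32-prec-sqrt")
      precise = false;
  }
  return precise;
}
```

Under these assumptions the first command line above yields correctly rounded sqrt and the second does not, matching the behavior described in the comment. Real driver code could follow the same idea while walking the actual argument list.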
Hmm... If that's the case, we may want to integrate the logic into the larger for loop where all of the other FP model options are being processed, here:
llvm/clang/lib/Driver/ToolChains/Clang.cpp (line 2994 in bdf78d7)
static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
-cc1 option under the IsDeviceOffloading condition.
Okay, and that would work for OpenMP too!
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
  CmdArgs.push_back("-foffload-fp32-prec-div");
Suggested change, replacing:
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
  CmdArgs.push_back("-foffload-fp32-prec-div");
with:
if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
                  options::OPT_fno_offload_fp32_prec_div, true))
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
since -foffload-fp32-prec-div is the default.
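A simplified model of why the hasFlag form is enough. The helper below mimics the documented behavior of llvm::opt::ArgList::hasFlag(Pos, Neg, Default) (the last of Pos/Neg wins, otherwise Default) over a plain vector of strings. Both hasFlag and cc1DivFlags here are hypothetical stand-ins, and the sketch assumes the -cc1 side also treats precise division as the default, so only the negative flag ever needs to be forwarded.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for llvm::opt::ArgList::hasFlag(Pos, Neg, Default):
// the last occurrence of pos or neg wins; if neither appears, use the default.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool defaultValue) {
  bool result = defaultValue;
  for (const std::string &arg : args) {
    if (arg == pos)
      result = true;
    else if (arg == neg)
      result = false;
  }
  return result;
}

// Hypothetical: the -cc1 flags the suggested driver code would forward.
// Precise division is assumed to be the default on the -cc1 side, so nothing
// is forwarded unless the user's last word was -fno-offload-fp32-prec-div.
std::vector<std::string> cc1DivFlags(const std::vector<std::string> &args) {
  std::vector<std::string> cc1;
  if (!hasFlag(args, "-foffload-fp32-prec-div", "-fno-offload-fp32-prec-div",
               true))
    cc1.push_back("-fno-offload-fp32-prec-div");
  return cc1;
}
```

Unlike the getLastArg(options::OPT_fno_offload_fp32_prec_div) version, this also honors -foffload-fp32-prec-div when it appears after -fno-offload-fp32-prec-div on the command line.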
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
else
  CmdArgs.push_back("-foffload-fp32-prec-sqrt");
Similar comment as above.
@premanandrao can you review this please?
function instead of adding a JobAction to handle it.
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv-unknown-unknown

// DEFINE: %{common_opts_spir64} = -internal-isystem %S/Inputs \
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv64-unknown-unknown
common_opts_spir64 seems identical to common_opts_spirv64.
Fixed now.
SPV_INTEL_fp_max_error related changes LGTM
} | ||
}; | ||
|
||
auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) { |
Suggested change, replacing:
auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
with:
auto parseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
Function names should start with a lowercase letter.
I have to withdraw my review, as I have two questions.
clang/lib/CodeGen/CGCall.cpp (outdated)
    (FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
     IsFloat32Type);
bool isFP32FdivFunction =
    (FuncName == "fdiv" && !getLangOpts().OffloadFP32PrecDiv &&
I actually thought the request was to replace the fdiv instruction with the intrinsic, not an fdiv function. Do we know if users actually use such a function? I don't see any mention of it in the SYCL or OpenCL specifications.
@gmlueck could you please comment on that?
The intent of -foffload-fp32-prec-div is to affect the native divide operation (i.e. /). There is no SYCL function named fdiv. Is there a standard C / C++ function with that name?
AFAIK there is no standard function for FP division. There is std::div, but it works only on integers.
There is no C/C++ fdiv function.
bool hasFPAccuracyFuncMap = hasAccuracyRequirement(FuncName);
bool hasFPAccuracyVal = !getLangOpts().FPAccuracyVal.empty();
bool isFp32SqrtFunction =
    (FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
Why do we compare with un-mangled sqrt?
FuncName is the output of FD->getName(), which returns a simple identifier: https://github.com/intel/llvm/blob/sycl/clang/include/clang/AST/Decl.h#L280
So clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp will pass even with extern "C" removed from the sqrt function declaration?
What if the user has a function in their own namespace that happens to be named "sqrt"?
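A hypothetical illustration of the concern: a check on the simple identifier alone cannot tell the C library sqrt apart from a user-defined mylib::sqrt, because both have the simple name "sqrt". FuncInfo, matchesByNameOnly, and matchesLibmSqrt are invented for this sketch, and requiring a global extern "C" declaration is only one possible criterion (it would not, for example, accept std::sqrt(float)). Inside Clang, FunctionDecl::getBuiltinID() is another way to recognize library builtins, though whether it fits this case is an open question.

```cpp
#include <string>

// Hypothetical, simplified view of a function declaration. In Clang the
// simple name would come from FunctionDecl::getName(); the qualified name and
// the language linkage are modeled here as plain fields.
struct FuncInfo {
  std::string name;          // simple identifier, e.g. "sqrt"
  std::string qualifiedName; // e.g. "mylib::sqrt"
  bool isExternC;            // declared with extern "C" linkage
};

// Matches on the simple identifier only, like the code under review.
bool matchesByNameOnly(const FuncInfo &f) { return f.name == "sqrt"; }

// Hypothetical stricter check: also require a global, extern "C" declaration,
// so a user's own mylib::sqrt is not treated as the C library function.
bool matchesLibmSqrt(const FuncInfo &f) {
  return f.name == "sqrt" && f.qualifiedName == "sqrt" && f.isExternC;
}
```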
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
                                !JA.isDeviceOffloading(Action::OFK_Cuda) &&
                                !JA.isOffloading(Action::OFK_HIP);
As discussed offline, something like the following, replacing:
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
                                !JA.isDeviceOffloading(Action::OFK_Cuda) &&
                                !JA.isOffloading(Action::OFK_HIP);
with:
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
                                TC.getTriple().isSPIROrSPIRV();
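The suggested condition reduces to two checks: SYCL device compilation and a SPIR/SPIR-V target. The sketch below models it outside of Clang. isSPIROrSPIRVTriple is a hypothetical stand-in for llvm::Triple::isSPIROrSPIRV() that only inspects the architecture prefix of the triple string, and the isSyclDevice parameter stands in for JA.isDeviceOffloading(Action::OFK_SYCL).

```cpp
#include <string>

// Simplified stand-in for llvm::Triple::isSPIROrSPIRV(): take the architecture
// component (text before the first '-') and accept any name starting with
// "spir", which covers spir, spir64, spir64_gen, spirv, spirv64, and so on.
bool isSPIROrSPIRVTriple(const std::string &triple) {
  const std::string arch = triple.substr(0, triple.find('-'));
  return arch.compare(0, 4, "spir") == 0;
}

// Mirrors the suggested condition. isSyclDevice is a hypothetical parameter
// standing in for JA.isDeviceOffloading(Action::OFK_SYCL).
bool isFp32PrecDivSqrtAllowed(bool isSyclDevice, const std::string &triple) {
  return isSyclDevice && isSPIROrSPIRVTriple(triple);
}
```

One consequence of this form is that it is an allow-list: unlike the original exclusion of CUDA and HIP, a new non-SPIR offload target would not pick up the options by accident.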
instruction gets the precision set instead of the fdiv function.
@MrSidims and @gmlueck I have removed the restriction for
In the email thread I replied that I'm planning to take care of propagating the precise option to the CUDA and HIP drivers. I can take a look at what should be done to implement the non-precise intrinsics.
Add support for options -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt.

These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.

When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single-precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.

If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.
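The lowering rule from the description can be summarized as a small hypothetical C++ function. Lowering, Fp32Op, and lowerFp32 are invented for illustration: callee names the IR instruction or intrinsic to emit, and maxErrorUlp holds the value that would go into the fpbuiltin-max-error attribute. The exact IR spelling of the attribute is not modeled here.

```cpp
#include <optional>
#include <string>

// Hypothetical summary of the fp32 lowering described above.
struct Lowering {
  std::string callee;                // IR instruction or intrinsic to emit
  std::optional<double> maxErrorUlp; // fpbuiltin-max-error value, if any
};

enum class Fp32Op { Div, Sqrt };

Lowering lowerFp32(Fp32Op op, bool correctlyRounded) {
  // Correctly rounded: plain fdiv / llvm.sqrt, no accuracy attribute needed.
  if (correctlyRounded)
    return op == Fp32Op::Div ? Lowering{"fdiv", std::nullopt}
                             : Lowering{"llvm.sqrt", std::nullopt};
  // Relaxed: fpbuiltin intrinsic with the max-error value from the description.
  return op == Fp32Op::Div ? Lowering{"llvm.fpbuiltin.fdiv", 2.5}
                           : Lowering{"llvm.fpbuiltin.sqrt", 3.0};
}
```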