[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836

Open
wants to merge 20 commits into
base: sycl

Contributor

@zahiraam zahiraam commented Oct 23, 2024

Add support for the options -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt.
These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.

When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.
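As a rough illustration of the mapping described above, here is a minimal standalone sketch. `OffloadFPOpts`, `Lowering`, and `chooseLowering` are hypothetical names invented for this note, not clang APIs; only the 2.5 and 3.0 values and the fp32-only scope come from the description, and the sqrt default is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the two language options added by this PR;
// this is not the real clang::LangOptions.
struct OffloadFPOpts {
  bool PrecDiv = true;  // -foffload-fp32-prec-div (stated as the default in this thread)
  bool PrecSqrt = true; // -foffload-fp32-prec-sqrt (assumed to default the same way)
};

enum class FPOp { Div, Sqrt };

// What the front end would emit for one operation.
struct Lowering {
  bool UseFPBuiltin; // true: llvm.fpbuiltin.* call; false: plain fdiv / llvm.sqrt
  float MaxError;    // value for fpbuiltin-max-error; unused when UseFPBuiltin is false
};

// Only single-precision operations are affected by the new options.
Lowering chooseLowering(FPOp Op, bool IsFloat32, const OffloadFPOpts &Opts) {
  if (IsFloat32 && Op == FPOp::Div && !Opts.PrecDiv)
    return {true, 2.5f};
  if (IsFloat32 && Op == FPOp::Sqrt && !Opts.PrecSqrt)
    return {true, 3.0f};
  return {false, 0.0f}; // correctly rounded default
}
```

With both options left at their defaults, every operation keeps the plain lowering; turning one off changes only the matching fp32 operation.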

If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.

@zahiraam zahiraam changed the title from "Add support for -ftarget-prec-div/sqrt options." to "Add support for -foffload-fp32-prec-div/sqrt options." Oct 24, 2024
@zahiraam zahiraam marked this pull request as ready for review October 28, 2024 17:25
@zahiraam zahiraam requested review from a team as code owners October 28, 2024 17:25
@zahiraam zahiraam changed the title from "Add support for -foffload-fp32-prec-div/sqrt options." to "[SYCL] Add support for -foffload-fp32-prec-div/sqrt options." Oct 29, 2024
Comment on lines 1736 to 1739
if (!strcmp(A->getValue(), "fast")) {
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
}
Contributor

Should we allow users to override with -foffload-fp32-prec-div|sqrt?

Suggested change
- if (!strcmp(A->getValue(), "fast")) {
-   CmdArgs.push_back("-fno-offload-fp32-prec-div");
-   CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
- }
+ if (!strcmp(A->getValue(), "fast")) {
+   if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
+                     options::OPT_fno_offload_fp32_prec_div, false))
+     CmdArgs.push_back("-fno-offload-fp32-prec-div");
+   if (!Args.hasFlag(options::OPT_foffload_fp32_prec_sqrt,
+                     options::OPT_fno_offload_fp32_prec_sqrt, false))
+     CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
+ }

Contributor Author

Not sure. I would think that users could choose to compile with:
clang -fsycl -ffp-model=fast -foffload-fp32-prec-sqrt hello.cpp
or:
clang -fsycl -foffload-fp32-prec-sqrt -ffp-model=fast hello.cpp
These shouldn't give the same result. In the first one, the sqrt results are precise (correctly rounded). In the second one, -ffp-model=fast comes last, so they are not required to be correctly rounded.

I think that's just following the last-option-wins rule. In that case, we need a complicated process here to work out the order in which the options interact with one another.
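To make the ordering concern concrete, here is a small standalone sketch of a last-option-wins scan. `precSqrtLastWins` is a hypothetical helper operating on plain strings rather than the driver's real argument list, and it models only the three spellings mentioned in this comment; how other -ffp-model values should interact is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Scans the command line left to right; each relevant option overrides the
// state set by any earlier one. Returns true when sqrt must be correctly rounded.
bool precSqrtLastWins(const std::vector<std::string> &Args) {
  bool Prec = true; // assumed default: correctly rounded
  for (const std::string &A : Args) {
    if (A == "-ffp-model=fast")
      Prec = false; // the fast model relaxes sqrt precision
    else if (A == "-foffload-fp32-prec-sqrt")
      Prec = true;
    else if (A == "-fno-offload-fp32-prec-sqrt")
      Prec = false;
  }
  return Prec;
}
```

Under this model the two command lines above resolve differently: the first keeps precise sqrt, the second does not.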

Contributor

@mdtoguchi mdtoguchi Oct 29, 2024

Hmm... If that's the case we may want to integrate the logic into where all of the other FP model options are being manipulated in the larger for loop here:

static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
and only add the -cc1 option under the IsDeviceOffloading condition.

Contributor Author

Okay and that would work for OpenMP too!

Comment on lines 1747 to 1750
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
  CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
  CmdArgs.push_back("-foffload-fp32-prec-div");
Contributor

Suggested change
- if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
-   CmdArgs.push_back("-fno-offload-fp32-prec-div");
- else
-   CmdArgs.push_back("-foffload-fp32-prec-div");
+ if (!Args.hasFlag(options::OPT_foffload_fp32_prec_div,
+                   options::OPT_fno_offload_fp32_prec_div, true))
+   CmdArgs.push_back("-fno-offload-fp32-prec-div");

Since -foffload-fp32-prec-div is default
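For readers less familiar with `hasFlag`, the sketch below models its documented behavior on plain strings: the last of the positive/negative pair wins, and the default applies when neither is present. `hasFlagModel` and `renderPrecDiv` are hypothetical helpers written for this note, not the driver's real code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if Pos is the last of {Pos, Neg} on the command line, false if
// Neg is last, and Default if neither appears.
bool hasFlagModel(const std::vector<std::string> &Args, const std::string &Pos,
                  const std::string &Neg, bool Default) {
  assert(Pos != Neg && "positive and negative spellings must differ");
  for (auto It = Args.rbegin(); It != Args.rend(); ++It) {
    if (*It == Pos)
      return true;
    if (*It == Neg)
      return false;
  }
  return Default;
}

// Mirrors the suggested change: since the precise form is the default, only
// the negative form needs to be forwarded.
std::vector<std::string> renderPrecDiv(const std::vector<std::string> &Args) {
  std::vector<std::string> CmdArgs;
  if (!hasFlagModel(Args, "-foffload-fp32-prec-div",
                    "-fno-offload-fp32-prec-div", /*Default=*/true))
    CmdArgs.push_back("-fno-offload-fp32-prec-div");
  return CmdArgs;
}
```

One difference from the original `getLastArg` check is worth noting: a command line that passes the negative form and then the positive form forwards nothing here, whereas checking only for the presence of the negative form would still forward it.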

if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
  CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
else
  CmdArgs.push_back("-foffload-fp32-prec-sqrt");
Contributor

Similar comment to the one above.

Contributor

@elizabethandrews elizabethandrews left a comment

@premanandrao can you review this please?

function instead of adding a JobAction to handle it.
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv-unknown-unknown

// DEFINE: %{common_opts_spir64} = -internal-isystem %S/Inputs \
// DEFINE: -fsycl-is-device -emit-llvm -triple spirv64-unknown-unknown
Contributor

common_opts_spir64 seems identical to common_opts_spirv64.

Contributor Author

Fixed now.

Contributor

@MrSidims MrSidims left a comment

SPV_INTEL_fp_max_error related changes LGTM

}
};

auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
Contributor

Suggested change
- auto ParseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {
+ auto parseFPAccOption = [&](StringRef Val, bool &NoOffloadFlag) {

Function naming should start with lowercase.

Contributor

@MrSidims MrSidims left a comment

I have to withdraw my review, as I have two questions.

(FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
IsFloat32Type);
bool isFP32FdivFunction =
(FuncName == "fdiv" && !getLangOpts().OffloadFP32PrecDiv &&
Contributor

I actually thought that the request was to replace the fdiv instruction with the intrinsic, not an fdiv function. Do we know if users actually use such a function? I don't see any mention of it in the SYCL or OpenCL specifications.

Contributor

@gmlueck could you please comment on that?

Contributor

The intent of -foffload-fp32-prec-div is to affect the native divide operation (i.e. /). There is no SYCL function named fdiv. Is there a standard C / C++ function with that name?

Contributor

AFAIK there is no standard function for floating-point division. There is std::div, but it works only on integers.

Contributor Author

There is no C/C++ fdiv function.

bool hasFPAccuracyFuncMap = hasAccuracyRequirement(FuncName);
bool hasFPAccuracyVal = !getLangOpts().FPAccuracyVal.empty();
bool isFp32SqrtFunction =
(FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
Contributor

Why do we compare with un-mangled sqrt?

Contributor Author

FuncName is the output of FD->getName() which returns a simple identifier. https://github.com/intel/llvm/blob/sycl/clang/include/clang/AST/Decl.h#L280

Contributor

@MrSidims MrSidims Nov 15, 2024

So clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp will pass even with extern "C" removed from sqrt function declaration?

Contributor

What if the user has a function in their own namespace that happens to be named "sqrt"?
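A minimal sketch of the concern, assuming the check compares only the simple identifier (as the getName() reply above suggests). `simpleName` and `matchesSqrtByNameOnly` are hypothetical helpers for illustration; they are not how clang performs the comparison.

```cpp
#include <cassert>
#include <string>

namespace user {
// An unrelated user function that happens to be called sqrt.
inline float sqrt(float X) { return X * 2.0f; }
} // namespace user

// Hypothetical helper: strips any "ns::" qualifier, modelling a check that
// only looks at the simple identifier of a declaration.
std::string simpleName(const std::string &Qualified) {
  std::string::size_type Pos = Qualified.rfind("::");
  return Pos == std::string::npos ? Qualified : Qualified.substr(Pos + 2);
}

// A name-only comparison cannot tell the math function from user::sqrt.
bool matchesSqrtByNameOnly(const std::string &Qualified) {
  return simpleName(Qualified) == "sqrt";
}
```

Under this assumption both "sqrt" and "user::sqrt" match, so a call to user::sqrt on a float could be treated as a square root even though it computes something else.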

Comment on lines 3023 to 3025
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
                                !JA.isDeviceOffloading(Action::OFK_Cuda) &&
                                !JA.isOffloading(Action::OFK_HIP);
Contributor

As discussed offline, something like:

Suggested change
- bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
-                                 !JA.isDeviceOffloading(Action::OFK_Cuda) &&
-                                 !JA.isOffloading(Action::OFK_HIP);
+ bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
+                                 TC.getTriple().isSPIROrSPIRV();

instruction gets the precision set instead of the fdiv function.
@zahiraam
Contributor Author

@MrSidims and @gmlueck I have removed the restriction for CUDA/HIP. My understanding is that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function. Is that the case?
I have also changed the code so that / has precision set with the options instead of fdiv.
Please let me know if these are the changes you expected.

@MrSidims
Contributor

that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function

In the email thread I replied that I'm planning to take care of propagating the precise option to the CUDA and HIP drivers. I can take a look at what should be done for the implementation of the non-precise intrinsics.

6 participants