-
Notifications
You must be signed in to change notification settings - Fork 206
Do not use -Ofast
#10267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: IB/CMSSW_16_1_X/master
Are you sure you want to change the base?
Do not use -Ofast
#10267
Conversation
-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.
|
A new Pull Request was created by @fwyzard for branch IB/CMSSW_16_1_X/master. @akritkbehera, @cmsbuild, @iarspider, @raoatifshad, @smuzaffar can you please review it and eventually sign? Thanks. |
|
cms-bot internal usage |
|
|
enable gpu |
|
please test |
|
-1 Failed Tests: UnitTests nvidia_l40sUnitTests Failed Unit TestsI found 2 errors in the following unit tests: ---> test Miscellanea had ERRORS ---> test createDBObjects had ERRORS Comparison SummaryThe workflows 2025.0010001, 2024.0050001, 2023.0020001 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons Summary:
AMD_MI300X Comparison SummarySummary:
AMD_W7900 Comparison SummarySummary:
NVIDIA_H100 Comparison SummarySummary:
NVIDIA_L40S Comparison SummarySummary:
NVIDIA_T4 Comparison SummarySummary:
|
|
While I agree with the premise of the PR description, I wonder if these options
impact the GCC auto-vectorization? (that, IIRC, was one major motivation for using |
|
IMHO We could look into enabling it at compile time and not at link time, but I'm a bit wary of this mixed approach :-( |
|
|
|
|
Does that even work with LTO? (OTOH, I thought mucking with the global FP state in the shared library init was removed in gcc13.) |
that would be good :-) It's still mentioned in the 13.4 documentation, though 🤔 |
|
Let's see what is the impact of removing |
'-fast' at link time of an executable (e.g., cmsRun) will include initialization code that changes the FP state, but from the gcc13 release notes
|
I'm not absolutely sure, but from my reading I believe what the description means is that it can introduce data races for programs that assume a strongly-ordered memory model. It should be ok for programs that conform to the C++ memory model with appropriate use of atomics, and it may allow some vectorization optimizations that would otherwise be forbidden. |
|
FWIW, vectorization of trig functions has been important to mkFit timing performance. The most recent versions of mkFit accomplish this vectorization through VDT, which is a header-only library. The latest version of VDT (0.4.6, April 2025) changed the preferred optimization flags from -O3 -ffast-math to just -O3. Therefore I wonder if @dpiparo has any insights on this. |
I guess we should update VDT anyhow Line 1 in 95e4f6a
|
|
Single-threaded performance tests with mkFit standalone (VDT included) confirms that deleting If the Note that the suboptions of
Finally, according to gcc-13 documentation, the option |
|
Ah, thanks, I wrongly put |
-Ofast implies -funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. -Ofast enables (directly or indirectly): - ❌ -fallow-store-data-races - ✔ -fno-semantic-interposition - ✔ -fno-math-errno - ❌ -funsafe-math-optimizations - ❌ -ffinite-math-only - ✔ -fno-rounding-math (_default_) - ✔ -fno-signaling-nans (_default_) - ❔ -fcx-limited-range - ✔ -fexcess-precision=fast - ✔ -fno-signed-zeros - ✔ -fno-trapping-math - ✔ -fassociative-math - ❌ -freciprocal-math (disabled in cms-sw#8280) We should not use -fallow-store-data-races, -ffinite-math-only. We should not use -funsafe-math-optimizations at link time, as it affects the global FP state. It's probably easier to replace it with the individual options -fno-signed-zeros, -fno-trapping-math and -fassociative-math. We may revisit -fcx-limited-range if we do use Complex arithmetics. -freciprocal-math was already disabled explicitly, see cms-sw#8280.
6a383ac to
70c5c86
Compare
|
please test |
|
Pull request #10267 was updated. |
|
OK, yes, browsing through the GCC code, |
|
@srlantz how does |
|
-1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-919f35/50589/summary.html Failed External BuildI found compilation warning when building: See details on the summary page. |
About 20% slower, similar to omitting |
|
@fwyzard note also that we generally include |
Sure, but that's already implied by |
Pity, because |
Yeah, |
|
-1 Failed Tests: RelVals RelVals-NVIDIA_H100 nvidia_l40sUnitTests Failed RelValsFailed RelVals-NVIDIA_H100
|
-Ofastimplies-funsafe-math-optimizations, which affects the global FP state for all programs: when used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimisations.-Ofastenables, directly or indirectly:-fallow-store-data-races-fno-semantic-interposition-fno-math-errno-funsafe-math-optimizations-ffinite-math-only-fno-rounding-math(default)-fno-signaling-nans(default)-fcx-limited-range-fexcess-precision=fast-fno-signed-zeros-fno-trapping-math-fassociative-math-freciprocal-math(Add -fno-reciprocal-math -mrecip=none to Ofast to solve the AMD/Intel differences #8280)We should not use
-fallow-store-data-races,-ffinite-math-only.We should not use
-funsafe-math-optimizationsat link time, as it affects the global FP state. It's probably easier to replace it with the individual options-fno-signed-zeros,-fno-trapping-mathand-fassociative-math.We may revisit
-fcx-limited-rangeif we do use Complex arithmetics.-freciprocal-mathwas already disabled explicitly, see #8280.