[SYCL-MLIR] Merge from intel/llvm sycl branch #10895

whitneywhtsang · 2023-08-21T06:32:04Z

Please only review 80301a6 and 2fee8ea, and the files with conflicts:

        both modified:   llvm-spirv/tools/llvm-spirv/llvm-spirv.cpp
        both modified:   mlir/include/mlir/Conversion/Passes.td
        both modified:   mlir/include/mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h
        both modified:   mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp
        both modified:   mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
        both modified:   mlir/test/Target/LLVMIR/Import/intrinsic.ll

A number of commits are reverted due to opaque pointer issues; tracked in #8616.

f39c399 ([Driver] -###: exit with code 1 if hasErrorOccurred) causes the ninja check-clang-driver tests to fail, so they are temporarily XFAILed; tracked in #10932.

Please do not squash and merge this PR.

This allows using VPRecipeWithIRFlags for VPInstruction and reduces the diff for D157144 & D157194.

Currently we only consider basic blocks with a unique predecessor when estimating the size of dead code. However, we could expand this to consider blocks with a back-edge, or blocks preceded by dead blocks. Differential Revision: https://reviews.llvm.org/D156903

This ports over the test cases half-convert.ll and implements patterns or RISCVISelLowering.cpp changes for all of the most straight-forward cases (those that don't require changes outside of lib/Target/RISCV). The remaining cases and noted poor codegen for saturating conversions will be handled in follow-up patches. Differential Revision: https://reviews.llvm.org/D156943

This patch moves directive sets defined internally in Semantics to a header accessible by other stages of the compiler to enable reuse. Some sets are renamed/rearranged and others are lifted from local definitions to provide a single source of truth. Differential Revision: https://reviews.llvm.org/D157090

…nversion Extending to f32 first (as is done for f16) results in better generated code for RISC-V (and affects no other in-tree tests). Additionally, performing the FP_EXTEND first seems equally justified for bf16 as for f16. Differential Revision: https://reviews.llvm.org/D156944

Most test cases have a blank line after the end of the RUN lines for readability.

This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099

The affected lit tests failed when they were run in a path that contained the word "call". CHECK-NOT lines that were supposed to match only the IR ended up matching the path printed in the output. Fixed this by checking for "call void" instead.

Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D152141

Split off suggested refactoring from D157144. Also adds an assert to make sure this is only used when OpType is FPMathOp.

6640df9 did not actually remove it, just its final user. cannotBeOrderedLessThanZeroImpl still has a user which needs to be updated before it can be removed. The users of SignBitMustBeZero currently have broken expectations for nan handling, so replacing them requires more work.

Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D157214

Currently `isTriviallyReMaterializable` calls `isReallyTriviallyReMaterializable` and `isReallyTriviallyReMaterializableGeneric`. The two interfaces are confusing, but there are also some real issues with this.

The documentation of this function (see below) suggests that `isReallyTriviallyReMaterializable` allows the target to override the default behaviour.

        /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
        /// set, this hook lets the target specify whether the instruction is actually
        /// trivially rematerializable, taking into consideration its operands.

It however implements something different. The default behaviour is the analysis done in `isReallyTriviallyReMaterializableGeneric`, which tests whether it is safe to rematerialize the MachineInstr. The result of `isReallyTriviallyReMaterializable` is only considered if `isReallyTriviallyReMaterializableGeneric` returns `false`. That means there is no way to override the default behaviour if `isReallyTriviallyReMaterializableGeneric` returns `true` (i.e. it is safe to rematerialize, but we'd rather not).

By making this a single interface, we can override the interface to do either.

Reviewed By: craig.topper, nemanjai Differential Revision: https://reviews.llvm.org/D156520
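The difference between the two-hook and single-hook shapes can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `Instr`, `OldTargetInfo`, `NewTargetInfo` and the two `Picky` targets are hypothetical stand-ins, not the LLVM `MachineInstr` or `TargetInstrInfo` classes, and the "generic analysis" is reduced to a boolean field.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine instruction (not llvm::MachineInstr).
struct Instr {
  bool genericallySafe;  // what the generic safety analysis would conclude
  bool targetWantsRemat; // what the target would prefer for this instruction
};

// Old shape (simplified): the target hook only matters when the generic
// analysis says "no", so a target can opt in but never opt out.
struct OldTargetInfo {
  virtual ~OldTargetInfo() = default;
  virtual bool isReallyTriviallyReMaterializable(const Instr &) const {
    return false;
  }
  bool isReallyTriviallyReMaterializableGeneric(const Instr &MI) const {
    return MI.genericallySafe;
  }
  bool isTriviallyReMaterializable(const Instr &MI) const {
    return isReallyTriviallyReMaterializableGeneric(MI) ||
           isReallyTriviallyReMaterializable(MI);
  }
};

// New shape (simplified): one virtual hook whose default is the generic
// analysis; an override may accept more, accept less, or defer to the base.
struct NewTargetInfo {
  virtual ~NewTargetInfo() = default;
  virtual bool isReallyTriviallyReMaterializable(const Instr &MI) const {
    return MI.genericallySafe;
  }
  bool isTriviallyReMaterializable(const Instr &MI) const {
    return isReallyTriviallyReMaterializable(MI);
  }
};

// A target that would rather not rematerialize some safe instructions.
struct OldPickyTarget : OldTargetInfo {
  bool isReallyTriviallyReMaterializable(const Instr &MI) const override {
    return MI.targetWantsRemat;
  }
};

struct NewPickyTarget : NewTargetInfo {
  bool isReallyTriviallyReMaterializable(const Instr &MI) const override {
    // Require both the target's preference and the generic safety result.
    return MI.targetWantsRemat &&
           NewTargetInfo::isReallyTriviallyReMaterializable(MI);
  }
};
```

For an instruction that is safe but unwanted (`Instr{true, false}`), the old-shape target still reports it as rematerializable, while the new-shape target can decline.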

…s for PACKSS/PACKUS Begin to consolidate the similar matching code we have - all have semi-similar constraints that still need merging together to ensure we get consistent codegen depending on when the truncate is lowered.

…vector types Fuzz testing noticed that the sub-128-bit vector splitting added in ef4330f didn't correctly halt at <2 x iXX> truncations.

As requested in review for https://reviews.llvm.org/D156990 This additionally consistently uses the ilp32d/lp64d ABIs when the D extension is enabled.

redundant get() call on smart pointer.

Support IR that is generated by the vector-to-scf lowering of 2D vector transfers with a mask. Only 2D transfers that were fully unrolled are supported at the moment. Differential Revision: https://reviews.llvm.org/D156695

Use APInt to represent numeric variables and expressions, thereby removing overflow concerns. The only remaining concern is underflow, when the format of an expression is unsigned (incl. hex values) but the result is negative. Note that this can only happen when substituting an expression, not when capturing, since the regex used to capture an unsigned value will not include a minus sign; hence all the code removal for match propagation testing. This is what this patch implements. Reviewed By: arichardson Differential Revision: https://reviews.llvm.org/D150880

Add Matcher dependentSizedExtVectorType for DependentSizedExtVectorType. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D157237

Add Matcher convertVectorExpr. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D157248

…me from CF_OPTIONS This cherry-picks swiftlang/llvm-project#6431 since without it, macOS 14 SDK headers don't compile when targeting catalyst. Fixes #64438.

Similar to the other ValueTracking function, switch over the instruction opcode instead of doing a long sequence of match()es.

CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp

…f a live register This patch tweaks the fix in D20627 "Do not rename registers that do not start an independent live range" to only consider Data dependencies, not Output or Anti dependencies. An Output or Anti dependency to a superreg does not imply that that superreg is live at the current instruction. This enables breaking anti-dependencies in a few more cases as shown by the lit test updates. Differential Revision: https://reviews.llvm.org/D156879

…ister This patch reworks the fix from D20627 "Do not rename registers that do not start an independent live range". That fix depended on the scheduler dependency graph having redundant edges. Those edges are removed by D156552 "[MachineScheduler] Track physical register dependencies per-regunit" with the result that on several Hexagon lit tests, the post-RA scheduler would schedule the code in a way that fails machine verification.

Consider this code where D11 is a pair R23:R22:

        SU(0): %R2<def> = A2_add %R23, %R17<kill> (anti dependency on R23 here)
        SU(8): %R23<def> = S2_asr_i_r %R22, 31 (data dependency on R23->D11 here)
        SU(10): %D0<def> = A2_tfrp %D11<kill>

The original fix would detect this situation by examining the dependency from SU(8) to SU(10) and seeing that D11 is not a subreg of R23.

A slightly more complicated example:

        SU(0): %R2<def> = A2_add %R23, %R17<kill> (anti dependency on R23 here)
        SU(8): %R23<def> = S2_asr_i_r %R22, 31 (data dependency on R23 here)
        SU(9): %R23<def> = S2_asr_i_r %R23, 31 (data dependency on R23->D11 here)
        SU(10): %D0<def> = A2_tfrp %D11<kill>

The original fix also worked on this example, but only because ScheduleDAGInstrs adds an extra data dependency edge directly from SU(8) to SU(10). This edge is redundant, since you could infer it transitively from the edges SU(8)->SU(9) and SU(9)->SU(10), and since none of the data that SU(8) writes to R23 is read by SU(10). After D156552 the redundant edge SU(8)->SU(10) will not be present, so when we examine the successors of SU(8) we will not find any that read from a superreg of R23.

This patch removes the original fix from D20627, which examined edges in the dependency graph. Instead it extends a check that was already being done in FindSuitableFreeRegisters: instead of checking that *some* register is a superreg of all registers in the rename group, we now check that the specific register that carries the anti-dependency that we want to break is a superreg of all registers in the rename group. Differential Revision: https://reviews.llvm.org/D156880
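The change to the superreg check can be sketched outside of LLVM as follows. This is a rough model, not the code in FindSuitableFreeRegisters: the `SubRegTable` type, the helper names, and the rename group `{R23, D11}` are assumptions made for illustration, with D11 modelled as the pair R23:R22 from the example above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sub-register table: super-register -> its sub-registers.
using SubRegTable = std::map<std::string, std::set<std::string>>;

// True if Reg is Super itself or one of Super's sub-registers.
bool isSubRegisterOrSelf(const SubRegTable &T, const std::string &Super,
                         const std::string &Reg) {
  if (Super == Reg)
    return true;
  auto It = T.find(Super);
  return It != T.end() && It->second.count(Reg) != 0;
}

// True if Super covers every register in the rename group.
bool coversGroup(const SubRegTable &T, const std::string &Super,
                 const std::vector<std::string> &Group) {
  for (const auto &Reg : Group)
    if (!isSubRegisterOrSelf(T, Super, Reg))
      return false;
  return true;
}

// Earlier check (simplified): *some* register in the group covers the group.
bool someRegCoversGroup(const SubRegTable &T,
                        const std::vector<std::string> &Group) {
  for (const auto &Candidate : Group)
    if (coversGroup(T, Candidate, Group))
      return true;
  return false;
}

// Reworked check (simplified): the register carrying the anti-dependency
// must itself cover the group.
bool antiDepRegCoversGroup(const SubRegTable &T, const std::string &AntiDepReg,
                           const std::vector<std::string> &Group) {
  return coversGroup(T, AntiDepReg, Group);
}

// D11 is the pair R23:R22, as in the Hexagon example.
SubRegTable hexagonPairExample() { return {{"D11", {"R22", "R23"}}}; }
```

With the group `{R23, D11}` and the anti-dependency on R23, the earlier check passes because D11 covers the group, whereas the reworked check fails because R23 is not a superreg of D11, so in this model the rename would not be attempted.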

Change the scheduler's physical register dependency tracking from registers-and-their-aliases to regunits. This has a couple of advantages when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless edges in the dependency graph. An AMDGPU example, edited for clarity:

        SU(0): $vgpr1 = V_MOV_B32 $sgpr0
        SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
        SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from SU(1) to SU(2). But the old dependency tracking code also added a useless edge from SU(0) to SU(2) because it thought that SU(0)'s def of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each register can have a huge number of aliases - it can be quadratic in the size of the largest defined register tuple. There is a much lower bound on the number of regunits per register, so iterating over regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03% on X86. I expect a larger compile-time improvement on targets like AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880. Differential Revision: https://reviews.llvm.org/D156552
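The useless SU(0)->SU(2) edge in the AMDGPU example can be reproduced with a toy bottom-up walk. This is a simplified model, not ScheduleDAGInstrs itself: each instruction is assumed to have exactly one def and one use, `SUnitModel`, `KeyMap` and `buildDataEdges` are hypothetical names, and the regunit names `u0`, `u1` and `u_sgpr0` are invented for the sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (defining SU, using SU)

// Toy instruction: one defined register and one used register.
struct SUnitModel {
  std::string Def;
  std::string Use;
};

// Maps a register name to the tracking keys it touches.
using KeyMap = std::map<std::string, std::vector<std::string>>;

// Bottom-up walk: a def adds an edge to every later use recorded under any
// of its lookup keys, clears its own keys, then the use is recorded.
std::set<Edge> buildDataEdges(const std::vector<SUnitModel> &SUs,
                              const KeyMap &LookupKeys, const KeyMap &OwnKeys) {
  std::map<std::string, std::vector<int>> Uses;
  std::set<Edge> Edges;
  for (int I = static_cast<int>(SUs.size()) - 1; I >= 0; --I) {
    for (const auto &K : LookupKeys.at(SUs[I].Def))
      for (int UseSU : Uses[K])
        Edges.emplace(I, UseSU);
    for (const auto &K : OwnKeys.at(SUs[I].Def))
      Uses[K].clear();
    for (const auto &K : OwnKeys.at(SUs[I].Use))
      Uses[K].push_back(I);
  }
  return Edges;
}

std::vector<SUnitModel> amdgpuExample() {
  return {{"vgpr1", "sgpr0"},
          {"vgpr1", "vgpr1"},
          {"vgpr0_vgpr1", "vgpr0_vgpr1"}};
}

// Alias model: uses are keyed by register name, defs search all aliases.
std::set<Edge> aliasEdges() {
  KeyMap Lookup = {{"sgpr0", {"sgpr0"}},
                   {"vgpr1", {"vgpr1", "vgpr0_vgpr1"}},
                   {"vgpr0_vgpr1", {"vgpr0_vgpr1", "vgpr0", "vgpr1"}}};
  KeyMap Own = {{"sgpr0", {"sgpr0"}},
                {"vgpr1", {"vgpr1"}},
                {"vgpr0_vgpr1", {"vgpr0_vgpr1"}}};
  return buildDataEdges(amdgpuExample(), Lookup, Own);
}

// Regunit model: both lookups and bookkeeping are keyed by regunit.
std::set<Edge> regunitEdges() {
  KeyMap Units = {{"sgpr0", {"u_sgpr0"}},
                  {"vgpr1", {"u1"}},
                  {"vgpr0_vgpr1", {"u0", "u1"}}};
  return buildDataEdges(amdgpuExample(), Units, Units);
}
```

In this model `aliasEdges()` contains the edge (0, 2) in addition to (0, 1) and (1, 2), while `regunitEdges()` contains only the latter two, since the def in SU(1) clears the record for the shared regunit before SU(0) is visited.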

This reverts commit 18439cf.

This reverts commit 243f056.

This reverts commit 6b83c06.

This reverts commit 474ec69.

Signed-off-by: Tsang, Whitney <[email protected]>

fhahn and others added 30 commits August 7, 2023 11:03

[VPlan] Move up VPRecipeWithIRFlags definition. (NFC)

7b14c05

This allows using VPRecipeWithIRFlags for VPInstruction and reduces the diff for D157144 & D157194.

[RISCV] Add a blank line after end of RUN lines. NFC.

f2bdc29

Most test cases have a blank line after the end of the RUN lines for readability.

[AMDGPU] Add and use SIInstrFlags::GWS. NFC.

e61ca23

This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099

[NFC] strengthen some CHECK-NOT lines

f7031c4

The affected lit tests failed when they were run in a path that contained the word "call". CHECK-NOT lines that were supposed to match only the IR ended up matching the path printed in the output. Fixed this by checking for "call void" instead.
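The failure mode can be shown with plain substring matching. This is a minimal sketch, assuming the forbidden pattern is a literal string (real FileCheck patterns may also contain regexes); `checkNotFires` and the sample output strings and path are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns true when a CHECK-NOT style pattern would fire, i.e. the forbidden
// text occurs anywhere in the tool output. Models literal substrings only.
bool checkNotFires(const std::string &Output, const std::string &Forbidden) {
  return Output.find(Forbidden) != std::string::npos;
}

// Hypothetical tool output whose printed path happens to contain "call",
// while the IR itself contains no call instruction.
std::string outputWithoutCall() {
  return "; ModuleID = '/home/user/callgraph/test.ll'\n"
         "define void @f() {\n"
         "  ret void\n"
         "}\n";
}

// Hypothetical tool output that really does contain a call instruction.
std::string outputWithCall() {
  return "; ModuleID = '/tmp/test.ll'\n"
         "define void @f() {\n"
         "  call void @g()\n"
         "  ret void\n"
         "}\n";
}
```

Against `outputWithoutCall()`, the pattern "call" fires because of the path, whereas "call void" does not; against `outputWithCall()` both fire, so the stricter pattern still catches a real call.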

[Clang] Make __arm_streaming apply only to prototyped functions.

4d3e917

Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D152141

[VPlan] Move VPRecipeWithIRFlags::getFastMathFlags. (NFCI)

0b17e9d

Split off suggested refactoring from D157144. Also adds an assert to make sure this is only used when OpType is FPMathOp.

[OpenCL] Fix grammar in test error messages; NFC

7f00389

[lldb] Fix typo in comments and in test

aa27848

Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D157214

[Clang][OpenMP] Support for Code Generation of loop bind clause

4097a24

[X86] truncateVectorWithPACK - ensure we don't truncate to <1 x iXX> …

0d1f853

…vector types Fuzz testing noticed that the sub-128-bit vector splitting added in ef4330f didn't correctly halt at <2 x iXX> truncations.

[RISCV][test] Add non-zfbfmin RUN lines to bfloat-convert.ll

380fd82

As requested in review for https://reviews.llvm.org/D156990 This additionally consistently uses the ilp32d/lp64d ABIs when the D extension is enabled.

[mlir] Apply ClangTidy fix (NFC)

7d6fb14

redundant get() call on smart pointer.

[clang][ASTMatcher] Add Matcher 'dependentSizedExtVectorType'

4cce27d

Add Matcher dependentSizedExtVectorType for DependentSizedExtVectorType. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D157237

[clang][ASTMatcher] Add Matcher 'convertVectorExpr'

8baf862

Add Matcher convertVectorExpr. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D157248

[clang/cxx-interop] Teach clang to ignore availability errors that co…

bb58748

…me from CF_OPTIONS This cherry-picks swiftlang/llvm-project#6431 since without it, macOS 14 SDK headers don't compile when targeting catalyst. Fixes #64438.

[NFC][SCCP] Regenerate test case

03dec91

[ValueTracking] Switch over opcode in isKnownToBeAPowerOfTwo() (NFC)

8aeb84c

Similar to the other ValueTracking function, switch over the instruction opcode instead of doing a long sequence of match()es.

Merge from 'main' to 'sycl-web' (30 commits)

2c37701

CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp

whitneywhtsang added 3 commits August 21, 2023 09:38

Revert "[mlir][LLVMIR] Fix identified structs with same name"

e6ebeeb

This reverts commit 18439cf.

Revert "[llvm] Replace uses of Type::getPointerTo (NFC)"

ca951ad

This reverts commit 243f056.

Revert "[ArgPromotion] Remove code for handling typed pointers (NFC)"

baf7e83

This reverts commit 6b83c06.

whitneywhtsang force-pushed the merge branch 3 times, most recently from 865f284 to c5344b7 Compare August 22, 2023 00:15

whitneywhtsang added 2 commits August 21, 2023 20:41

Revert "[clang] Replace uses of CGBuilderTy::CreateElementBitCast (NFC)"

eaef803

This reverts commit 474ec69.

[SYCL-MLIR] Temporarily XFAIL clang driver tests

39a93dd

Signed-off-by: Tsang, Whitney <[email protected]>

whitneywhtsang force-pushed the merge branch 13 times, most recently from 979d22e to 1f587e5 Compare August 22, 2023 18:31

whitneywhtsang added 2 commits August 22, 2023 20:36

[SYCL-MLIR][CI] Temporarily disable some build and test

2fee8ea

Signed-off-by: Tsang, Whitney <[email protected]>

[SYCL-MLIR] Fix merge

80301a6

Signed-off-by: Tsang, Whitney <[email protected]>

whitneywhtsang force-pushed the merge branch from 1f587e5 to 80301a6 Compare August 23, 2023 03:36

This was referenced Aug 23, 2023

[SYCL-MLIR] Investigate clang-driver lit failures #10932

Closed

[SYCL-MLIR] Opaque pointer handling #8616

Closed

[SYCL-MLIR] Investigate constant-propagation.mlir failure with an assertion in DenseAnalysis #10935

Closed

whitneywhtsang closed this Aug 24, 2023

whitneywhtsang deleted the merge branch August 28, 2023 12:21

whitneywhtsang restored the merge branch August 28, 2023 12:26
