Skip to content

[SYCL-MLIR] Merge from intel/llvm sycl branch #10895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5,634 commits into from

Conversation

whitneywhtsang
Copy link
Contributor

@whitneywhtsang whitneywhtsang commented Aug 21, 2023

Please only review 80301a6 and 2fee8ea and files with conflict:

        both modified:   llvm-spirv/tools/llvm-spirv/llvm-spirv.cpp
        both modified:   mlir/include/mlir/Conversion/Passes.td
        both modified:   mlir/include/mlir/Dialect/SPIRV/Transforms/SPIRVConversion.h
        both modified:   mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp
        both modified:   mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
        both modified:   mlir/test/Target/LLVMIR/Import/intrinsic.ll

A number of commits are reverted due to opaque pointer, track in #8616.

f39c399 [Driver] -###: exit with code 1 if hasErrorOccurred, this commit causes ninja check-clang-driver tests to fail, XFAIL them temporarily, track in #10932.

Please do not squash and merge this PR.

fhahn and others added 30 commits August 7, 2023 11:03
This allows using VPRecipeWithIRFlags for VPInstruction and reduces the
diff for D157144 & D157194.
Currently we only consider basic blocks with a unique predecessor when
estimating the size of dead code. However, we could expand to this to
consider blocks with a back-edge, or blocks preceded by dead blocks.

Differential Revision: https://reviews.llvm.org/D156903
This ports over the test cases half-convert.ll and implements patterns
or RISCVISelLowering.cpp changes for all of the most straight-forward
cases (those that don't require changes outside of lib/Target/RISCV).
The remaining cases and noted poor codegen for saturating conversions
will be handled in follow-up patches.

Differential Revision: https://reviews.llvm.org/D156943
This patch moves directive sets defined internally in Semantics to a header
accessible by other stages of the compiler to enable reuse. Some sets are
renamed/rearranged and others are lifted from local definitions to provide
a single source of truth.

Differential Revision: https://reviews.llvm.org/D157090
…nversion

Extending to f32 first (as is done for f16) results in better generated
code for RISC-V (and affects no other in-tree tests). Additionally,
performing the FP_EXTEND first seems equally justified for bf16 as for
f16.

Differential Revision: https://reviews.llvm.org/D156944
In most of testcases, it usually has a blank line after end of RUN lines for readability.
This reduces the number of places where we have to check for a list of
DS_GWS_* opcodes.

Differential Revision: https://reviews.llvm.org/D157099
The affected lit tests failed when they were run in a path that contained the
word "call". CHECK-NOT lines that were supposed to match only the IR ended up
matching the path printed in the output. Fixed this by checking for "call void"
instead.
Split off suggested refactoring from D157144. Also adds a assert to make
sure this is only used when OpType is FPMathOp.
6640df9 did not actually remove it,
just its final user. cannotBeOrderedLessThanZeroImpl still has a user
which needs to be updated before it can be removed.

The users of SignBitMustBeZero currently have broken expectations for
nan handling, so requires more work to replace.
Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D157214
Currently `isTriviallyReMaterializable` calls
`isReallyTriviallyReMaterializable` and
`isReallyTriviallyReMaterializableGeneric`. The two interfaces
are confusing, but there are also some real issues with this.

The documentation of this function (see below) suggests that
`isReallyTriviallyRematerializable` allows the target to override the
default behaviour.

  /// For instructions with opcodes for which the M_REMATERIALIZABLE flag is
  /// set, this hook lets the target specify whether the instruction is actually
  /// trivially rematerializable, taking into consideration its operands.

It however implements something different. The default behaviour
is the analysis done in `isReallyTriviallyReMaterializableGeneric`,
which is testing if it is safe to rematerialize the MachineInstr.

The result of `isReallyTriviallyReMaterializable` is only considered if
`isReallyTriviallyReMaterializableGeneric` returns `false`.  That means
there is no way to override the default behaviour if
`isReallyTriviallyReMaterializableGeneric` returns true (i.e. it is safe to
rematerialize, but we'd rather not).

By making this a single interface, we can override the interface to do either.

Reviewed By: craig.topper, nemanjai

Differential Revision: https://reviews.llvm.org/D156520
…s for PACKSS/PACKUS

Begin to consolidate the similar matching code we have - all have semi-similar constraints that still need merging together to ensure we get consistent codegen depending on when the truncate is lowered.
…vector types

Fuzz testing noticed that the sub-128-bit vector splitting added in ef4330f didn't correctly halt at <2 x iXX> truncations.
As requested in review for https://reviews.llvm.org/D156990

This additionally consistently uses the ilp32d/lp64d ABIs when the D
extension is enabled.
redundant get() call on smart pointer.
Support IR that is generated by the vector-to-scf lowering of 2D vector transfers with a mask. Only 2D transfers that were fully unrolled are supported at the moment.

Differential Revision: https://reviews.llvm.org/D156695
Use APInt to represent numeric variables and expressions, therefore
removing overflow concerns. Only remains underflow when the format of an
expression is unsigned (incl. hex values) but the result is negative.
Note that this can only happen when substituting an expression, not when
capturing since the regex used to capture unsigned value will not include
minus sign, hence all the code removal for match propagation testing.
This is what this patch implement.

Reviewed By: arichardson

Differential Revision: https://reviews.llvm.org/D150880
Add Matcher dependentSizedExtVectorType for DependentSizedExtVectorType.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D157237
Add Matcher convertVectorExpr.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D157248
…me from CF_OPTIONS

This cherry-picks swiftlang/llvm-project#6431
since without it, macOS 14 SDK headers don't compile when targeting
catalyst.

Fixes #64438.
Similar to the other ValueTracking function, switch over the
instruction opcode instead of doing a long sequence of match()es.
  CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp
…f a live register

This patch tweaks the fix in D20627 "Do not rename registers that do not
start an independent live range" to only consider Data dependencies, not
Output or Anti dependencies. An Output or Anti dependency to a superreg
does not imply that that superreg is live at the current instruction.

This enables breaking anti-dependencies in a few more cases as shown by
the lit test updates.

Differential Revision: https://reviews.llvm.org/D156879
…ister

This patch reworks the fix from D20627 "Do not rename registers that do
not start an independent live range". That fix depended on the scheduler
dependency graph having redundant edges. Those edges are removed by
D156552 "[MachineScheduler] Track physical register dependencies
per-regunit" with the result that on several Hexagon lit tests, the
post-RA scheduler would schedule the code in a way that fails machine
verification.

Consider this code where D11 is a pair R23:R22:

SU(0): %R2<def> = A2_add %R23, %R17<kill>
    (anti dependency on R23 here)
SU(8): %R23<def> = S2_asr_i_r %R22, 31
    (data dependency on R23->D11 here)
SU(10): %D0<def> = A2_tfrp %D11<kill>

The original fix would detect this situation by examining the dependency
from SU(8) to SU(10) and seeing that D11 is not a subreg of R23.

A slightly more complicated example:

SU(0): %R2<def> = A2_add %R23, %R17<kill>
    (anti dependency on R23 here)
SU(8): %R23<def> = S2_asr_i_r %R22, 31
    (data dependency on R23 here)
SU(9): %R23<def> = S2_asr_i_r %R23, 31
    (data dependency on R23->D11 here)
SU(10): %D0<def> = A2_tfrp %D11<kill>

The original fix also worked on this example, but only because
ScheduleDAGInstrs adds an extra data dependency edge directly from SU(8)
to SU(10). This edge is redundant, since you could infer it transitively
from the edges SU(8)->SU(9) and SU(9)->SU(10), and since none of the
data that SU(8) writes to R23 is read by SU(10).

After D156552 the redundant edge SU(8)->SU(10) will not be present, so
when we examine the successors of SU(8) we will not find any that read
from a superreg of R23.

This patch removes the original fix from D20627, which examined edges in
the dependency graph. Instead it extends a check that was already being
done in FindSuitableFreeRegisters: instead of checking that *some*
register is a superreg of all registers in the rename group, we now
check that the specific register that carries the anti-dependency that
we want to break is a superreg of all registers in the rename group.

Differential Revision: https://reviews.llvm.org/D156880
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880.

Differential Revision: https://reviews.llvm.org/D156552
@whitneywhtsang whitneywhtsang force-pushed the merge branch 3 times, most recently from 865f284 to c5344b7 Compare August 22, 2023 00:15
@whitneywhtsang whitneywhtsang force-pushed the merge branch 13 times, most recently from 979d22e to 1f587e5 Compare August 22, 2023 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.