Skip to content

[SYCL-MLIR] Merge from intel/llvm sycl branch #8600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3,609 commits into from
Mar 10, 2023

Conversation

whitneywhtsang
Copy link
Contributor

@whitneywhtsang whitneywhtsang commented Mar 9, 2023

Please only review 1e90026, and

CONFLICT (content): Merge conflict in mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
CONFLICT (content): Merge conflict in mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp
CONFLICT (content): Merge conflict in mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp

Reverted 02f9cf9, as it forces to use opaque pointers.

Will remove Plugin/interop-opencl-make-kernel-bundle.cpp and Basic/gpu_max_wgs_error.cpp from xfail.

Please do not squash and merge this PR.

nemanjai and others added 30 commits February 16, 2023 16:15
We currently don't handle tail calls in fast-isel but
we continue with the lowering when -mlongcall is
specified and lower the calls normally. We should
defer to SDISel for this so that it is lowered correctly.

Differential revision: https://reviews.llvm.org/D123997
This came out of from https://discourse.llvm.org/t/dwarf-dwp-4gb-limit/63902
With big binaries we can have .dwp files where .debug_info.dwo section can grow
beyond 4GB. We would like to support this in LLVM and in LLDB.

The plan is to enable manual parsing of cu/tu index in DWARF library
(https://reviews.llvm.org/D137882), and then
switch internal index data structure to 64 bit.
For the second part is to enable 64bit offset support in LLDB with
this patch.

Depends on D139955

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D138618
…on non-annotated tensor.

No need for a temp COO and sort even when converting dense -> CSC, we can instead rotate the loop to yield a ordered coordinates at beginning.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D144213
This pass isn't a simplification, it's a non-canonical optimization.

This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D143424
…indexed load/store with constant offset.

Unlike ARM target, current AArch64 target doesn't have facility to encode the
operation bit: whether to add an offset to base pointer for pre-inc/post-inc
addressing mode, or to subtract an offset from base pointer for
pre-dec/post-dec addressing mode.

A mis-compile (llvm/llvm-project#60645) was noticed
due to this limitation.

Therefore, for AArch64 auto-indexed load/store with constant offset, always
use pre-inc/post-inc addressing mode. The constant offset is negated for
pre-dec/post-dec addressing mode.
An auto-indexed address with non-constant offset is currently not split into
base and offset parts. If we are to handle non-constant offset in the future,
offset node will need to take a negate.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D143796
If the value from constant-pool is a splat value of vector type, do not
need swap after load from constant-pool.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D139491
…l edit descriptor

My earlier misreading of the Fortran standards had convinced me that child I/O
-- meaning the use of user-defined subroutines via generic interfaces to implement
data transfer statements -- was not allowed to advance the current record in the
ultimate unit of the original (non-child parent) data transfer statement.
This turns out to be wrong, so forward AdvanceRecord() from ChildFormattedIoStatement<>
to its parent I/O statement rather than implementing it as a no-op.

Differential Revision: https://reviews.llvm.org/D144205
According to github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md,
the shift count of bit shift instructions is taken modulo lane width.
This patch adds such operation.

Fixes PR#60655

Differential Revision: https://reviews.llvm.org/D144169
…(NFC)

/data/llvm-project/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionParser.cpp:398:34: error: variable 'lang_rt' set but not used [-Werror,-Wunused-but-set-variable]
  lldb_private::LanguageRuntime *lang_rt = nullptr;
                                 ^
1 error generated.
When we simplify loads we need to adjust types (esp. null-values)
properly to avoid inconsinstencies down the line. Add a cast and an
error message.

Fixes: llvm/llvm-project#60788
Before this patch

```
$ echo "leal    (,%r15), %eax" | llvm-mc --show-encoding --output-asm-variant=1

        lea     eax, [r15]                      # encoding: [0x42,0x8d,0x04,0x3d,0x00,0x00,0x00,0x00]

$ echo "lea     eax, [r15]" | llvm-mc --show-encoding -x86-asm-syntax=intel --output-asm-variant=1

        lea     eax, [r15]                      # encoding: [0x41,0x8d,0x07]
```

MC printed the register r15 as a base in intel syntax even when it's an index.
Then we got a different encoding by using the assembly from the output of the
first command.

I believe the behavior is too weird to be called a feature.

After this patch, we get

```
$ echo "leal    (,%r15), %eax" | llvm-mc --show-encoding --output-asm-variant=1

        lea     eax, [1*r15]                    # encoding: [0x42,0x8d,0x04,0x3d,0x00,0x00,0x00,0x00]
```

Reviewed By: RKSimon, pengfei, MaskRay

Differential Revision: https://reviews.llvm.org/D144183
If we have a CallBrInst with output that's used, we need to split
critical edges so that we have some place to insert COPYs for physregs
to virtregs.

Part 2a of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.

Test cases and logic re-purposed from D138078.

Reviewed By: efriedma, void, jyknight

Differential Revision: https://reviews.llvm.org/D139872
This will be necessary to support outputs from asm goto along indirect
edges.

Test via:
  $ pushd llvm/build; ninja IRTests; popd
  $ ./llvm/build/unittests/IR/IRTests \
    --gtest_filter=DominatorTree.CallBrDomination

Also, return nullptr in Instruction::getInsertionPointAfterDef for
CallBrInst as was recommened in
https://reviews.llvm.org/D135997#3991427.  The following phab review was
folded into this commit: https://reviews.llvm.org/D140166

Link: Link: https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Reviewed By: void, efriedma, ChuanqiXu, MaskRay

Differential Revision: https://reviews.llvm.org/D135997
Insert a new intrinsic call after splitting critical edges, and verify
it. Later commits will update the SSA values to use this new value along
indirect branches rather than the callbr's value, and have SelectionDAG
consume this new value.

Part 2b of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.

Reviewed By: efriedma, jyknight

Differential Revision: https://reviews.llvm.org/D139883
Because this pass is to be a codegen pass, it must use the legacy pass
manager.

Link: https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Reviewed By: aeubanks, void

Differential Revision: https://reviews.llvm.org/D139861
Now that we've inserted a call to an intrinsic, we need to update
certain previous uses of CallBrInst values to use the value of this
intrinsic instead.

There are 3 cases to handle:
1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same
   BasicBlock as the use of the callbr we're replacing.
2. The use is dominated by the direct destination.
3. The use is not dominated by the direct destination, and may or may
   not be dominated by the indirect destination.

Part 2c of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8.

Reviewed By: efriedma, void, jyknight

Differential Revision: https://reviews.llvm.org/D139970
Given a CallBrInst, retain its first virtual register in SelectionDagBuilder's
FunctionLoweringInfo if there's corresponding landingpad. Walk the list
of COPY MachineInstr to find the original virtual and physical registers
defined by the INLINEASM_BR MachineInst.

Test cases from https://reviews.llvm.org/D139565.
Link: llvm/llvm-project#59538

Part 3 from
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Follow up patches still need to wire up CallBrPrepare into the pass
pipelines.

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D140160
Capstone of
https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8

Clang changes are still necessary to enable the use of outputs along
indirect edges of asm goto statements.

Link: llvm/llvm-project#53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140180
Prerequisite to further modifications in D136497.

Basically, there is a large body of code in CodeGenFunction::EmitAsmStmt
for emitting stores of outputs. We want to be able to repeat this logic,
for each destination of a callbr (rather than just the default
destination which is what the code currently does).

Also does some smaller cleanups like whitespace cleanups, and removing
pointless casts.

Reviewed By: void, jyknight

Differential Revision: https://reviews.llvm.org/D137113
Initial support for asm goto w/ outputs (D69876) only supported outputs
along the "default" (aka "fallthrough") edge.

We can support outputs along all edges by repeating the same pattern of
stores along the indirect edges that we allready do for the default
edge.  One complication is that these indirect edges may be critical
edges which would need to be split. Another issue is that mid-codgen of
LLVM IR, the control flow graph might not reflect the control flow of
the final function.

To avoid this "chicken and the egg" problem assume that any given
indirect edge may become a critical edge, and pro-actively split it.
This is unnecessary if the edge does not become critical, but LLVM will
optimize such cases via tail duplication.

Fixes: llvm/llvm-project#53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D136497
Now that we support outputs from asm goto along indirect edges, we can
remove/revert some code that was added to help warn about the previous
limitation that outputs were not supported along indirect edges.

Reverts some code added in:
commit 72aa619 ("Warn of uninitialized variables on asm goto's indirect branch")
commit 3a604fd ("[Clang][CFG] check children statements of asm goto")
But keeps+updates the tests.

Link: llvm/llvm-project#53562

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D140508
Also move the line about __has_extension(gnu_asm_goto_with_outputs) so
that it is more generally about asm goto, not the paragraph on symbolic
references.

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D143205
maarquitos14 and others added 13 commits March 7, 2023 16:31
When it is failed to build a program, L0 throws an exception, which
causes sycl::program to fail to create.
Because of this, RT won't call piProgramRelease(), which leads to memory
leaks.
The solution is to release ZeModule created by plugin when the program
build is failed.

---------

Signed-off-by: Byoungro So <[email protected]>
This commit changes target::host_buffer deprecation warning to correctly
refer to host_accessor as the replacement. Additionally it gives
target::host_task an explicit value, similar to the other enumerators.

Signed-off-by: Larsen, Steffen <[email protected]>
The new lsc_slm_block_load() has an additional operand 'old_values' that
contains the values returned from the function if the predicate passed
to it is 0.

The corresponding LIT test: intel/llvm-test-suite#1637

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
Currently we see a problem with llvm-test-suite checks on CUDA in AWS
cloud, which looks like AWS was never allocated to execute the job.
Current hypothesis is that we have a matrix job spawning 6 jobs, but we
limit number of runners working in parallel to 4. This means that
llvm-test-suite check on CUDA might wait for 1 or 2 llvm-test-suite
checks on other machines to complete before we can run llvm-test-suite
on CUDA machine.
This strategy conflicts with the rules established for AWS machines,
which waits for 1 hour after machine is allocated to start the job. If
job has not arrived, machine returns to AWS. Since AWS allocation is
speicific to each workflow run, once machine is returned to the AWS,
GitHub is not able to use it anymore. GitHub waits for 24 hours and
kills the workflow due to missing runner.

This change removes the limitation for the number of runners we can use
in parallel to run llvm-test-suite. This should fix the problem with
missing results on CUDA machines.
This change fixes two issues reported by static analyzer:
1. Uninitialized pointer field
2. AUTO_CAUSES_COPY

Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Tsang, Whitney <[email protected]>
@whitneywhtsang whitneywhtsang added disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch labels Mar 9, 2023
@whitneywhtsang whitneywhtsang self-assigned this Mar 9, 2023
@whitneywhtsang whitneywhtsang requested a review from etiotto as a code owner March 9, 2023 23:34
@sommerlukas
Copy link
Contributor

Looks good overall, just one question: What's the reason to switch from FailureOr<MemoryAccessMode> to plain MemoryAccessNode in SYCLOpTypes.{h,cpp}? Some upstream API change?

@victor-eds
Copy link
Contributor

Looks good overall, just one question: What's the reason to switch from FailureOr<MemoryAccessMode> to plain MemoryAccessNode in SYCLOpTypes.{h,cpp}? Some upstream API change?

By looking at the docs, this change seems to be correct. However, I don't see at which point did this change.

Note that this will be obsolete when we switch to defining these attributes as enum attributes (builtin parser). I have that in the pipeline.

@etiotto
Copy link

etiotto commented Mar 10, 2023

LGTM. Now we have more tests passing. Thanks @whitneywhtsang !

@whitneywhtsang whitneywhtsang merged commit f8f6916 into intel:sycl-mlir Mar 10, 2023
@etiotto
Copy link

etiotto commented Mar 10, 2023

Guys I have asked the MLIR folks whether they are willing to revert their breaking change. See https://reviews.llvm.org/rG02f9cf9266c35795aa44ded0c37f0c44d8c49069 and subscribe to the issue to follow the conversation (and weight in).

@whitneywhtsang whitneywhtsang deleted the merge branch March 10, 2023 15:53
@sommerlukas
Copy link
Contributor

Guys I have asked the MLIR folks whether they are willing to revert their breaking change. See https://reviews.llvm.org/rG02f9cf9266c35795aa44ded0c37f0c44d8c49069 and subscribe to the issue to follow the conversation (and weight in).

I get that this would avoid some work right now, but the switch probably is inevitable, right?

Currently, status is:

Typed pointers are not supported on the main branch as a matter of policy. Fixes for typed pointer support are not accepted. Typed pointer support code may be removed without notice at any time.

IIUC, typed pointers will defacto be unsupported with LLVM 17, so it might be a matter of weeks/months before that happens.

@etiotto
Copy link

etiotto commented Mar 10, 2023

Guys I have asked the MLIR folks whether they are willing to revert their breaking change. See https://reviews.llvm.org/rG02f9cf9266c35795aa44ded0c37f0c44d8c49069 and subscribe to the issue to follow the conversation (and weight in).

I get that this would avoid some work right now, but the switch probably is inevitable, right?

Currently, status is:

Typed pointers are not supported on the main branch as a matter of policy. Fixes for typed pointer support are not accepted. Typed pointer support code may be removed without notice at any time.

IIUC, typed pointers will defacto be unsupported with LLVM 17, so it might be a matter of weeks/months before that happens.

Yes I actually missed that LLVM 17 (in principle) no longer allows typed pointers. So we (and the DPC++ compiler as well) will need to switch.

@whitneywhtsang
Copy link
Contributor Author

Created #8616 to handle opaque pointers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs sycl-mlir Pull requests or issues for sycl-mlir branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.