-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL-MLIR] Merge from intel/llvm sycl branch #8600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We currently don't handle tail calls in fast-isel but we continue with the lowering when -mlongcall is specified and lower the calls normally. We should defer to SDISel for this so that it is lowered correctly. Differential revision: https://reviews.llvm.org/D123997
Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D139452
This came out of from https://discourse.llvm.org/t/dwarf-dwp-4gb-limit/63902 With big binaries we can have .dwp files where .debug_info.dwo section can grow beyond 4GB. We would like to support this in LLVM and in LLDB. The plan is to enable manual parsing of cu/tu index in DWARF library (https://reviews.llvm.org/D137882), and then switch internal index data structure to 64 bit. For the second part is to enable 64bit offset support in LLDB with this patch. Depends on D139955 Reviewed By: labath Differential Revision: https://reviews.llvm.org/D138618
…on non-annotated tensor. No need for a temp COO and sort even when converting dense -> CSC, we can instead rotate the loop to yield a ordered coordinates at beginning. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D144213
This pass isn't a simplification, it's a non-canonical optimization. This makes it only run once in a (Thin)LTO pipeline during postlink, just like all the other optimization pipeline passes. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D143424
Reviewed By: TIFitis Differential Revision: https://reviews.llvm.org/D143964
Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D144231
…indexed load/store with constant offset. Unlike ARM target, current AArch64 target doesn't have facility to encode the operation bit: whether to add an offset to base pointer for pre-inc/post-inc addressing mode, or to subtract an offset from base pointer for pre-dec/post-dec addressing mode. A mis-compile (llvm/llvm-project#60645) was noticed due to this limitation. Therefore, for AArch64 auto-indexed load/store with constant offset, always use pre-inc/post-inc addressing mode. The constant offset is negated for pre-dec/post-dec addressing mode. An auto-indexed address with non-constant offset is currently not split into base and offset parts. If we are to handle non-constant offset in the future, offset node will need to take a negate. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D143796
If the value from constant-pool is a splat value of vector type, do not need swap after load from constant-pool. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D139491
…l edit descriptor My earlier misreading of the Fortran standards had convinced me that child I/O -- meaning the use of user-defined subroutines via generic interfaces to implement data transfer statements -- was not allowed to advance the current record in the ultimate unit of the original (non-child parent) data transfer statement. This turns out to be wrong, so forward AdvanceRecord() from ChildFormattedIoStatement<> to its parent I/O statement rather than implementing it as a no-op. Differential Revision: https://reviews.llvm.org/D144205
…iar behavior. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D144236
According to github.com/WebAssembly/simd/blob/main/proposals/simd/SIMD.md, the shift count of bit shift instructions is taken modulo lane width. This patch adds such operation. Fixes PR#60655 Differential Revision: https://reviews.llvm.org/D144169
This reverts commit 2062e90.
…(NFC) /data/llvm-project/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionParser.cpp:398:34: error: variable 'lang_rt' set but not used [-Werror,-Wunused-but-set-variable] lldb_private::LanguageRuntime *lang_rt = nullptr; ^ 1 error generated.
When we simplify loads we need to adjust types (esp. null-values) properly to avoid inconsinstencies down the line. Add a cast and an error message. Fixes: llvm/llvm-project#60788
Before this patch ``` $ echo "leal (,%r15), %eax" | llvm-mc --show-encoding --output-asm-variant=1 lea eax, [r15] # encoding: [0x42,0x8d,0x04,0x3d,0x00,0x00,0x00,0x00] $ echo "lea eax, [r15]" | llvm-mc --show-encoding -x86-asm-syntax=intel --output-asm-variant=1 lea eax, [r15] # encoding: [0x41,0x8d,0x07] ``` MC printed the register r15 as a base in intel syntax even when it's an index. Then we got a different encoding by using the assembly from the output of the first command. I believe the behavior is too weird to be called a feature. After this patch, we get ``` $ echo "leal (,%r15), %eax" | llvm-mc --show-encoding --output-asm-variant=1 lea eax, [1*r15] # encoding: [0x42,0x8d,0x04,0x3d,0x00,0x00,0x00,0x00] ``` Reviewed By: RKSimon, pengfei, MaskRay Differential Revision: https://reviews.llvm.org/D144183
If we have a CallBrInst with output that's used, we need to split critical edges so that we have some place to insert COPYs for physregs to virtregs. Part 2a of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8. Test cases and logic re-purposed from D138078. Reviewed By: efriedma, void, jyknight Differential Revision: https://reviews.llvm.org/D139872
This will be necessary to support outputs from asm goto along indirect edges. Test via: $ pushd llvm/build; ninja IRTests; popd $ ./llvm/build/unittests/IR/IRTests \ --gtest_filter=DominatorTree.CallBrDomination Also, return nullptr in Instruction::getInsertionPointAfterDef for CallBrInst as was recommened in https://reviews.llvm.org/D135997#3991427. The following phab review was folded into this commit: https://reviews.llvm.org/D140166 Link: Link: https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Reviewed By: void, efriedma, ChuanqiXu, MaskRay Differential Revision: https://reviews.llvm.org/D135997
Insert a new intrinsic call after splitting critical edges, and verify it. Later commits will update the SSA values to use this new value along indirect branches rather than the callbr's value, and have SelectionDAG consume this new value. Part 2b of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8. Reviewed By: efriedma, jyknight Differential Revision: https://reviews.llvm.org/D139883
Because this pass is to be a codegen pass, it must use the legacy pass manager. Link: https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Reviewed By: aeubanks, void Differential Revision: https://reviews.llvm.org/D139861
Now that we've inserted a call to an intrinsic, we need to update certain previous uses of CallBrInst values to use the value of this intrinsic instead. There are 3 cases to handle: 1. The @llvm.callbr.landingpad.<type>() intrinsic call is in the same BasicBlock as the use of the callbr we're replacing. 2. The use is dominated by the direct destination. 3. The use is not dominated by the direct destination, and may or may not be dominated by the indirect destination. Part 2c of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8. Reviewed By: efriedma, void, jyknight Differential Revision: https://reviews.llvm.org/D139970
Given a CallBrInst, retain its first virtual register in SelectionDagBuilder's FunctionLoweringInfo if there's corresponding landingpad. Walk the list of COPY MachineInstr to find the original virtual and physical registers defined by the INLINEASM_BR MachineInst. Test cases from https://reviews.llvm.org/D139565. Link: llvm/llvm-project#59538 Part 3 from https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Follow up patches still need to wire up CallBrPrepare into the pass pipelines. Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D140160
Capstone of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Clang changes are still necessary to enable the use of outputs along indirect edges of asm goto statements. Link: llvm/llvm-project#53562 Reviewed By: void Differential Revision: https://reviews.llvm.org/D140180
Prerequisite to further modifications in D136497. Basically, there is a large body of code in CodeGenFunction::EmitAsmStmt for emitting stores of outputs. We want to be able to repeat this logic, for each destination of a callbr (rather than just the default destination which is what the code currently does). Also does some smaller cleanups like whitespace cleanups, and removing pointless casts. Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D137113
Initial support for asm goto w/ outputs (D69876) only supported outputs along the "default" (aka "fallthrough") edge. We can support outputs along all edges by repeating the same pattern of stores along the indirect edges that we allready do for the default edge. One complication is that these indirect edges may be critical edges which would need to be split. Another issue is that mid-codgen of LLVM IR, the control flow graph might not reflect the control flow of the final function. To avoid this "chicken and the egg" problem assume that any given indirect edge may become a critical edge, and pro-actively split it. This is unnecessary if the edge does not become critical, but LLVM will optimize such cases via tail duplication. Fixes: llvm/llvm-project#53562 Reviewed By: void Differential Revision: https://reviews.llvm.org/D136497
Now that we support outputs from asm goto along indirect edges, we can remove/revert some code that was added to help warn about the previous limitation that outputs were not supported along indirect edges. Reverts some code added in: commit 72aa619 ("Warn of uninitialized variables on asm goto's indirect branch") commit 3a604fd ("[Clang][CFG] check children statements of asm goto") But keeps+updates the tests. Link: llvm/llvm-project#53562 Reviewed By: void Differential Revision: https://reviews.llvm.org/D140508
Also move the line about __has_extension(gnu_asm_goto_with_outputs) so that it is more generally about asm goto, not the paragraph on symbolic references. Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D143205
…#8519) Fixes intel#8326. Signed-off-by: Maronas, Marcos <[email protected]>
When it is failed to build a program, L0 throws an exception, which causes sycl::program to fail to create. Because of this, RT won't call piProgramRelease(), which leads to memory leaks. The solution is to release ZeModule created by plugin when the program build is failed. --------- Signed-off-by: Byoungro So <[email protected]>
This commit changes target::host_buffer deprecation warning to correctly refer to host_accessor as the replacement. Additionally it gives target::host_task an explicit value, similar to the other enumerators. Signed-off-by: Larsen, Steffen <[email protected]>
The new lsc_slm_block_load() has an additional operand 'old_values' that contains the values returned from the function if the predicate passed to it is 0. The corresponding LIT test: intel/llvm-test-suite#1637 Signed-off-by: Vyacheslav N Klochkov <[email protected]>
Currently we see a problem with llvm-test-suite checks on CUDA in AWS cloud, which looks like AWS was never allocated to execute the job. Current hypothesis is that we have a matrix job spawning 6 jobs, but we limit number of runners working in parallel to 4. This means that llvm-test-suite check on CUDA might wait for 1 or 2 llvm-test-suite checks on other machines to complete before we can run llvm-test-suite on CUDA machine. This strategy conflicts with the rules established for AWS machines, which waits for 1 hour after machine is allocated to start the job. If job has not arrived, machine returns to AWS. Since AWS allocation is speicific to each workflow run, once machine is returned to the AWS, GitHub is not able to use it anymore. GitHub waits for 24 hours and kills the workflow due to missing runner. This change removes the limitation for the number of runners we can use in parallel to run llvm-test-suite. This should fix the problem with missing results on CUDA machines.
LLVM: llvm/llvm-project@890146b SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@4af277c
This change fixes two issues reported by static analyzer: 1. Uninitialized pointer field 2. AUTO_CAUSES_COPY Signed-off-by: Arvind Sudarsanam <[email protected]>
This reverts commit 02f9cf9.
Signed-off-by: Tsang, Whitney <[email protected]>
Looks good overall, just one question: What's the reason to switch from |
By looking at the docs, this change seems to be correct. However, I don't see at which point did this change. Note that this will be obsolete when we switch to defining these attributes as enum attributes (builtin parser). I have that in the pipeline. |
LGTM. Now we have more tests passing. Thanks @whitneywhtsang ! |
Guys I have asked the MLIR folks whether they are willing to revert their breaking change. See https://reviews.llvm.org/rG02f9cf9266c35795aa44ded0c37f0c44d8c49069 and subscribe to the issue to follow the conversation (and weight in). |
I get that this would avoid some work right now, but the switch probably is inevitable, right? Currently, status is:
IIUC, typed pointers will defacto be unsupported with LLVM 17, so it might be a matter of weeks/months before that happens. |
Yes I actually missed that LLVM 17 (in principle) no longer allows typed pointers. So we (and the DPC++ compiler as well) will need to switch. |
Created #8616 to handle opaque pointers. |
Please only review 1e90026, and
Reverted 02f9cf9, as it forces to use opaque pointers.
Will remove
Plugin/interop-opencl-make-kernel-bundle.cpp
andBasic/gpu_max_wgs_error.cpp
from xfail.Please do not squash and merge this PR.