forked from llvm/llvm-project
[AutoBump] Merge with 31249e27 (Jan 14) (42) #513
Merged
…lvm#122329) This fixes a conflict between the version of LLVM linked against by the runner and the unrelated version of LLVM that may be dynamically loaded by a graphics driver. (Relevant to llvm#73457: fixes loading certain Vulkan drivers.)
…d taskloop simd. (llvm#121746) Added codegen support for combined masked constructs `Parallel masked taskloop simd`. Added implementation for `EmitOMPParallelMaskedTaskLoopSimdDirective`. Co-authored-by: Chandra Ghale <[email protected]>
This patch adds support for processing the `host_eval` clause of `omp.target` to populate default and runtime kernel launch attributes. Specifically, these are related to the `num_teams`, `thread_limit` and `num_threads` clauses attached to operations nested inside of `omp.target`. As a result, the `thread_limit` clause of `omp.target` is also supported. The implementation of `initTargetDefaultAttrs()` is intended to reflect clang's own processing of multiple constructs and clauses in order to define a default number of teams and threads to be used as kernel attributes and to populate global variables in the target device module. One side effect of this change is that it is no longer possible to translate target device MLIR modules to LLVM IR unless they have a supported target triple. This is because the local `getGridValue()` function in the `OpenMPIRBuilder` only works for certain architectures, and it is called whenever the maximum number of threads has not been explicitly defined. This limitation also matches clang. Evaluating the collapsed loop trip count of SPMD and Generic-SPMD kernels remains unsupported.
… the CONCAT is free (llvm#122750) Minor followup to llvm#122485 - if the source operands were widened half-size subvectors, then attempt to concatenate the subvectors directly, and then adjust the shuffle mask so references to the second operand now refer to the upper half of the concat result.
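The mask adjustment described above can be sketched in isolation. This is a minimal standalone illustration, not the code from the patch: the function name `remapMaskForConcat` and the plain `std::vector<int>` mask are assumptions made for the example. It assumes each source operand has `NumElts` lanes of which only the lower half holds real data, so after concatenation the second operand's data sits in the upper half of a single `NumElts`-wide vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: remap a two-operand shuffle mask onto one
// concatenated operand.
// Assumptions: mask entries in [0, NumElts) select from operand 0, entries in
// [NumElts, 2 * NumElts) select from operand 1, negative entries mean "undef",
// and only the lower NumElts / 2 lanes of each operand are referenced.
std::vector<int> remapMaskForConcat(const std::vector<int> &Mask, int NumElts) {
  assert(NumElts > 0 && NumElts % 2 == 0 && "expects an even lane count");
  const int Half = NumElts / 2;
  std::vector<int> NewMask;
  NewMask.reserve(Mask.size());
  for (int M : Mask) {
    if (M >= NumElts)
      NewMask.push_back(M - NumElts + Half); // operand 1 -> upper half
    else
      NewMask.push_back(M); // operand 0 (or undef) stays as-is
  }
  return NewMask;
}
```

For an 8-lane shuffle, a reference to lane 0 of the second operand (mask value 8) becomes lane 4 of the concatenated vector.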
No difference in semantics here as byval is already handled separately. This simplifies migration to the captures attribute.
This was needed before llvm#115077 since the compiler-rt test build made assumptions about the build layout of libc++ and libc++abi, but now they link against a local installation of these libraries so we no longer need this workaround.
… runner (llvm#122329)" This reverts commit f879da7. The change to not export certain symbols apparently broke the UBsan/Asan buildbot, because the DSO for the sanitizer wants to link to them.
This patch adds support for lowering OpenMP clauses and expressions attached to constructs nested inside of a target region that need to be evaluated in the host device. This is done through the use of the `OpenMP_HostEvalClause` `omp.target` set of operands and entry block arguments. When lowering clauses for a target construct, a more involved `processHostEvalClauses()` function is called, which looks at the current and potentially other nested constructs in order to find and lower clauses that need to be processed outside of the `omp.target` operation under construction. This populates an instance of a global structure with the resulting MLIR values. The resulting list of host-evaluated values is used to initialize the `host_eval` operands when constructing the `omp.target` operation, and then replaced with the corresponding block arguments after creating that operation's region. Afterwards, while lowering nested operations, those that might potentially be evaluated on the host (i.e. `num_teams`, `thread_limit`, `num_threads` and `collapse`) check first whether there is an active global host-evaluated information structure and whether it holds values referring to these clauses. If that is the case, the stored values (`omp.target` entry block arguments at that stage) are used instead of lowering these clauses again.
The behavior is not entirely consistent with that of clang for the moment since detailed timing information on the LLVM IR optimization and code generation passes is not provided. The -ftime-report= option is also not enabled since that is only relevant for information about the LLVM IR passes. However, some code to handle that option has been included, to make it easier to support the option when the issues blocking it are resolved. A FortranSupport library has been created that is intended to mirror the LLVM and MLIR support libraries. Based on @tarunprabhu's PR llvm#107270 with minor changes addressing latest review feedback. He's busy and we'd like to get this support in ASAP. Co-authored-by: Tarun Prabhu <[email protected]>
Fix Fortran test failures caused by the introduction of the amdgcn-amd-amdhsa target triple in llvm#116052.
…2895) Even if their evaluation succeeds, mark them as invalid. This fixes some long-standing differences to the ast walker interpreter.
…vm#122885) This fixes a miscompilation extracted from 525.x264_r, where we were failing to update the runtime VF of a VPReverseVectorPointerRecipe. We were removing a use of VF whilst iterating over the users() iterator, which messed up the iterator in-flight and caused us to miss some recipes. This fixes it by copying the users into a SmallVector first. Fixes llvm#122681 Fixes llvm#122682
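The fix pattern, taking a snapshot of the users before mutating the use list, can be shown with standard containers. This is a simplified sketch rather than the VPlan code: `Recipe`, `Value`, and `detachAllUsers` are hypothetical stand-ins, and `std::vector` takes the place of `SmallVector`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a recipe and the value it uses.
struct Recipe {
  int Id;
  bool UsesVF;
};

struct Value {
  std::vector<Recipe *> Users;
  void removeUser(Recipe *R) {
    Users.erase(std::remove(Users.begin(), Users.end(), R), Users.end());
  }
};

// Detach every user of VF. Erasing from VF.Users while iterating over it
// directly would invalidate the iterator and could skip entries, so the loop
// walks a copy taken up front.
int detachAllUsers(Value &VF) {
  std::vector<Recipe *> Snapshot(VF.Users.begin(), VF.Users.end());
  int Visited = 0;
  for (Recipe *R : Snapshot) {
    VF.removeUser(R); // mutates VF.Users, not Snapshot
    R->UsesVF = false;
    ++Visited;
  }
  assert(VF.Users.empty() && "every user should have been detached");
  return Visited;
}
```

The copy costs one extra allocation, but it guarantees that every user present at the start of the loop is visited exactly once.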
The 2024-12 ISA spec release[1] adds these features: FEAT_SME_MOP4 (sme-mop4), which enables SME quarter-tile outer product instructions, and FEAT_SME_TMOP (sme-tmop), which enables SME structured sparsity outer product instructions. This allows these instructions to be available outside Armv9.6/sme2p2. [1] https://developer.arm.com/Architectures/A-Profile%20Architecture#Downloads
To be conservative, explicitly exclude byval arguments, which doesNotCapture() would otherwise allow. Even if byval has an initializes attribute, it would only apply to the implicit copy.
…lization link in release note (llvm#122910)
…llvm#122870) The figure includes work that's already committed. It does not include the WIP/RFC proposal in https://discourse.llvm.org/t/rfc-speeding-up-dwarf-indexing-again/83979.
True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
This is intended for use with Arm's Guarded Control Stack extension (GCS), which reuses some existing shadow stack support in Linux. It should also work with the x86 equivalent. A "ss" flag is added to the "VmFlags" line of shadow stack memory regions in `/proc/<pid>/smaps`. To keep the naming generic I've called it shadow stack instead of guarded control stack. Also the wording is "shadow stack: yes" because the shadow stack region is just where it's stored; it's enabled for the whole process or it isn't. This is unlike memory tagging, which can be enabled per region, so "memory tagging: enabled" fits better for that. I've added a test case that is also intended to be the start of a set of tests for GCS. This should help me avoid duplicating the inline assembly needed. Note that no special compiler support is needed for the test. However, for the initial enabling of GCS (assuming the libc isn't doing it) we do need to use an inline assembly version of prctl. This is because as soon as you enable GCS, all returns are checked against the GCS. If the GCS is empty, the program will fault. In other words, you can never return from the function that enabled GCS, unless you push values onto it (which is possible but not needed here). For this reason you cannot use the libc's prctl wrapper. You can use that wrapper for anything else, as we do to check if GCS is enabled.
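As a rough illustration of the flag check described above, the following sketch tests whether a single "VmFlags" line lists the "ss" flag. It is not the LLDB implementation: the function name `hasShadowStackFlag` is made up for this example, and it assumes the flags are whitespace-separated tokens after the `VmFlags:` prefix.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if one "VmFlags:" line taken from
// /proc/<pid>/smaps contains the "ss" (shadow stack) flag.
bool hasShadowStackFlag(const std::string &Line) {
  assert(Line.find('\n') == std::string::npos && "expects a single line");
  const std::string Prefix = "VmFlags:";
  if (Line.compare(0, Prefix.size(), Prefix) != 0)
    return false; // not a VmFlags line at all
  std::istringstream Flags(Line.substr(Prefix.size()));
  std::string Flag;
  while (Flags >> Flag) // flags are assumed to be whitespace-separated
    if (Flag == "ss")
      return true;
  return false;
}
```

Comparing whole tokens rather than searching for the substring "ss" avoids false positives from any longer flag name that happens to contain those letters.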
Implementing `constexpr std::stable_sort`. This is part of P2562R1, tracked via issue llvm#105360. Closes llvm#119394 Co-authored-by: A. Jiang <[email protected]> Co-authored-by: Louis Dionne <[email protected]>
…lvm#122433) Currently, when the result type is 1-`tuple`, `tuple_cat` possibly tests an undesired constructor of the element, due to conversion from the reference tuple to the result type. If the element type has an unconstrained constructor template, there can be an extraneous hard error which shouldn't happen. This patch introduces a helper function template to select the element-wise constructor template of `tuple`, which can avoid such an error. Fixes llvm#41034.
Data entry operations which are created from constructs with async clause that has no value (aka `acc data copyin(var) async`) end up holding an attribute array named to keep track of this information. However, in cases where `async` clause is not used, calling `hasAsyncOnly` ends up crashing since this attribute is not set. Thus, to fix this issue, ensure that we check for this attribute before trying to walk the attribute array.
This PR fixes the ambiguities in name lookup caused by non-standard member typedefs `size_type` and `difference_type` in `std::bitset`. Follows up llvm#121620. Closes llvm#121618.
A regular expression was used in the lexing process. It made the program take more than linear time with regard to the length of the input. It looked like the entire buffer could be scanned for every token lexed. Now the regular expression is replaced with code. Previously it took 20 minutes for the program to format 125 000 lines of code on my computer. Now it takes 315 milliseconds.
…tern (llvm#122721) * Relocates two tests for `PadOpVectorizationWithTransferWritePattern` in "vectorization-pad-patterns.mlir" to group them with other tests for the same pattern. * Adds a note clarifying that these are negative tests and explains the reasoning behind them. * Removes `transform.apply_patterns.linalg.decompose_pad` from the TD sequences as it's no longer needed (*). This is essentially a small clean-up in preparation for upcoming changes. (*) `transform.apply_patterns.linalg.decompose_pad` was split off from `transform.apply_patterns.linalg.pad_vectorization` in llvm#117329. "vectorization-pad-patterns.mlir" is meant to test the latter, not the former.
…lvm#122292) This patch changes the codegen for non-precise cosh calls to generate math.cosh ops. This wasn't done before because the math dialect did not have a cosh operation at the time.
According to one of the LLVM builds: llvm#122495 (comment) The linking to various "mlir::Pass::" methods is failing. Ensure dependency is properly setup.
…e_explicit_initialization]] (llvm#122947) This makes it consistent with `[[clang::require_constant_initialization]]`. (The attribute was just added to Clang a few minutes ago, so there are no users yet.)
…ds (llvm#122524) In LTO builds, some test checks can be optimized away, since the compiler can see through the memory accesses after inlining across TUs. This causes the existing death tests to fail, since the functions are completely optimized out and things like copying a lambda will no longer occur and trigger the sanitizer. To prevent that, we can use an empty inline assembly block to tell the compiler that memory is modified, and prevent it from doing that.
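A common form of this technique is an empty extended-asm statement with a "memory" clobber that also takes the buffer's address as an input. The sketch below shows that idiom and is not necessarily the exact statement used in the patch; the names `escape` and `fillAndSum` are invented for the example, and the syntax is a GCC/Clang extension.

```cpp
#include <cassert>

// Empty asm block (GCC/Clang extension). Passing P as an input and clobbering
// "memory" tells the compiler that the asm may read or write anything
// reachable through P, so earlier stores to that memory cannot be discarded.
inline void escape(void *P) { asm volatile("" : : "r"(P) : "memory"); }

// Hypothetical example: without the barrier an optimizer is free to fold the
// whole function to a constant and never touch Buf.
int fillAndSum(int N) {
  assert(N >= 0 && N <= 8 && "Buf holds at most 8 elements");
  int Buf[8] = {};
  for (int I = 0; I < N; ++I)
    Buf[I] = I + 1;
  escape(Buf); // the stores above must really happen before this point
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += Buf[I];
  return Sum;
}
```

The asm block emits no instructions, so it changes what the optimizer may assume without adding runtime work of its own.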
) 66badf2 (VT: teach a special-case optz about samesign) introduced a compile-time regression due to the use of CmpPredicate::getMatching, which is unnecessarily inefficient. Introduce CmpPredicate::getPreferredSignedPredicate, which alleviates the inefficiency problem and squashes the compile-time regression.
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid of one of the FIXMEs it introduced by replacing a predicate comparison with CmpPredicate::getMatching.
In preparation to teach implied-cond functions about samesign, migrate integer-compare predicates that flow through to the functions from CmpInst::Predicate to CmpPredicate.
The LLVM build here: https://lab.llvm.org/buildbot/#/builders/89/builds/14359/steps/5/logs/stdio is failing with error: /usr/bin/ld: tools/flang/tools/bbc/CMakeFiles/bbc.dir/bbc.cpp.o: undefined reference to symbol '_ZN3fir3acc25registerOpenACCExtensionsERN4mlir15DialectRegistryE Add missing dependency.
Added C API functions for the EmitC dialect types.
Exposed by -Warray-bounds:
In file included from ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252:
../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: error: array index 4 is past the end of the array (that has type 'const std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned short[4]')) [-Werror,-Warray-bounds]
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
| ^ ~~~~~~~~~~~
../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15: note: expanded from macro 'INSTR_PROF_DATA'
250 | outs() << ProfData.Name << " "; \
| ^ ~~~~
../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: note: array 'NumValueSites' declared here
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
| ^
../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3: note: expanded from macro 'INSTR_PROF_DATA'
62 | std::remove_const<Type>::type Name;
Avoid accessing out-of-bound data, but skip printing array data for now. As there is no simple way to do this without hardcoding the NumValueSites field.
---------
Co-authored-by: Ethan Luis McDonough <[email protected]>
The expression traversal library needs to use interfaces into triplets (and substrings) that return pointers to nested expressions, rather than optional copies of them, since at least one semantic analysis collects a set of references to some subexpression representation class instances, and those references obviously can't point to local copies of objects. Fixes llvm#121999.
…2604) Add tests for negative array extents where necessary, motivated by a compiler crash exposed by yet another fuzzer test, and improve overall error message quality for RESHAPE(). Fixes llvm#122060.
A direct access READ that tries to read past the end of the file must recover the error via an ERR= label, not an END= label (which is not allowed to be present). Fixes llvm#122150.
Always assume that predefined unit 0 is a terminal, so that output to it is never buffered.
A module can't USE itself, either directly within the top-level module or from one of its submodules. Add a test for this case (which we already caught), and improve the diagnostic for the more confusing case involving a submodule.
Commas being optional in FORMAT statements, the tokenization of things like 3I9HHOLLERITH is tricky. After tokenizing the initial '3', we don't want to then take apparent identifier "I9HHOLLERITH" as the next token. So the prescanner just consumes the letter ("I") as its own token in this context. A recent bug report complained that this can lead to incorrect results when (in this case) the letter is a defined preprocessing macro. I updated the prescanner to check that the letter is actually followed by an instance of a problematic Hollerith literal. And this broke two tests in the Fujitsu Fortran test suite that Linaro runs, as it couldn't detect a following Hollerith literal that wasn't on the same source line. We can't do look-ahead line continuation processing in NextToken(), either. So here's a second attempt at fixing the original problem: namely, the letter that follows a decimal integer token is checked to see whether it's the name of a defined macro.
Instead of "Cannot read ...", distinguish true errors in finding and parsing module files from problems with unexpected hash codes by using "Cannot parse" or "Cannot use" wording as appropriate.
llvm#122810) Generic operator/assignment checks for distinguishable specific procedures must ignore inaccessible generic bindings. Fixes llvm#122764.
llvm#122921) We were seeing occasional test failures with expensive checks enabled. The issue was tracked down to a `sort` which should instead be a `stable_sort` to ensure determinism. Checked locally and the non-determinism went away.
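The difference matters whenever the comparator treats some elements as equal. The sketch below is a generic illustration, not the code from the patch: `Entry` and `labelsSortedByKey` are hypothetical names. With `std::stable_sort`, entries that share a key keep their input order, so the result is the same on every run; `std::sort` makes no such promise.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>; // (sort key, label)

// Sort by key only. Entries with equal keys keep their relative input order
// because std::stable_sort is used; std::sort may order them arbitrarily.
std::vector<std::string> labelsSortedByKey(std::vector<Entry> Entries) {
  auto ByKey = [](const Entry &A, const Entry &B) { return A.first < B.first; };
  std::stable_sort(Entries.begin(), Entries.end(), ByKey);
  assert(std::is_sorted(Entries.begin(), Entries.end(), ByKey));
  std::vector<std::string> Labels;
  Labels.reserve(Entries.size());
  for (const Entry &E : Entries)
    Labels.push_back(E.second);
  return Labels;
}
```

Given the input `{2,"a"}, {1,"b"}, {2,"c"}, {1,"d"}`, the labels come out as `b, d, a, c`: the two entries with key 1 stay in the order `b, d` and the two with key 2 stay in the order `a, c`.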
…with explicit `this` (llvm#122897) We currently don't emit `DW_AT_object_pointer` on function declarations or definitions. GCC suffers from the same issue: https://godbolt.org/z/h4jeT54G5 Fixing this will help LLDB in identifying static vs. non-static member functions (see llvm#120856). If I interpreted the DWARFv5 spec correctly, it doesn't mandate this attribute be present *only* for implicit object parameters: ``` If the member function entry describes a non-static member function, then that entry has a DW_AT_object_pointer attribute whose value is a reference to the formal parameter entry that corresponds to the object for which the function is called. That parameter also has a DW_AT_artificial attribute whose value is true. ``` This patch attaches the `DW_AT_object_pointer` for function *definitions*. The declarations will be handled in a separate patch. The part about `DW_AT_artificial` seems overly restrictive, and not true for explicit object parameters. We probably should relax this part of the DWARF spec. Partially fixes llvm#120974
…vailable (llvm#114667) There doesn't seem to be much benefit in always providing declarations for the sized deallocations from C++14 onwards if the user explicitly passed `-fno-sized-deallocation` to disable them. This patch simplifies the declarations to be available exactly when the compiler expects sized deallocation functions to be available.
Added `# keep sorted` to a couple of long-ish lists of files that buildifier didn't automatically sort by default. Changed a couple of one-element `toolchains` attributes to the single-line format.
PointerUnion's `is`, `get`, and `dyn_cast` have been deprecated in favour of using `isa`, `cast`, and `dyn_cast` directly. Migrate these uses over.
No description provided.