Skip to content

[AutoBump] Merge with 31249e27 (Jan 14) (42) #513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 98 commits into from
Apr 14, 2025

Conversation

jorickert
Copy link

No description provided.

andfau-amd and others added 30 commits January 14, 2025 13:55
…lvm#122329)

This fixes a conflict between the version of LLVM linked against by the
runner and the unrelated version of LLVM that may be dynamically loaded
by a graphics driver. (Relevant to llvm#73457: fixes loading certain Vulkan
drivers.)
…d taskloop simd. (llvm#121746)

Added codegen support for combined masked constructs `Parallel masked
taskloop simd`.
Added implementation for `EmitOMPParallelMaskedTaskLoopSimdDirective`.

Co-authored-by: Chandra Ghale <[email protected]>
This patch adds support for processing the `host_eval` clause of
`omp.target` to populate default and runtime kernel launch attributes.
Specifically, these related to the `num_teams`, `thread_limit` and
`num_threads` clauses attached to operations nested inside of
`omp.target`. As a result, the `thread_limit` clause of `omp.target` is
also supported.

The implementation of `initTargetDefaultAttrs()` is intended to reflect
clang's own processing of multiple constructs and clauses in order to
define a default number of teams and threads to be used as kernel
attributes and to populate global variables in the target device module.

One side effect of this change is that it is no longer possible to
translate to LLVM IR target device MLIR modules unless they have a
supported target triple. This is because the local `getGridValue()`
function in the `OpenMPIRBuilder` only works for certain architectures,
and it is called whenever the maximum number of threads has not been
explicitly defined. This limitation also matches clang.

Evaluating the collapsed loop trip count of SPMD and Generic-SPMD
kernels remains unsupported.
… the CONCAT is free (llvm#122750)

Minor followup to llvm#122485 - if the source operands were widened half-size subvectors, then attempt to concatenate the subvectors directly, and then adjust the shuffle mask so references to the second operand now refer to the upper half of the concat result.
No difference in semantics here as byval is already handled
separately. This simplifies migration to the captures attribute.
This was needed before llvm#115077
since the compiler-rt test build made assumptions about the build
layout of libc++ and libc++abi, but now they link against a local
installation of these libraries so we no longer need this workaround.
… runner (llvm#122329)"

This reverts commit f879da7. The change
to not export certain symbols apparently broke the UBsan/Asan buildbot,
because the DSO for the sanitizer wants to link to them.
This patch adds support for lowering OpenMP clauses and expressions
attached to constructs nested inside of a target region that need to be
evaluated in the host device. This is done through the use of the
`OpenMP_HostEvalClause` `omp.target` set of operands and entry block
arguments.

When lowering clauses for a target construct, a more involved
`processHostEvalClauses()` function is called, which looks at the
current and potentially other nested constructs in order to find and
lower clauses that need to be processed outside of the `omp.target`
operation under construction. This populates an instance of a global
structure with the resulting MLIR values.

The resulting list of host-evaluated values is used to initialize the
`host_eval` operands when constructing the `omp.target` operation, and
then replaced with the corresponding block arguments after creating that
operation's region.

Afterwards, while lowering nested operations, those that might
potentially be evaluated on the host (i.e. `num_teams`, `thread_limit`,
`num_threads` and `collapse`) check first whether there is an active
global host-evaluated information structure and whether it holds values
referring to these clauses. If that is the case, the stored values
(`omp.target` entry block arguments at that stage) are used instead of
lowering these clauses again.
The behavior is not entirely consistent with that of clang for the
moment since detailed timing information on the LLVM IR optimization and
code generation passes is not provided. The -ftime-report= option is
also not enabled since that is only relevant for information about the
LLVM IR passes. However, some code to handle that option has been
included, to make it easier to support the option when the issues
blocking it are resolved. A FortranSupport library has been created that
is intended to mirror the LLVM and MLIR support libraries.

Based on @tarunprabhu's PR
llvm#107270 with minor changes
addressing latest review feedback. He's busy and we'd like to get this
support in ASAP.

Co-authored-by: Tarun Prabhu <[email protected]>
Fix Fortran test failures caused by the introduction of the
amdgcn-amd-amdhsa target triple in llvm#116052.
…2895)

Even if their evaluation succeeds, mark them as invalid. This fixes some
long standing differences to the ast walker interpeter.
…vm#122885)

This fixes a miscompilation extracted from 525.x264_r, where we were
failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were removing a use of VF whilst iterating over the users() iterator,
which messed up the iterator in-flight and caused us to miss some
recipes. This fixes it by copying the users into a SmallVector first.

Fixes llvm#122681
Fixes llvm#122682
)

Mitigate against potential bugs that might place it elsewhere and render
the mechanism useless.
The 2024-12 ISA spec release[1] add these features:
FEAT_SME_MOP4(sme-mop4) to enable SME Quarter-tile outer product
instructions
and
FEAT_SME_TMOP(sme-tmop) to enable SME Structured sparsity outer product
instructions
to allow these instructions to be available outside Armv9.6/sme2p2

[1]
https://developer.arm.com/Architectures/A-Profile%20Architecture#Downloads
To be conservative, explicitly exclude byval arguments, which
doesNotCapture() would otherwise allow. Even if byval has an
initializes attribute, it would only apply to the implicit
copy.
True16 format for v_cmp_lt_f16. Update VOPC t16 and fake16 pseudo.
This is intended for use with Arm's Guarded Control Stack extension
(GCS). Which reuses some existing shadow stack support in Linux. It
should also work with the x86 equivalent.

A "ss" flag is added to the "VmFlags" line of shadow stack memory
regions in `/proc/<pid>/smaps`. To keep the naming generic I've called
it shadow stack instead of guarded control stack.

Also the wording is "shadow stack: yes" because the shadow stack region
is just where it's stored. It's enabled for the whole process or it
isn't. As opposed to memory tagging which can be enabled per region, so
"memory tagging: enabled" fits better for that.

I've added a test case that is also intended to be the start of a set of
tests for GCS. This should help me avoid duplicating the inline assembly
needed.

Note that no special compiler support is needed for the test. However,
for the intial enabling of GCS (assuming the libc isn't doing it) we do
need to use an inline assembly version of prctl.

This is because as soon as you enable GCS, all returns are checked
against the GCS. If the GCS is empty, the program will fault. In other
words, you can never return from the function that enabled GCS, unless
you push values onto it (which is possible but not needed here).

So you cannot use the libc's prctl wrapper for this reason. You can use
that wrapper for anything else, as we do to check if GCS is enabled.
Implementing `constexpr std::stable_sort`. This is part of P2562R1,
tracked via issue llvm#105360.

Closes llvm#119394

Co-authored-by: A. Jiang <[email protected]>
Co-authored-by: Louis Dionne <[email protected]>
…lvm#122433)

Currently, when the result type is 1-`tuple`, `tuple_cat` possibly tests
an undesired constructor of the element, due to conversion from the
reference tuple to the result type. If the element type has an
unconstrained constructor template, there can be extraneous hard error
which shouldn't happen.

This patch introduces a helper function template to select the element-wise
constructor template of `tuple`, which can avoid such error.

Fixes llvm#41034.
Data entry operations which are created from constructs with async
clause that has no value (aka `acc data copyin(var) async`) end up
holding an attribute array named to keep track of this information.
However, in cases where `async` clause is not used, calling
`hasAsyncOnly` ends up crashing since this attribute is not set.

Thus, to fix this issue, ensure that we check for this attribute before
trying to walk the attribute array.
This PR fixes the ambiguities in name lookup caused by non-standard
member typedefs `size_type` and `difference_type` in `std::bitset`.

Follows up llvm#121620.
Closes llvm#121618.
A regular expression was used in the lexing process. It made the program
take more than linear time with regards to the length of the input. It
looked like the entire buffer could be scanned for every token lexed.
Now the regular expression is replaced with code. Previously it took 20
minutes for the program to format 125 000 lines of code on my computer.
Now it takes 315 milliseconds.
…tern (llvm#122721)

* Relocates two tests for `PadOpVectorizationWithTransferWritePattern`
  in "vectorization-pad-patterns.mlir" to group them with other tests
  for the same pattern.
* Adds a note clarifying that these are negative tests and explains the
  reasoning behind them.
* Removes `transform.apply_patterns.linalg.decompose_pad` from the TD
  sequences as it's no longer needed (*).

This is essentially a small clean-up in preparation for upcoming
changes.

(*) `transform.apply_patterns.linalg.decompose_pad` was split off from
`transform.apply_patterns.linalg.pad_vectorization` in llvm#117329.
"vectorization-pad-patterns.mlir" is meant to test the latter, not the
former.
…lvm#122292)

This patch changes the codgegn for non-precise cosh calls to generate
math.cosh ops. This wasn't done before because the math dialect did not
have a cosh operation at the time.
razvanlupusoru and others added 27 commits January 14, 2025 11:26
According to one of the LLVM builds:

llvm#122495 (comment)
The linking to various "mlir::Pass::" methods is failing. Ensure
dependency is properly setup.
…e_explicit_initialization]] (llvm#122947)

This makes it consistent with
`[[clang::require_constant_initialization]]`.

(The attribute was just added to Clang a few minutes ago, so there are
no users yet.)
…ds (llvm#122524)

In LTO builds, some test checks can be optimized away, since the
compiler can
see through the memory accesses after inlining across TUs. This causes
the existing death tests to fail, since the functions are completely
optimized out and things like copying a lambda will no longer occur and
trigger the sanitizer.

To prevent that, we can use an empty inline assembly block to tell the
compiler that memory is modified, and prevent it from doing that.
)

66badf2 (VT: teach a special-case optz about samesign) introduced a
compile-time regression due to the use of CmpPredicate::getMatching,
which is unnecessarily inefficient. Introduce
CmpPredicate::getPreferredSignedPredicate, which alleviates the
inefficiency problem and squashes the compile-time regression.
Follow up on 4a0d53a (PatternMatch: migrate to CmpPredicate) to get rid
of one of the FIXMEs it introduced by replacing a predicate comparison
with CmpPredicate::getMatching.
In preparation to teach implied-cond functions about samesign, migrate
integer-compare predicates that flow through to the functions from
CmpInst::Predicate to CmpPredicate.
The LLVM build here:

https://lab.llvm.org/buildbot/#/builders/89/builds/14359/steps/5/logs/stdio
is failing with error:
/usr/bin/ld: tools/flang/tools/bbc/CMakeFiles/bbc.dir/bbc.cpp.o:
undefined reference to symbol
'_ZN3fir3acc25registerOpenACCExtensionsERN4mlir15DialectRegistryE

Add missing dependency.
Added C API functions for the EmitC dialect types.
Exposed by -Warray-bounds:

In file included from
../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252:

../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1:
error: array index 4 is past the end of the array (that has type 'const
std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned
short[4]')) [-Werror,-Warray-bounds]
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy,
NumValueSites[IPVK_Last+1], \
| ^ ~~~~~~~~~~~

../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15:
note: expanded from macro 'INSTR_PROF_DATA'
250 | outs() << ProfData.Name << " "; \
      |               ^        ~~~~

../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1:
note: array 'NumValueSites' declared here
109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy,
NumValueSites[IPVK_Last+1], \
      | ^

../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3:
note: expanded from macro 'INSTR_PROF_DATA'
   62 |   std::remove_const<Type>::type Name;

Avoid accessing out-of-bound data, but skip printing array data for now.
As there is no simple way to do this without hardcoding the
NumValueSites field.

---------

Co-authored-by: Ethan Luis McDonough <[email protected]>
The expression traversal library needs to use interfaces into triplets
(and substrings) that return pointers to nested expressions, rather than
optional copies of them, since at least one semantic analysis collects a
set of references to some subexpression representation class instances,
and those references obviously can't point to local copies of objects.

Fixes llvm#121999.
…2604)

Add tests for negative array extents where necessary, motivated by a
compiler crash exposed by yet another fuzzer test, and improve overall
error message quality for RESHAPE().

Fixes llvm#122060.
A direct access READ that tries to read past the end of the file must
recover the error via an ERR= label, not an END= label (which is not
allowed to be present).

Fixes llvm#122150.
Always assume that predefined unit 0 is a terminal, so that output to it
is never buffered.
A module can't USE itself, either directly within the top-level module
or from one of its submodules. Add a test for this case (which we
already caught), and improve the diagnostic for the more confusing case
involving a submodule.
Commas being optional in FORMAT statements, the tokenization of things
like 3I9HHOLLERITH is tricky. After tokenizing the initial '3', we don't
want to then take apparent identifier "I9HHOLLERITH" as the next token.
So the prescanner just consumes the letter ("I") as its own token in
this context.

A recent bug report complained that this can lead to incorrect results
when (in this case) the letter is a defined preprocessing macro. I
updated the prescanner to check that the letter is actually followed by
an instance of a problematic Hollerith literal.

And this broke two tests in the Fujitsu Fortran test suite that Linaro
runs, as it couldn't detect a following Hollerith literal that wasn't on
the same source line. We can't do look-ahead line continuation
processing in NextToken(), either.

So here's a second attempt at fixing the original problem: namely, the
letter that follows a decimal integer token is checked to see whether
it's the name of a defined macro.
Instead of "Cannot read ...", distinguish true errors in finding and
parsing module files from problems with unexpected hash codes by using
"Cannot parse" or "Cannot use" wording as appropriate.
llvm#122810)

Generic operator/assignment checks for distinguishable specific
procedures must ignore inaccessible generic bindings.

Fixes llvm#122764.
llvm#122921)

We were seeing occasional test failures with expensive checks enabled.
The issue was tracked down to a `sort` which should instead be a
`stable_sort` to ensure determinism. Checked locally and the
non-determinism went away.
…with explicit `this` (llvm#122897)

We currently don't emit `DW_AT_object_pointer` on function declarations
or definitions. GCC suffers from the same issue:
https://godbolt.org/z/h4jeT54G5

Fixing this will help LLDB in identifying static vs. non-static member
functions (see llvm#120856).

If I interpreted the DWARFv5 spec correctly, it doesn't mandate this
attribute be present *only* for implicit object parameters:
```
If the member function entry describes a non-static member function,
then that entry has a DW_AT_object_pointer attribute whose value is a reference to
the formal parameter entry that corresponds to the object for which the
function is called.

That parameter also has a DW_AT_artificial attribute whose value is true.
```

This patch attaches the `DW_AT_object_pointer` for function
*defintions*. The declarations will be handled in a separate patch.

The part about `DW_AT_artificial` seems overly restrictive, and not true
for explicit object parameters. We probably should relax this part of
the DWARF spec.

Partially fixes llvm#120974
…vailable (llvm#114667)

There doesn't seem to be much benefit in always providing declarations
for the sized deallocations from C++14 onwards if the user explicitly
passed `-fno-sized-deallocation` to disable them. This patch simplifies
the declarations to be available exactly when the compiler expects sized
deallocation functions to be available.
Added `# keep sorted` to a couple of long-ish lists of files that
buildifier didn't automatically sort by default.

Changed a couple of one-element `toolchains` attributes to the
single-line format.
PointerUnion's `is`, `get`, and `dyn_cast` have been deprecated in
favour of using `isa`, `cast`, and `dyn_cast` directly. Migrate these
uses over.
Base automatically changed from bump_to_be96bd74 to bump_to_392622d0 April 14, 2025 07:57
@jorickert jorickert merged commit 5073796 into bump_to_392622d0 Apr 14, 2025
12 checks passed
@jorickert jorickert deleted the bump_to_31249e27 branch April 14, 2025 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.