[SYCL-MLIR] Merge from intel/llvm sycl branch #8849

Merged
whitneywhtsang merged 3,700 commits into intel:sycl-mlir on Mar 29, 2023

Conversation

whitneywhtsang
Contributor

@whitneywhtsang whitneywhtsang commented Mar 29, 2023

Please only review 953eba7 and the files with conflict:

both modified:   llvm/lib/TargetParser/Triple.cpp
both modified:   mlir/lib/Dialect/Arith/IR/ArithOps.cpp
both modified:   mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp <= SYCL-MLIR change moved to mlir/lib/Dialect/LLVMIR/IR/LLVMInlining.cpp
both modified:   mlir/test/Dialect/LLVMIR/invalid.mlir

Temporarily reverted 946f803; modified #8616 to investigate and add it back.

Please do not squash and merge this PR.

bulbazord and others added 30 commits March 24, 2023 12:42
As of c5bfa3d, REPL.h no longer has a
private implementation header in it. This TODO and the thing it marks
can be removed.
Added a test to https://reviews.llvm.org/D146657 to make sure the GUID and name are computed using the debug info name.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D146826
Summary:
A recent patch allowed us to emit a callable kernel from freestanding
NVPTX code. This allows us to move away from using the CUDA language.
This has several advantages in that it works around an entire assortment
of errors I was seeing while implementing RPC for Nvidia.
Folding of the tosa.transpose operation is both time and memory
intensive as the underlying ElementsAttr is processed as a sequence of
Attributes. This change attempts to operate on the underlying raw data of
the ElementsAttr.

In an example resnet50 network, this change reduces the time spent in
folding transpose ops from 35s to 1.5s.

Reviewed By: GeorgeARM, rsuderman, stellaraccident

Differential Revision: https://reviews.llvm.org/D146526
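
To illustrate what operating on the raw data means, here is a minimal C++ sketch of a transpose over a flat buffer. It is not the TOSA folder from the patch: `transposeRaw`, its parameters, and the row-major layout are assumptions made for illustration, and the permutation follows the convention that output dimension `i` takes input dimension `perms[i]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an MLIR API): transposes a row-major buffer
// directly, without materializing one object per element.
template <typename T>
std::vector<T> transposeRaw(const std::vector<T> &data,
                            const std::vector<std::size_t> &shape,
                            const std::vector<std::size_t> &perms) {
  const std::size_t rank = shape.size();

  // Output dimension i has the extent of input dimension perms[i].
  std::vector<std::size_t> outShape(rank);
  for (std::size_t i = 0; i < rank; ++i)
    outShape[i] = shape[perms[i]];

  // Row-major strides of the input buffer.
  std::vector<std::size_t> inStrides(rank, 1);
  for (std::size_t i = rank; i-- > 1;)
    inStrides[i - 1] = inStrides[i] * shape[i];

  std::vector<T> out(data.size());
  std::vector<std::size_t> idx(rank, 0); // current output multi-index
  for (std::size_t linear = 0; linear < out.size(); ++linear) {
    // Map the output multi-index to a source offset in the input.
    std::size_t src = 0;
    for (std::size_t i = 0; i < rank; ++i)
      src += idx[i] * inStrides[perms[i]];
    out[linear] = data[src];

    // Advance the output multi-index, last dimension fastest.
    for (std::size_t i = rank; i-- > 0;) {
      if (++idx[i] < outShape[i])
        break;
      idx[i] = 0;
    }
  }
  return out;
}

// Usage: a 2x3 matrix becomes a 3x2 matrix.
inline void transposeExample() {
  const std::vector<int> in = {1, 2, 3, 4, 5, 6};
  const std::vector<int> expected = {1, 4, 2, 5, 3, 6};
  assert(transposeRaw<int>(in, {2, 3}, {1, 0}) == expected);
}
```

The sketch only shows the data movement; how the patch reads and rebuilds the ElementsAttr is not covered here.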
This patch fixes:

  llvm/unittests/Support/ScopedPrinterTest.cpp:519:20: error: unused
  variable 'InfDouble' [-Werror,-Wunused-variable]

  llvm/unittests/Support/ScopedPrinterTest.cpp:520:16: error: unused
  variable 'NaNDouble' [-Werror,-Wunused-variable]

  llvm/unittests/Support/ScopedPrinterTest.cpp:516:15: error: unused
  variable 'NaNFloat' [-Werror,-Wunused-variable]

  llvm/unittests/Support/ScopedPrinterTest.cpp:515:19: error: unused
  variable 'InfFloat' [-Werror,-Wunused-variable]

Since commit fa56e36 has temporarily
disabled tests involving these constants, this patch simply comments
them out instead of removing them.
X. Sun et al. (https://dl.acm.org/doi/10.5555/3454287.3454728) published
a paper showing that an FP format with 4 bits of exponent, 3 bits of
significand and an exponent bias of 11 would work quite well for ML
applications.

Google hardware supports a variant of this format where 0x80 is used to
represent NaN, as in the Float8E4M3FNUZ format. Just like the
Float8E4M3FNUZ format, this format does not support -0 and values which
would map to it will become +0.

This format is proposed for inclusion in OpenXLA's StableHLO dialect: openxla/stablehlo#1308

As part of inclusion in that dialect, APFloat needs to know how to
handle this format.

Differential Revision: https://reviews.llvm.org/D146441
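
As a rough illustration of the bit layout described above, the following C++ sketch decodes one byte into a `float`. It is not the APFloat implementation; `decodeE4M3Bias11` is a hypothetical name, and the sketch assumes IEEE-style subnormals when the exponent field is zero and no infinity encoding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical decoder: 1 sign bit, 4 exponent bits, 3 significand bits,
// exponent bias 11. 0x80 is the only NaN; there is no -0 and (assumed here)
// no infinity.
inline float decodeE4M3Bias11(std::uint8_t bits) {
  if (bits == 0x80)
    return std::numeric_limits<float>::quiet_NaN();

  const int sign = bits >> 7;
  const int exponent = (bits >> 3) & 0xF;
  const int mantissa = bits & 0x7;
  const int bias = 11;

  float magnitude;
  if (exponent == 0) {
    // Subnormal (assumption): no implicit leading one, exponent is 1 - bias.
    magnitude = std::ldexp(static_cast<float>(mantissa), 1 - bias - 3);
  } else {
    // Normal: implicit leading one, i.e. (8 + mantissa) / 8 * 2^(e - bias).
    magnitude = std::ldexp(static_cast<float>(8 + mantissa), exponent - bias - 3);
  }
  return sign ? -magnitude : magnitude;
}

// Usage: a few encodings worked out by hand from the layout above.
inline void decodeExample() {
  assert(decodeE4M3Bias11(0x00) == 0.0f);            // +0
  assert(std::isnan(decodeE4M3Bias11(0x80)));        // the single NaN
  assert(decodeE4M3Bias11(0x58) == 1.0f);            // exponent field == bias
  assert(decodeE4M3Bias11(0x7F) == 30.0f);           // largest finite value
}
```

Under these assumptions the largest magnitude is 1.875 x 2^4 = 30, which follows directly from the bias of 11 and the absence of an infinity encoding.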
…commit.

The option -fno-omit-frame-pointer was accidentally added to the x86_64
longjmp target. This change not only removes it, but makes it
-fomit-frame-pointer.
When printing a value, allow the root value's name to be elided, without omitting the
names of child values.

At the API level, this adds `SetHideRootName()`, which joins the existing
`SetHideName()` function.

This functionality is used by `dwim-print` and `expression`.

Fixes an issue identified by @jgorbe in https://reviews.llvm.org/D145609.

Differential Revision: https://reviews.llvm.org/D146783
… const

This helps the `Merger` maintain invariants, as well as clarifying the immutability of the underlying objects (with the one exception of `TensorExp::val`).

Depends On: D146559

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D146083
* Moving the `Children` class to be nested under `TensorExp`.
* Marking `TensorExp`, `TensorExp::Children`, and `LatPoint` as final.

Depends On D146083

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D146562
In the next few commits I will be converting the various Merger identifier typedefs into newtypes; and once that's done, the `kInvalidId` constant will only be used internally and therefore does not need to be part of the public `mlir::sparse_tensor` namespace.

Depends On D146673

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D146674
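
For readers unfamiliar with the typedef-to-newtype conversion mentioned above, here is a generic C++ sketch of the idiom. It does not reproduce the Merger's actual types: `Id`, `ExprTag`, `LatPointTag`, and the `detail::kInvalidId` value are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {
namespace detail {
// Hypothetical sentinel kept out of the public namespace.
constexpr unsigned kInvalidId = ~0u;
} // namespace detail

// A tagged wrapper: each Tag yields a distinct, non-interchangeable type.
template <typename Tag> class Id {
public:
  constexpr explicit Id(unsigned v) : value(v) {}
  constexpr unsigned get() const { return value; }
  constexpr bool isValid() const { return value != detail::kInvalidId; }
  friend constexpr bool operator==(Id a, Id b) { return a.value == b.value; }

private:
  unsigned value;
};

struct ExprTag {};
struct LatPointTag {};
using ExprId = Id<ExprTag>;
using LatPointId = Id<LatPointTag>;

// With plain typedefs of unsigned these conversions would compile silently.
static_assert(!std::is_convertible<ExprId, LatPointId>::value,
              "distinct identifier kinds must not mix");
static_assert(!std::is_convertible<unsigned, ExprId>::value,
              "raw integers must be wrapped explicitly");

// Usage: construction is explicit, comparison stays within one kind.
inline void newtypeExample() {
  const ExprId e(3);
  assert(e == ExprId(3));
  assert(e.get() == 3u);
  assert(e.isValid());
  assert(!LatPointId(detail::kInvalidId).isValid());
}
} // namespace sketch
```

The point of the idiom is that passing one kind of identifier where another is expected becomes a compile error instead of a silent bug.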
For singleton transfer reads, we allow hoisting them out
of the enclosing loop if their users are either transfer reads
or memory effect free.

Differential Revision: https://reviews.llvm.org/D146828
When the input types don't match the accumulator type in named
convolution ops there is supposed to be a conversion to the accumulator
type before the multiply and accumulate.

Differential Revision: https://reviews.llvm.org/D146824
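
The effect of converting to the accumulator type before the multiply can be shown with a small C++ analogue. This is not the linalg lowering: `conv1d` is a hypothetical helper, and 32-bit inputs with a 64-bit accumulator are chosen only to make the difference observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D convolution (no padding, stride 1). Each operand is
// widened to the accumulator type before the multiply, so the product is
// computed in 64 bits rather than wrapping in 32 bits.
inline std::vector<std::uint64_t>
conv1d(const std::vector<std::uint32_t> &input,
       const std::vector<std::uint32_t> &kernel) {
  if (kernel.empty() || input.size() < kernel.size())
    return {};
  std::vector<std::uint64_t> out(input.size() - kernel.size() + 1, 0);
  for (std::size_t i = 0; i < out.size(); ++i)
    for (std::size_t k = 0; k < kernel.size(); ++k)
      out[i] += static_cast<std::uint64_t>(input[i + k]) *
                static_cast<std::uint64_t>(kernel[k]);
  return out;
}

// Usage: 65536 * 65536 = 2^32 fits only because of the widening casts;
// a 32-bit multiply would wrap to 0 before the accumulation.
inline void convExample() {
  const std::vector<std::uint64_t> out = conv1d({65536u, 1u}, {65536u});
  assert(out.size() == 2);
  assert(out[0] == 4294967296ULL);
  assert(out[1] == 65536ULL);
}
```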
These were added when I removed the swig interface files in
662548c. However, they were mostly meant for
me to better track the differences between the existing API headers and
bindings interfaces. There's nothing actionable about these, so I am removing
them.
Depends On D146674

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D146676
This flag causes the toolchain distribution to be built using LLVM
CMake's multiple distribution feature. The distribution* family of CMake
targets would be replaced with the toolchain-distribution* family.

This shouldn't otherwise affect the semantics of the build, but it sets
up the ability to split out the LLDB build from the main distribution
used by Fuchsia.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D146665
…egex.

The original -show-graph option dumped the LinkGraph for all graphs loaded into
the session, which can make it difficult to see small graphs (e.g. reduced test
cases) among the surrounding larger files (especially the ORC runtime).

The new -show-graphs option takes a regex and dumps only those graphs matching
the regex. This allows testcases to specify exactly which graphs to dump.
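
The filtering behaviour can be sketched generically with `std::regex`. This is not the tool's implementation: `selectGraphs` and the graph names are hypothetical, and the sketch assumes the pattern may match anywhere in the name (search semantics), which the commit message does not specify.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep only the graph names that match the pattern.
// Assumption: a match anywhere in the name selects the graph.
inline std::vector<std::string>
selectGraphs(const std::vector<std::string> &names, const std::string &pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const std::string &name : names)
    if (std::regex_search(name, re))
      selected.push_back(name);
  return selected;
}

// Usage: only the reduced test case is selected, not the runtime objects.
inline void selectGraphsExample() {
  const std::vector<std::string> names = {"orc_rt_macho.o", "main.o",
                                          "reduced_test.o"};
  const std::vector<std::string> expected = {"reduced_test.o"};
  assert(selectGraphs(names, "reduced") == expected);
}
```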
…GatherVV_index to separate the reads for VRGatherVV

Differential Revision: https://reviews.llvm.org/D145406
The old reproducer functionality has been removed. Remove this call as
it's now just a NO-OP.
haroon26 and others added 5 commits March 28, 2023 19:28
)

Closes intel#8201
provided default value for `predicate` argument in
`sub_group_mask::group_ballot` definition

---------

Co-authored-by: Alexey Sachkov
…L and Level Zero backends (intel#8825)

This patch updates the `atomic_memory_order*` E2E tests to include the
`level_zero` and `opencl` backends, as these backends should now return the
memory order capability sets that these tests require.
…el#8586)

Adds support to query devices for `atomic_fence_order_capabilities` and
`atomic_fence_scope_capabilities`. The backends supported are OpenCL and
Level Zero. The query is left unsupported on the remaining backends.

Fixes intel#8293.

---------

Signed-off-by: Maronas, Marcos
Note: the unqualified name lookup of joint_reduce in the overload of
joint_reduce without an init param was not finding the overload of
joint_reduce with an init param (because that declaration was located
after it), so it searched for joint_reduce via ADL. With sycl::group,
ADL can find both overloads of joint_reduce, but with sycl::sub_group =
sycl::ext::oneapi::sub_group, ADL finds no joint_reduce in
sycl::ext::oneapi.

Fixes intel#8348

---------

Signed-off-by: Cai, Justin
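
The lookup behaviour described in the note can be reproduced in a small standalone C++ sketch. It does not use the SYCL headers: the namespaces `lib` and `lib::ext` and the simplified `joint_reduce` signatures are hypothetical stand-ins. The sketch shows one possible fix, declaring the init overload before the overload that calls it; whether the actual patch used this exact approach is not stated above.

```cpp
#include <cassert>

namespace lib {
struct group {};
namespace ext {
struct sub_group {}; // lives in a nested namespace, like the aliased type
} // namespace ext

// Forward declaration of the overload with an init parameter. Without this
// line, the call below relies on ADL alone: it works for lib::group (ADL
// searches lib) but fails for lib::ext::sub_group (ADL searches lib::ext,
// which contains no joint_reduce).
template <typename Group> int joint_reduce(Group g, int value, int init);

// Overload without an init parameter; forwards to the one with init.
template <typename Group> int joint_reduce(Group g, int value) {
  return joint_reduce(g, value, 0); // unqualified call
}

template <typename Group> int joint_reduce(Group, int value, int init) {
  return value + init; // placeholder "reduction" for the sketch
}
} // namespace lib

// Usage: both group kinds now resolve to the init overload.
inline void lookupExample() {
  assert(lib::joint_reduce(lib::group{}, 5) == 5);
  assert(lib::joint_reduce(lib::ext::sub_group{}, 7) == 7);
}
```

Removing the forward declaration keeps the `lib::group` call compiling but makes the `lib::ext::sub_group` call ill-formed, which mirrors the failure described in the note.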
@whitneywhtsang whitneywhtsang added the disable-lint (Skip linter check step and proceed with build jobs) and sycl-mlir (Pull requests or issues for sycl-mlir branch) labels Mar 29, 2023
@whitneywhtsang whitneywhtsang self-assigned this Mar 29, 2023
@whitneywhtsang whitneywhtsang requested a review from etiotto as a code owner March 29, 2023 02:53
@sommerlukas
Contributor

Only reviewed c4bb8c4 so far, as accessing the changes to the other four files is a bit too much for my browser. Will need to find a different way to look at those changes.

For c4bb8c4, I had one question: Why do we need to add two members to GPUFuncOp, is that our change to make them work with transformations?

@victor-eds
Contributor

Changes LGTM apart from comments. I've had to make some changes to merge this upstream, so I'll apply the changes later today so that we don't have it as a conflict next time.

@whitneywhtsang
Contributor Author

whitneywhtsang commented Mar 29, 2023

> For c4bb8c4, I had one question: Why do we need to add two members to GPUFuncOp, is that our change to make them work with transformations?

We added CallableOpInterface to GPUFuncOp in 9042844. Upstream commit f809eb4 extends the CallableOpInterface to provide access to the argument and result attributes if available. That's why I added the two members to GPUFuncOp; otherwise the build fails.

@whitneywhtsang
Contributor Author

Addressed the review comment; the commit that fixes the merge is now 90f7535.

@whitneywhtsang
Contributor Author

Updated the merge-fix commit to update the test suite xfail list.

@whitneywhtsang whitneywhtsang merged commit 953eba7 into intel:sycl-mlir Mar 29, 2023
@whitneywhtsang whitneywhtsang deleted the merge branch March 30, 2023 00:01