Add alpha support for SVE2.1 #257
In this patch it is used for the prototype:
* svptrue_c8 (and _c16/_c32/_c64)

As described in: ARM-software/acle#257

Patch by: Sander de Smalen <[email protected]>
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D150953
commit 1496c57722c7db8db7e582b582317e15e719ceb0
Merge: f28ae00bf6a3 074276b9ae76
Author: ns_tester <[email protected]>
Date: Wed Jun 7 22:32:59 2023 -0700

LLVM and SPIRV-LLVM-Translator pulldown (WW22)

LLVM: llvm/llvm-project@40c26ec
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@c2ff406

commit f28ae00bf6a3dc946194e6f8b543a115fe241c20
Author: Nick Sarnie <[email protected]>
Date: Wed Jun 7 23:47:13 2023 -0400

[ESIMD] More support for 64-bit offsets with accessors in stateless mode (#9591)

This adds support for 64-bit offsets with accessors in stateless mode for the remaining APIs. Please let me know if I missed any.

Today, all of the APIs convert to 32-bit offsets with no error if passed a 64-bit offset, except for vector offset versions of `lsc_gather`, `lsc_scatter`, and `lsc_prefetch`. Do not error except in these three cases in order to preserve backward compatibility.

I manually ran all of these tests on PVC and confirmed they pass with this change and fail without it.

In some cases, in stateful mode, the underlying intrinsic we call only supports 32-bit offsets, so we need to convert.

---------

Signed-off-by: Sarnie, Nick <[email protected]>

commit f34e5458aa63bb2a4362c327859f49474f873b9d
Author: Nick Sarnie <[email protected]>
Date: Wed Jun 7 20:42:52 2023 -0400

[SYCL][ESIMD] Use SPIR-V intrinsic to cast image object to int (#9696)

We currently have a hack that relies on the type the Clang frontend generates for images, see [here](https://github.com/intel/llvm/blob/12dd0ad040ea61f1201fa9d82efd5079ce7dc6ca/sycl/include/sycl/ext/intel/esimd/detail/memory_intrin.hpp#L1171). With opaque pointers, the Clang frontend generates image types as target extension types instead of pointers, so the hack fails. The cleanest way to fix this would be to do the cast at reverse-translation time inside IGC, however the IGC team refused that solution.
Instead, punt the cast to inside the SPIR-V translator when converting to SPIR-V, where the type will be a pointer as well. The `__spirv_ConvertPtrToU` function will be converted to `OpConvertPtrToU` inside the SPIR-V translator. This is definitely still a hack, but I don't think it's more hacky than before, and I don't know of any other ways to fix this. Note this solution works for both typed pointers and opaque pointers, and for normal pointer accessors and image accessors.

Signed-off-by: Sarnie, Nick <[email protected]>

commit 8990c5503d47e397c837d991bf6bc5a0feda9b8a
Author: Igor Gorban <[email protected]>
Date: Wed Jun 7 22:19:33 2023 +0200

[SYCL] Fix handling unsupported attributes (#9756)

llvm::Attribute::ReadNone/ReadOnly/WriteOnly are no longer supported. Calls generated by an external library (vc-intrinsics) can still carry them, so they have to be removed manually. It is impossible to fix this on the vc-intrinsics side, because it works not only with the latest LLVM version and uses these attributes in other projects.

---------

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
Co-authored-by: Vyacheslav N Klochkov <[email protected]>

commit 485221047281e3d47f7376394667b85a63173991
Author: aelovikov-intel <[email protected]>
Date: Wed Jun 7 08:48:04 2023 -0700

[CI] Generate test matrix on self-hosted runner (#9773)

Github's ubuntu-* runners could take multiple hours to allocate in our organization. Switch to our self-hosted cuda runner that is sitting idle because we perform CUDA testing in AWS.
commit 64bd50820262ded6fbd32d63bd96d5fdbf6861ac
Author: aelovikov-intel <[email protected]>
Date: Wed Jun 7 07:45:29 2023 -0700

[SYCL][CI] Fuse two post-commit builds into one (#9695)

commit 074276b9ae760528d97f75d767a1744e6f2a3f2f
Merge: 0ca2be5c82ec d48a5fb2b664
Author: Artur Gainullin <[email protected]>
Date: Wed Jun 7 07:11:20 2023 -0700

Merge remote-tracking branch 'origin/sycl' into llvmspirv_pulldown

commit d48a5fb2b6645819a6811a65a402d322c222dc36
Author: fineg74 <[email protected]>
Date: Wed Jun 7 04:59:29 2023 -0700

[SYCL][ESIMD] Update the test regression/atomic_update_test.cpp to improve reliability (#9715)

commit eb7e3f032ff98fa98b4927fe9a785e75a5c51240
Author: jinz2014 <[email protected]>
Date: Wed Jun 7 06:58:20 2023 -0400

[SYCL] Add unit tests for the HIP plugin (#9391)

The kernel test (test_kernels.cpp) is incomplete because it is not clear to me how to properly generate binary files for "piProgramCreateWithBinary" for the HIP backend. Thank you for reviewing and editing the PR.

---------

Co-authored-by: Jin Z <[email protected]>
Co-authored-by: Dmitry Vodopyanov <[email protected]>
Co-authored-by: Jin Z <[email protected]>

commit 3c19581f828c54ff1037a420b4614c01628bcc56
Author: jinge90 <[email protected]>
Date: Wed Jun 7 16:35:10 2023 +0800

[SYCL][libdevice] Move fabs, fabs to imf_fp32/64_dl.cpp and add llabs (#9732)

fabsf, fabs and llabs are required by deep learning frameworks, so we move fabsf and fabs to a separate file, imf_fp32/64_dl.cpp, and add llabs to imf_fp32_dl.cpp as well.

Signed-off-by: jinge90 <[email protected]>

commit f96b85d002745aea35114c512aae020a0e5caaca
Author: Chris Perkins <[email protected]>
Date: Tue Jun 6 14:04:26 2023 -0700

disable ze_debug tests on Windows for known failures. (#9764)

Some of the ze_debug=4 memory leak tests are failing on Windows. These are not new failures, as the ze_debug=4 memory checker was disabled on Windows for a long time. It has recently been re-enabled, and now these tests are failing.
The shutdown() procedure on Windows is not (yet) parallel to Linux; work is ongoing on that front. This PR disables these tests until we reach shutdown() parity. FWIW, the Windows OS is super aggressive about reclaiming memory, and the BKM in complex situations like this is to just let Windows reclaim.

Signed-off-by: Chris Perkins <[email protected]>

commit 19b6247ed9be9e2baae2e5a0a1ddddf4f412b1e7
Author: aelovikov-intel <[email protected]>
Date: Tue Jun 6 12:56:11 2023 -0700

[SYCL][Test E2E] Fix SG sizes detection in lit.cfg.py (#9761)

commit 0ca2be5c82ec6b5be0f5ef6850b3afbfbc99aba3
Author: Churina, Ksenia <[email protected]>
Date: Tue Jun 6 12:45:28 2023 -0700

Disable Basic/stream/stream.cpp test for HIP until it is fixed

commit 93a487cc72a7e0c4852a41678d102a08e20192b0
Author: jinz2014 <[email protected]>
Date: Tue Jun 6 15:42:29 2023 -0400

[SYCL][HIP] Add the interop-buffer-hip test (#9705)

Co-authored-by: Jin Z <[email protected]>

commit 4826c07e02c1df6cf4ac4f21b650efec37d583c4
Author: Pablo Reble <[email protected]>
Date: Tue Jun 6 14:41:05 2023 -0500

[SYCL] ABI check script improve path concatenation (#9482)

The patch fixes a path concatenation issue: the script failed if the provided path had no trailing slash. The fix should work independently of the OS. Manually tested on Linux.
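The trailing-slash failure described in the ABI check script commit is a common pitfall of building paths by string concatenation. The following is a minimal Python sketch of the general problem and a portable fix, not the actual change made in #9482; the function names `join_naive` and `join_portable` and the file name `libsycl.so` are hypothetical and chosen only for illustration.

```python
import os


def join_naive(prefix, name):
    # Plain string concatenation: only correct when `prefix`
    # already ends with a path separator.
    return prefix + name


def join_portable(prefix, name):
    # os.path.join inserts the platform separator only when it is missing,
    # so the result is the same with or without a trailing separator.
    return os.path.join(prefix, name)


if __name__ == "__main__":
    for prefix in ("build/lib", "build/lib" + os.sep):
        print(repr(prefix), "->", join_naive(prefix, "libsycl.so"),
              "|", join_portable(prefix, "libsycl.so"))
```

With the naive version, a prefix without a trailing separator silently produces a wrong path (the directory and file names are fused), while `os.path.join` gives an equivalent path in both cases.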
commit 3350c05baf495da71222590becc6ec7e9dae50f8
Author: Srividya Sundaram <[email protected]>
Date: Tue Jun 6 12:31:13 2023 -0700

[SYCL] Add ESIMD test to check kernel arg size (#9076)

commit ca55b912d4f08e04abdb654b9f5ed7f18dd87fd8
Author: fineg74 <[email protected]>
Date: Tue Jun 6 12:21:26 2023 -0700

[ESIMD] Make the test regression/bfloat16Constructor.cpp executable on GEN12 (#9748)

commit 4a76d213c24cac4615a8f9e57fa3dc643c931956
Author: Fedor Veselovskiy <[email protected]>
Date: Tue Jun 6 21:19:41 2023 +0200

[SYCL][InvokeSimd][E2E] Remove XFAIL status from InvokeSimd named barrier tests (#9741)

commit db6bec7b7e31ac18c92e71776fda833707678515
Author: Fraser Cormack <[email protected]>
Date: Tue Jun 6 20:18:11 2023 +0100

[SYCL][Fusion] Add missing header (#9691)

This was causing build failures with some compilers.

commit c899a93410c23b26a600159762e4dab5f240bc1f
Author: Przemyslaw-Wisniewski-Mobica <[email protected]>
Date: Tue Jun 6 21:17:01 2023 +0200

[SYCL] Add sycl/detail/defines_elementary.hpp to bit_cast.hpp to be self contained (#9684)

commit f19cfe6a97699a11c78ae248ffe548a8889992bc
Author: Andrey Alekseenko <[email protected]>
Date: Tue Jun 6 21:13:13 2023 +0200

[SYCL][CUDA] Fix info::device::version (#9623)

Report major.minor instead of major.major

commit f73230d8a8ba75b0b43b27ce09253e1b51e1757f
Author: aelovikov-intel <[email protected]>
Date: Tue Jun 6 12:01:40 2023 -0700

[SYCL][ABI-break] Remove getOSModuleHandle usage (#9659)

test-e2e/SharedLib,SPVDumpUse show that we don't really need it.

commit f44d0133d4b0077298f034697a1f3818ff1d6134
Author: Dirk MG Seynhaeve <[email protected]>
Date: Tue Jun 6 11:00:34 2023 -0700

[NFC] Productize clang-offload-extract: clean up code for command line parsing and help (#9594)

* Impose the mandatory LLVM style for clang-format
* Remove any code that was trying to enhance the LLVM builtin help functionality: the extra code only made for confusing help and error messages.
* Don't provide any required options, but provide reasonable defaults.
* Clean up the descriptions for the help. Use easier-to-maintain heredocs for the multiline descriptions.
* Use the more trivial `--stem` rather than `--output`. The `--output` option is still supported, but labeled deprecated.
* Enforce double-dash long options.
* Provide more context in error diagnostics.
* Streamline the searches and predicates.
* Modernize LLVM (e.g. remove predicated makeArrayRef).
* More efficient iterators for the range-based for loops.
* Extensive comments.

commit 8364176393ad741b5dbf56ae58e2c0da1a908bad
Author: aelovikov-intel <[email protected]>
Date: Tue Jun 6 10:20:44 2023 -0700

[SYCL] Add tests for SYCL_DUMP_IMAGES/SYCL_USER_KERNEL_SPV (#9725)

That required introducing an extra environment variable, `SYCL_DUMP_IMAGES_PREFIX`, to control the location of the produced images.

commit 0267c1b237409fb5ffc28c0511a30153c29fe29f
Author: JackAKirk <[email protected]>
Date: Tue Jun 6 14:51:55 2023 +0100

[SYCL][CUDA] Enable sycl-ls-gpu-default-any on CUDA (#9372)

This is a migration of this PR https://github.com/intel/llvm-test-suite/pull/1144/commits

---------

Signed-off-by: JackAKirk <[email protected]>

commit d46d3d68700288203a5c709a7469b6883104f335
Author: JackAKirk <[email protected]>
Date: Tue Jun 6 14:50:30 2023 +0100

[SYCL][CUDA][DOC] Added Tensor Cores supported param combinations table to joint_matrix extension doc (#9019)

This PR documents the supported joint_matrix API parameter sets when using `ext_oneapi_cuda`, similar to the XMX, AMX tables added here: https://github.com/intel/llvm/pull/7964

This will allow us to point people who would like to use `joint_matrix` on a specific architecture to the extension document. E.g.
https://github.com/intel/llvm/issues/8795

---------

Signed-off-by: JackAKirk <[email protected]>

commit c0ab9f8bf0d5f6722c03cfd0aba7aca0ae9a2e81
Author: Jakub Chlanda <[email protected]>
Date: Tue Jun 6 15:48:37 2023 +0200

[SYCL] Add native half type flag for NVPTX >= SM_53 (#8906)

LLVM will now error out if builtins operating on half types are used without explicitly passing `-fnative-half-type` (see: https://reviews.llvm.org/D146715). PTX supports half types since [SM_53](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=half%20precision#half-precision-floating-point-instructions).

commit 1a283acaac3cb944746da21e6337ce6cdaea9711
Author: Christoph Bauinger <[email protected]>
Date: Tue Jun 6 15:41:27 2023 +0200

[SYCL] Add proposal for append_and_shift extension (#8902)

The proposal extends the existing shift_group_left and shift_group_right functions to first append and then shift, such that all items in a sub-group can have well-defined values after the shift.

---------

Co-authored-by: Greg Lueck <[email protected]>

commit 10d2f5b613f3f7e383a8140d38cabc35551b68ea
Author: mmoadeli <[email protected]>
Date: Tue Jun 6 14:40:15 2023 +0100

Adds explicit conversion of multi_ptr<T> to multi_ptr<const T>. (#9750)

This ctor was previously removed because it conflicted with existing ones. Without the ctor, some CTS tests fail to compile. An investigation is required.

commit fa501fd21a286c5d6d760249a88c6ffddaffe2e8
Author: Georgi Mirazchiyski <[email protected]>
Date: Tue Jun 6 12:52:19 2023 +0100

[SYCL][CUDA] Add fix for local size calculation regression (#9736)

This PR fixes a performance regression with respect to work-group size selection when only `sycl::range` is used. The regression was reported in issue [#5627](https://github.com/intel/llvm/issues/5627). We want the work-groups to be uniformly distributed, but that could lead to non-optimally sized work-groups if the global work size is not an even number.
Ideally, we want to ensure that the work-group size is a power of two.

commit 37bb6a2bab16f58d7fe8f7418688d36db9e4422a
Author: Petr Vesely <[email protected]>
Date: Tue Jun 6 12:16:21 2023 +0100

[SYCL][PI][UR] Fix pi2ur sampler return info (#9693)

pi2ur was missing a conversion from UnifiedRuntime sampler info values to valid PI sampler info values. This PR implements a valid conversion between these values.

commit 2ab86f1149b7965bf352d2604bf9c95d98c0b350
Author: aelovikov-intel <[email protected]>
Date: Tue Jun 6 04:15:13 2023 -0700

[CI] Include check-libdevice to BUILD LIT checks (#9743)

commit 835ced6c88de821f9c3d97138153828845d3e631
Author: aelovikov-intel <[email protected]>
Date: Mon Jun 5 21:36:53 2023 -0700

[CI] Align installation steps between Linux/Windows (#9746)

* Use LLVM_INSTALL_UTILS=ON on Windows
* Move clang-{format,tidy} installation into its own step
* Reorder lines to match between Linux/Windows

commit 09f76e8afd2ffcd988cc490aebc775a304cd23a6
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon Jun 5 19:35:18 2023 -0700

Bump requests from 2.28.1 to 2.31.0 in /llvm/utils/git (#9560)

Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.31.0.

Release notes (sourced from [requests's releases](https://github.com/psf/requests/releases)):

v2.31.0 (2023-05-22)

Security

- Versions of Requests between v2.3.0 and v2.30.0 are vulnerable to potential forwarding of `Proxy-Authorization` headers to destination servers when following HTTPS redirects.

  When proxies are defined with user info (https://user:pass@proxy:8080), Requests will construct a `Proxy-Authorization` header that is attached to the request to authenticate with the proxy.

  In cases where Requests receives a redirect response, it previously reattached the `Proxy-Authorization` header incorrectly, resulting in the value being sent through the tunneled connection to the destination server. Users who rely on defining their proxy credentials in the URL are *strongly* encouraged to upgrade to Requests 2.31.0+ to prevent unintentional leakage and rotate their proxy credentials once the change has been fully deployed.

  Users who do not use a proxy or do not supply their proxy credentials through the user information portion of their proxy URL are not subject to this vulnerability.

  Full details can be read in our [Github Security Advisory](https://github.com/psf/requests/security/advisories/GHSA-j8r2-6x86-q33q) and [CVE-2023-32681](https://nvd.nist.gov/vuln/detail/CVE-2023-32681).

v2.30.0 (2023-05-03)

Dependencies

- ⚠️ Added support for urllib3 2.0. ⚠️

  This may contain minor breaking changes so we advise careful testing and reviewing https://urllib3.readthedocs.io/en/latest/v2-migration-guide.html prior to upgrading.

  Users who wish to stay on urllib3 1.x can pin to `urllib3<2`.

v2.29.0 (2023-04-26)

Improvements

- Requests now defers chunked requests to the urllib3 implementation to improve standardization. ([#6226](https://redirect.github.com/psf/requests/issues/6226))
- Requests relaxes header component requirements to support bytes/str subclasses. ([#6356](https://redirect.github.com/psf/requests/issues/6356))

... (truncated)

Commits

- `147c851` v2.31.0
- `74ea7cf` Merge pull request from GHSA-j8r2-6x86-q33q
- `3022253` test on pypy 3.8 and pypy 3.9 on windows and macos (#6424)
- `b639e66` test on py3.12 (#6448)
- `d3d5044` Fixed a small typo (#6452)
- `2ad18e0` v2.30.0
- `f2629e9` Remove strict parameter (#6434)
- `87d63de` v2.29.0
- `51716c4` enable the warnings plugin (#6416)
- `a7da1ab` try on ubuntu 22.04 (#6418)
- Additional commits viewable in [compare view](https://github.com/psf/requests/compare/v2.28.1...v2.31.0)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 9a82d283ae2bbc092e796571ec9defbc7eb9c4a6
Author: Michael Toguchi <[email protected]>
Date: Mon Jun 5 18:34:02 2023 -0700

[Driver][SYCL] Fix optimization option processing for device options (#9703)

When using -O0, we imply -cl-opt-disable for device. This was incorrectly being implied when we were overriding with an optimization enabling option (-O0 -O2). Fix the logic.

commit 0ae900a9a6784f45833784b9f4262d622733a789
Author: Artur Gainullin <[email protected]>
Date: Mon Jun 5 17:16:36 2023 -0700

[SYCL] Fix kernel-bundle-merge-options-env.cpp test

The test is supposed to check that options provided through the driver via -Xsycl-target-linker and -Xsycl-target-frontend get overridden by options provided through the env variables SYCL_PROGRAM_COMPILE_OPTIONS and SYCL_PROGRAM_LINK_OPTIONS. The test author used dummy options called "-bar" and "-bar_compile" to check that they are overridden. But those are actually treated not as dummy options but as the real option "-b", which was not the original intent.
After the commit in llorg which removes the "-b" option from the driver:

    commit 89d71c1efa85656b54bcd79b4278bc67690480e1
    Author: Fangrui Song <[email protected]>
    Date: Fri May 26 15:30:23 2023 -0700

    [Driver] Reject AIX-specific link options on non-AIX targets

the test started to fail. So, replace those options with "-DBAR" and "-DBAR_COMPILE" respectively.

commit 1a3e99307e4f754d300a14f4dec8111322644d85
Author: Steffen Larsen <[email protected]>
Date: Mon Jun 5 16:51:57 2023 +0100

[SYCL] Add missing SYCL 2020 image is_property_of specializations (#9652)

This commit adds specializations of is_property_of for property::image::use_host_ptr, property::image::use_mutex, and property::image::context_bound with unsampled_image and sampled_image. Likewise, this commit adds specializations of is_property_of for property::no_init with unsampled_image_accessor, sampled_image_accessor, host_unsampled_image_accessor and host_sampled_image_accessor.

Signed-off-by: Larsen, Steffen <[email protected]>

commit c953fed97ef68a15efa82b5e211809be8695a0da
Author: jinge90 <[email protected]>
Date: Mon Jun 5 22:45:25 2023 +0800

[SYCL][libdevice] Add libdevice lit test to check 'double' usage for fp32 spirv file on-double spirv file(#9711)

commit ef1f8462cf270a925b34b7c35da7e2f29c654355
Author: Steffen Larsen <[email protected]>
Date: Mon Jun 5 13:29:14 2023 +0100

[SYCL][NFC] Remove unused parameter in preScreenAccessor (#9737)

Addresses post-commit failure after https://github.com/intel/llvm/pull/9634

Signed-off-by: Larsen, Steffen <[email protected]>

commit 0c0809590f378925415e6fd317867b8123aaf0e6
Author: mmoadeli <[email protected]>
Date: Mon Jun 5 08:35:05 2023 +0100

[SYCL] Allow accessor constructed with zero-size buffers (#9634)

* Allow accessor constructed with zero-size buffers. [Clarify behaviour for range of zero](https://github.com/KhronosGroup/SYCL-Docs/pull/192)
* Remove existing error disallowing it.
* Add test

---------

Co-authored-by: Steffen Larsen <[email protected]>

commit 2648b7c5e1a4af8e0ffded1c431b79813fb24777
Author: Artur Gainullin <[email protected]>
Date: Fri Jun 2 16:10:24 2023 -0700

[SYCL] Rename win_proxy_loader to pi_win_proxy_loader (#9724)

Co-authored-by: Dale <[email protected]>

commit 4c5521c9edae675bff012c367cf53b457068f039
Author: aelovikov-intel <[email protected]>
Date: Fri Jun 2 14:58:00 2023 -0700

[CI] Fix pre-commit job dependencies on Windows (#9727)

Bug-fix for https://github.com/intel/llvm/pull/9709.

commit 35171b3c360092299bd43b3ad10ba885254ed805
Author: Erich Keane <[email protected]>
Date: Fri Jun 2 14:00:14 2023 -0700

Finish fixing 2nd SemaSYCl test due to diag change.

My previous commit for SemaSYCL seemingly missed 1 spot, this patch fixes that one too.

commit f874ec8410fd6bf94b996df0e32ca2087addcec5
Author: Erich Keane <[email protected]>
Date: Fri Jun 2 13:45:22 2023 -0700

Fix 2 sycl tests: SemaSYCL/loop_fusion.cpp, SemaSYCL/fpga_pipes.cpp

Two tests failed because the diagnostic message format changed, but emission of it was not updated. This patch corrects that.

commit 12dd0ad040ea61f1201fa9d82efd5079ce7dc6ca
Author: Byoungro So <[email protected]>
Date: Fri Jun 2 11:39:53 2023 -0700

[SYCL] Free allocated memory to avoid memory leak (#9722)

We just need to call free() to avoid memory leak.

Signed-off-by: Byoungro So <[email protected]>

commit c6500e41fdc02545ae1867e9c3a868734ecc62c2
Author: aelovikov-intel <[email protected]>
Date: Fri Jun 2 08:03:34 2023 -0700

Revert "[SYCL][CI] Cancel in-progress pre_commit job when PR is updated (#9706)" (#9721)

This reverts commit 1db96de9f9b394fbed0b8953849108f255dd31d7.

CI seems to be stuck after this PR has been merged.
commit 11ac7300305669b6e23bbac03c8c1fe0214cac8e
Author: Justin Cai <[email protected]>
Date: Fri Jun 2 00:28:50 2023 -0700

[SYCL] Add support for scalar logical operators with group algorithms (#9298)

commit e33c2f666e3b5fc873c23e08963ca71c5fc39509
Author: tovinkere <[email protected]>
Date: Thu Jun 1 23:49:09 2023 -0700

[XPTI] CMakeFiles fix to support independent build of XPTI (#9262)

There have been requests from tools implementors to be able to independently build the XPTI proxy library. The existing CMakeFiles.txt has issues that prevent this and needed to be addressed.

---------

Signed-off-by: Vasanth Tovinkere <[email protected]>

commit e45834c363d0c26d9c461455ea9654fb1ff947eb
Author: rdeodhar <[email protected]>
Date: Thu Jun 1 23:47:14 2023 -0700

[SYCL] [L0] Test adjustment for Windows (#9658)

Explicitly enable a default context so that all queues use that context and immediate command list recycling happens as expected.

commit 260182a1ad758994a652b4241bbe22f6f13cc003
Author: Jaime Arteaga <[email protected]>
Date: Thu Jun 1 20:36:03 2023 -0700

[SYCL][UR][L0] Clean up events on queue wait (#9643)

After the last command in an in-order queue has completed, clean up the rest of the events so they are available for later reuse.

Signed-off-by: Jaime Arteaga <[email protected]>

commit a4283b33744d095743015f44a90afd003c2564ae
Author: aelovikov-intel <[email protected]>
Date: Thu Jun 1 20:32:20 2023 -0700

[SYCL][CI][WIN] Skip some checks depending on what files have changed (#9709)

Follow-up for #9589, implementing the same on Windows.
commit 1db96de9f9b394fbed0b8953849108f255dd31d7
Author: aelovikov-intel <[email protected]>
Date: Thu Jun 1 20:30:57 2023 -0700

[SYCL][CI] Cancel in-progress pre_commit job when PR is updated (#9706)

commit 66b2e89172001c8e9bc60f402b811e7b41e43e0a
Author: aelovikov-intel <[email protected]>
Date: Thu Jun 1 20:20:52 2023 -0700

[SYCL][CI] Improve compression performance (#9675)

This was originally implemented in https://github.com/intel/llvm/pull/5678. Start with Linux only for now.

Benchmarking several compression utilities for time/size:

| | Pack time | Upload time | Size |
| ------- | --------- | ----------- | ------ |
| xz | 5m 20s | 1m 30s | 350 MB |
| lz4 | 3s | 3m 10s | 660 MB |
| zstd -9 | 25s | 2m 4s | 467 MB |

The difference in size between xz/lz4 would result in 1m30s -> 3m increase in artifacts upload time so the pack time gain would be partially offset by that.

I don't see a way to get data about unpack from the CI, but locally on a different machine (and likely with a different build) I had this:

| | Pack time | Unpack time |
| ---- | --------- | ----------- |
| xz | 28m 30s | 1m 13s |
| lz4 | 11s | 6s |
| zstd | 1m 22s | 8s |

Based on the data above we're switching to use `zstd -9` as our compression algorithm.

commit 856ad1d77927ddef77a2a8e6ec5ed43eeb4b75eb
Author: fineg74 <[email protected]>
Date: Thu Jun 1 15:46:40 2023 -0700

[ESIMD][E2E] Temporarily disable -ffast-math option for 7 LIT tests (#9660)

This PR is a workaround for tests failing when compiled with icpx and succeeding when compiled with clang++. The root cause of that behavior is the fast-math option, which is enabled by default when using icpx and disabled by default when using clang++. As a workaround, the affected tests will be compiled with the no-fast-math option.
commit 43d20039920ee187b379781188148fd4cccf6786
Author: Nick Sarnie <[email protected]>
Date: Thu Jun 1 18:20:09 2023 -0400

[SYCL][ESIMD][E2E] Fix ext_math_ieee_sqrt_div on emulator (#9680)

Similar to the other ext_math tests, this needs -fno-fast-math as well.

Signed-off-by: Sarnie, Nick <[email protected]>

commit 27755824d050679127580ea7a7baf28cea38d91b
Author: Nick Sarnie <[email protected]>
Date: Thu Jun 1 17:45:43 2023 -0400

[SYCL][ESIMD] Fix gather/scatter with accessors when passing scalar (#9674)

This regressed in https://github.com/intel/llvm/commit/d04ebb03c1c891077974622c99027a72bad34b71 when we added a template arg. Since we have a template arg, we won't also call the constructor.

Signed-off-by: Sarnie, Nick <[email protected]>

commit ac1c91e533ebffd8f0629c9c072ea91a807fcf0d
Author: Kseniya Tikhomirova <[email protected]>
Date: Thu Jun 1 21:44:31 2023 +0200

[SYCL] Fix post commit fail related to std::unique_lock CTAD in unit tests (#9698)

Signed-off-by: Tikhomirova, Kseniya <[email protected]>

commit 57187f6f14c1a9e9ed669bcfb2432f4ebfc90dbb
Author: aelovikov-intel <[email protected]>
Date: Thu Jun 1 09:36:36 2023 -0700

[SYCL][CI] Add zstd to our build image (#9681)

I will remove the unneeded package (lz4 and/or zstd) once we settle which one is the best for our use.
commit 01d7fc097ec6b5e380db1a07b6caee475e1c695f Author: Maksim Sabianin <[email protected]> Date: Thu Jun 1 18:26:15 2023 +0200 [SYCL] Remove reduntant sycldevice support (#9653) commit 712138f6d84f45c14b7a6fb4dd1432a8b3aa1949 Author: aelovikov-intel <[email protected]> Date: Thu Jun 1 07:57:34 2023 -0700 [SYCL][CI] Create nightly container based on the "build" image (#9685) I plan to use it in post commit to merge two builds [linux_default](https://github.com/intel/llvm/blob/ac8408c4761180835fb23ccd5183efd5c5c37d95/.github/workflows/sycl_post_commit.yml#L26-L38) and [self_build](https://github.com/intel/llvm/blob/ac8408c4761180835fb23ccd5183efd5c5c37d95/.github/workflows/sycl_post_commit.yml#L39-L51) into one. I can also imagine how we can use that in place of [HIP/CUDA image for E2E tests](https://github.com/intel/llvm/blob/24955697d9f08c0bc7e1f2b80182c7d967f53b70/.github/workflows/sycl_gen_test_matrix.yml#L10-L17) for PRs that only update E2E tests. commit cebe7da1e21072d158c089e258a28ffe7e951a7a Author: jinz2014 <[email protected]> Date: Thu Jun 1 10:43:06 2023 -0400 [SYCL][HIP] Display the backend name in intel-ext-device.cpp (#9688) commit 54fcf80f2351a75281f627f9d80b9a86e686c6fc Author: Kseniya Tikhomirova <[email protected]> Date: Thu Jun 1 16:38:40 2023 +0200 [SYCL] Fix and reenable unit test for xpti_trace (#9587) Signed-off-by: Tikhomirova, Kseniya <[email protected]> commit 1dce70f413e686bc6fe3af30f99f478c954ee35f Author: Justin Cai <[email protected]> Date: Thu Jun 1 06:15:30 2023 -0700 [SYCL] Enable proper behavior of optional kernel features with SYCL_EXTERNAL (#9611) Currently, the code generated from a translation unit with a declaration of a `SYCL_EXTERNAL` function with a `[[sycl::device_has(...)]]` attribute, but with no definition of that function, is a LLVM module with a declaration of the function but with no `sycl_declared_aspects` metadata. 
Because of this, `SYCLPropagateAspectsPass` does not propagate any used aspect information to functions that (transitively) call a `SYCL_EXTERNAL` function. This causes `sycl-post-link` to fail to split kernels that call `SYCL_EXTERNAL` functions with different required aspects. With this PR, the `sycl_declared_aspects` metadata is now attached to a `SYCL_EXTERNAL` function even if there is no definition (in the same translation unit). Additionally, `SYCLPropagateAspectsPass` now collects aspects information for function declarations. commit 1bae4b76f88bdee7c37d6f11b75cefe6f1a494eb Author: Sven van Haastregt <[email protected]> Date: Wed May 31 12:47:07 2023 +0100 Use clang to generate compile_commands (#2031) Ensure the code-formatting job uses clang to generate compile_commands.json, to avoid passing GCC-specific flags to clang-format or clang-tidy. Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/c2ff406 commit 353f349fa7f689963f4cc59faa710c290522650e Author: Nick Sarnie <[email protected]> Date: Tue May 30 06:42:59 2023 -0400 Skip spirv decoration metadata with --spirv-preserve-auxdata (#2013) It's already explicitly handled for forward and reverse translation, and it's a bit complicated to handle MDNode metadata. Just skip it so we don't assert. If I see this come up in more cases I will add support for MDNode metadata. 
Signed-off-by: Sarnie, Nick <[email protected]> Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/89d658c commit 23a3ea0775149b04daba041de55c150785d2f101 Author: Dmitry Sidorov <[email protected]> Date: Sun May 28 18:41:04 2023 +0200 Relax consumer checks for checksum info (#2011) It's a follow up for https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/1996 since I couldn't update the PR Signed-off-by: Sidorov, Dmitry <[email protected]> Original commit: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/8cbf726 commit e4ad410f1eeb38659f959bca24d74547e8871274 Merge: fdd609a5c724 d9a9f60248dc Author: sys_ce_bb <[email protected]> Date: Thu Jun 1 06:04:54 2023 -0700 Merge remote-tracking branch 'origin/sycl-web' into llvmspirv_pulldown commit fdd609a5c724a69e24ac1a80fdea6b34714660c0 Author: Kseniya Tikhomirova <[email protected]> Date: Thu Jun 1 12:05:43 2023 +0200 [SYCL][ABI-break] Add code_location parameter to the rest of sycl::queue methods (#9603) code_location helps to improve error reporting and allow to detect exact code lines for failed command submission. --------- Signed-off-by: Tikhomirova, Kseniya <[email protected]> commit 7618dffd78ae8456df9885c35d200604748233ec Author: mmoadeli <[email protected]> Date: Thu Jun 1 09:03:48 2023 +0100 [SYCL] Lost data during implicit conversion in local and host accessors. (#9669) * Fix local_accessor and host_accessor lost data during implicit conversion. * Add relevant test. commit 4eaaaa963ca2f58358ea0897d30374cf9928b80b Author: Kseniya Tikhomirova <[email protected]> Date: Thu Jun 1 10:03:33 2023 +0200 [SYCL] Enable xpti::node_create signal emit for parallel_for that bypasses graph (#9565) xpti::node_create signal is emitted when we create new node in graph. Code related to it is present in Command::emitInstrumentationData and Command successors. Although we have a path when no memory dependencies is tracked for kernel (e.g. 
queue::parallel_for) and to speed up kernel enqueue and eliminate extra overhead - node is not added to graph (and related Command is not created too). This commit adds this node_create signal to be emitted in this case. --------- Signed-off-by: Tikhomirova, Kseniya <[email protected]> commit 9e5889918277e921ef8c4724fe22ab6d638fdfb4 Author: Vyacheslav Klochkov <[email protected]> Date: Wed May 31 22:41:15 2023 -0500 [ESIMD][DOC] Update description of accessor-based memory APIs (#9582) ESIMD has got support of `local accessor`, methods `get_pointer()` and `operator[]` of accessor class, new `slm_allocator` class to reserve extra SLM for local needs. Also, this patch described some existing restrictions for `slm_init` function --------- Signed-off-by: Vyacheslav N Klochkov <[email protected]> commit d3aaccc7561b3664fb2a039f6a32629c65fc9d05 Author: aelovikov-intel <[email protected]> Date: Wed May 31 16:05:55 2023 -0700 [SYCL][CI] Skip some checks depending on what files have changed (#9589) I'm using https://github.com/dorny/paths-filter to implement it. I decided to call it from `sycl_precommit.yml` so that we can potentially re-use its results between Linux/Windows tasks but that might have its own drawbacks. I don't see a possibility to just pass the result of the job between workflows (`sycl_precommit` -> `sycl_linux_build_and_test`) which means that for every value I have to thread it carefully via latter's inputs. That might complicate things in future if we'd want to run just the modified end-to-end tests instead of all of them. Another approach would be to run the job inside `sycl_linux_build_and_test` so that I'd have immediate access to its output from anywhere in the workflow. commit f110fd73f8e7e51d3b0eb0595162f129ea74cb21 Author: Byoungro So <[email protected]> Date: Wed May 31 15:46:46 2023 -0700 [SYCL] Avoid unnecessary kernel retain (#9557) We should retain the kernel only for OpenCL backend. 
Signed-off-by: Byoungro So <[email protected]> commit ac8408c4761180835fb23ccd5183efd5c5c37d95 Author: Joshua Cranmer <[email protected]> Date: Wed May 31 17:47:32 2023 -0400 [SYCL][OpaquePtrs] Convert some sycl tests to opaque pointers. (#9536) This does not fix all of the lit tests that fail with opaque pointers enabled, but it does fix those where the test is looking for IR whose form has changed with opaque pointers enabled. commit 24955697d9f08c0bc7e1f2b80182c7d967f53b70 Author: Dmitry Vodopyanov <[email protected]> Date: Wed May 31 21:03:56 2023 +0200 [SYCL] Revert regression for atomic64 after #9561 (#9625) Fixes regression introduced in https://github.com/intel/llvm/pull/9561 by reverting the affected code commit d9a9f60248dc73b975e19c634cf6790db0473bf0 Merge: 182ec5bb2718 a88f496f8f3b Author: Gainullin, Artur <[email protected]> Date: Wed May 31 14:30:11 2023 -0400 Merge from 'main' to 'sycl-web' (54 commits) CONFLICT (content): Merge conflict in clang/lib/Sema/Sema.cpp commit 182ec5bb2718e2676a616fc5a0ceaf2a339b50ff Merge: 6532d2ee8b34 f9b489c7a88b Author: iclsrc <[email protected]> Date: Wed May 31 10:53:04 2023 -0700 Merge from 'sycl' to 'sycl-web' (6 commits) commit f9b489c7a88b3b130f22678de79d5cf4f00d6b2c Author: aelovikov-intel <[email protected]> Date: Wed May 31 10:10:06 2023 -0700 [SYCL][CI] Add lz4 to our build image (#9677) commit 6532d2ee8b347a4f1e3c4db29229822e2f2865be Merge: 916980317aa1 33ee5c466346 Author: Gainullin, Artur <[email protected]> Date: Wed May 31 12:57:09 2023 -0400 Merge from 'main' to 'sycl-web' (82 commits) CONFLICT (content): Merge conflict in clang/lib/Sema/SemaDeclAttr.cpp CONFLICT (content): Merge conflict in clang/lib/Sema/SemaType.cpp commit b793a58559a21d89b2c6ef9a3ad2953597be3e17 Author: Jaime Arteaga <[email protected]> Date: Wed May 31 09:31:06 2023 -0700 [SYCL][UR][L0] Fix unused parameter (#9670) Signed-off-by: Jaime Arteaga <[email protected]> commit 06ed924eb112a001c7397c5fcee0b8a8f4ed08dd Author: JackAKirk 
<[email protected]> Date: Wed May 31 17:05:29 2023 +0100 [SYCL][CUDA] Check make_device doesn't create duplicate sycl::device (#9373) Check make_device doesn't create duplicate sycl::device. Migration of https://github.com/intel/llvm-test-suite/pull/1419 Tests https://github.com/intel/llvm/pull/7550. Checks that make_device doesn't return a duplicate sycl::device if one already exists. Signed-off-by: JackAKirk <[email protected]> commit a88f496f8f3baa6c3b15532e37e3bdbb1c4ea0d0 Author: Kazu Hirata <[email protected]> Date: Wed May 31 08:59:35 2023 -0700 [Sema] Remove unused function getFloat128Identifier The last use was removed by: commit bb1ea2d6139a72340b426e114510c46d938645a6 Author: Nemanja Ivanovic <[email protected]> Date: Mon May 9 08:52:33 2016 +0000 Differential Revision: https://reviews.llvm.org/D151608 commit 8e728adcfedd97fbc3759b5533d0cbada6b68aa6 Author: Marco Elver <[email protected]> Date: Wed May 31 17:57:07 2023 +0200 Revert "[compiler-rt] Avoid memintrinsic calls inserted by the compiler" This reverts commit 4369de7af46605522bf7dbe3bc31d00b0eb4bee6. Fails on Mac OS with "sanitizer_libc.cpp:109:5: error: aliases are not supported on darwin". 
commit fc8acb563ae019735e646f9964b254cab1efd529 Author: Caroline Concatto <[email protected]> Date: Wed May 31 14:12:08 2023 +0000 [Clang][SVE2.1] Add clang support for builtins using svcount_t In this patch it is used for the prototype: * svptrue_c8 (and _c16/_c32/_c64) As described in: https://github.com/ARM-software/acle/pull/257 Patch by: Sander de Smalen <[email protected]> Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D150953 commit 71d5a94985c9569467c1ef8a62b8b326ee2036a6 Author: Peter Klausler <[email protected]> Date: Thu May 25 16:01:52 2023 -0700 [flang] Don't fold SIZE()/SHAPE() into expression referencing optional dummy arguments When computing the shape of an expression at compilation time as part of folding an intrinsic function like SIZE(), don't create an expression that increases a dependence on the presence of an optional dummy argument. Differential Revision: https://reviews.llvm.org/D151737 commit 660e4530124356442ff63d61b1f6dcb9c1def7e6 Author: Nikita Popov <[email protected]> Date: Wed May 31 10:10:47 2023 +0200 [KnownBits] Also test 1-bit values in exhaustive tests (NFC) Similar to what we do with ConstantRanges, also test 1-bit values in exhaustive tests, as these often expose special conditions. This would have exposed the assertion failure fixed in D151788 earlier. 
commit 6eef8d9b2bbfdb3920b6eeafc939a2d62ad5295b Author: Kazu Hirata <[email protected]> Date: Wed May 31 08:45:29 2023 -0700 [RISCV] Fix an unused variable warning llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:3793:7: error: unused variable 'XLenVT' [-Werror,-Wunused-variable] commit d6a36619cec44d02a2a3526eceb2ac128d90e030 Author: Simon Pilgrim <[email protected]> Date: Wed May 31 15:33:44 2023 +0100 [X86] X86FixupVectorConstantsPass - use VBROADCASTSS/VBROADCASTSD for integer vector loads on AVX1-only targets Matches behaviour in lowerBuildVectorAsBroadcast commit f29f1c7e23d555c95a199f8e77fefe87e91664cf Author: Mark de Wever <[email protected]> Date: Sun May 28 14:23:12 2023 +0200 [libc++]{CI] Bumps clang-tidy version used. The CI can no longer run with clang-tidy 16 increment it to version 17. Whether permanently moving to the latest development version is being discussed on Discourse. Depends on D149455 Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D151628 commit cf64668b8c414c60aec12cdd7374ea053fc99411 Author: Mark de Wever <[email protected]> Date: Fri Apr 28 17:38:47 2023 +0200 [libc++][test] Prefers the newer clang-tidy version. Module require Clang 17, since Clang 16 requires the magic # __FILE__ line. Therefore, if available, use clang-tidy 17 too. This change should be reverted after LLVM 17 is released. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D149455 commit 5d4281d5493c7a2fc09d9ac9fc5b374676a4d8af Author: Mark de Wever <[email protected]> Date: Thu May 25 21:59:25 2023 +0200 [libc++] Gives ignore external linkage. A slightly different fix is in D144994. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D151490 commit ac7d60f73a4a369fb4dcce734d54cb38fde80981 Author: Mark de Wever <[email protected]> Date: Tue May 23 17:14:20 2023 +0200 [libc++] Fixes use-after move diagnostic. The diagnostic is issued by clang-tidy 17. 
This just suppressed the diagnostic. The move operations are non-standard extensions and the class itself is deprecated. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D151223 commit 7578672c96e18feb5982192e595459b2a65867cf Author: Dave Lee <[email protected]> Date: Sat May 20 10:05:44 2023 -0700 [lldb] Override GetVariable in ValueObjectSynthetic (NFC) Make `GetVariable` a passthrough function the the underlying value object in `ValueObjectSynthetic`. Differential Revision: https://reviews.llvm.org/D151384 commit 42e98c6ae875e952ee852f78234c0f8ed311472b Author: Nikita Popov <[email protected]> Date: Wed May 31 10:16:16 2023 +0200 [APInt] Support zero-width extract in extractBitsAsZExtValue() D111241 added support for extractBits() with zero width. Extend this to extractBitsAsZExtValue() as well for consistency (in which case it will always return zero). Differential Revision: https://reviews.llvm.org/D151788 commit 3825910c7316cf62549bd31c503c48e7526adcc2 Author: Nico Weber <[email protected]> Date: Wed May 31 11:12:32 2023 -0400 [gn] port 4369de7af466 commit cb463c34dd4c3ad2ac6c13f98edcf684a3fcbe38 Author: Dave Lee <[email protected]> Date: Fri May 26 21:19:10 2023 -0700 [lldb] Take StringRef name in GetChildMemberWithName (NFC) `GetChildMemberWithName` does not need a `ConstString`. This change makes the function take a `StringRef` instead, which alleviates the need for callers to construct a `ConstString`. I don't expect this change to improve performance, only ergonomics. This is in support of Alex's effort to replace `ConstString` where appropriate. There are related `ValueObject` functions that can also be changed, if this is accepted. 
Differential Revision: https://reviews.llvm.org/D151615 commit e0df106818ccb90dc46c5296ed5ef2eda75564ff Author: Paul Scoropan <[email protected]> Date: Tue May 30 15:07:44 2023 +0000 [Flang] Move several definitions to IntrinsicCall header for code cleanliness and reusability In the future we intend to add support for many PowerPC-specific intrinsics that ideally will exist in a separate new PPCIntrinsicCall file. But first we need to move definitions to the IntrinsicCall header file to increase code cleanliness and readability and to make code reusable for when we add PPCIntrinsicCall. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D151715 commit 572cfa3fde5433c889b339e9cfa6dfaa23e5f2ee Author: Florian Hahn <[email protected]> Date: Wed May 31 16:00:57 2023 +0100 [LV] Use SCEV for uniformity analysis across VF This patch uses SCEV to check if a value is uniform across a given VF. The basic idea is to construct SCEVs where the AddRecs of the loop are adjusted to reflect the version in the vectorized loop (Step multiplied by VF). We construct a SCEV for the value of the vector lane 0 (offset 0) compare it to the expressions for lanes 1 to the last vector lane (VF - 1). If they are equal, consider the expression uniform. While re-writing expressions, we also need to catch expressions we cannot determine uniformity (e.g. SCEVUnknown). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D148841 commit 4369de7af46605522bf7dbe3bc31d00b0eb4bee6 Author: Marco Elver <[email protected]> Date: Tue May 30 11:59:22 2023 +0200 [compiler-rt] Avoid memintrinsic calls inserted by the compiler D135716 introduced -ftrivial-auto-var-init=pattern where supported. Unfortunately this introduces unwanted memset() for large stack arrays, as shown by the new tests added for asan and msan (tsan already had this test). 
In general, the problem of compiler-inserted memintrinsic calls (memset/memcpy/memmove) is not new to compiler-rt, and has been a problem before. To avoid introducing unwanted memintrinsic calls, we redefine memintrinsics as __sanitizer_internal_mem* at the assembly level for most source files automatically (where sanitizer_common_internal_defs.h is included). In few cases, redefining a symbol in this way causes issues for interceptors, namely the memintrinsic interceptor themselves. For such source files we have to selectively disable the redefinition. Other alternatives have been considered, but simply do not work well in the context of compiler-rt: 1. Linker --wrap: this does not work because --wrap only applies to the final link, and would not apply when building sanitizer static libraries. 2. Changing references to memset() via objcopy: this may work, but due to the complexities of the build system, introducing such a post-processing step for the right object files (in particular object files defining memset cannot be touched) seems infeasible. The chosen solution works well (as shown by the tests). Other libraries have chosen the same solution where nothing else works (see e.g. glibc's "symbol-hacks.h"). v2: - Fix ubsan_minimal build where compiler decides to insert memset/memcpy: ubsan_minimal has work without RTSanitizerCommonLibc, therefore do not redefine the builtins. - Fix definition of internal_mem* functions with compilers that want the aliased function to already be defined before. - Fix definition of __sanitizer_internal_mem* functions with compilers more pedantic about attribute placement around extern "C". 
Reviewed By: vitalybuka, dvyukov Differential Revision: https://reviews.llvm.org/D151152 commit 26d7b7bb8ff982b6cdcd9bf7538405356135b724 Author: Michael Liao <[email protected]> Date: Fri May 26 12:58:12 2023 -0400 [TableGen] Add !getdagarg and !getdagname - This patch proposes to add `!getdagarg` and `!getdagname` bang operators as the inverse operation of `!dag`. They allow us to examine arguments of a given dag. Reviewed By: simon_tatham Differential Revision: https://reviews.llvm.org/D151602 commit e69318138e6cc88becbb8d095b1d2dcf76ac45e1 Author: Philip Reames <[email protected]> Date: Wed May 31 07:48:17 2023 -0700 [RISCV] Use v(f)slide1down for shuffle+insert idiom This is a follow up to D151468 which added the vslide1down case as a sub-case of vslide1down matching. This generalizes that code into generic mask matching - specifically to point out the sub-vector insert restriction in the original patch. Since the matching logic is basically the same, go ahead and support vslide1up at the same time. Differential Revision: https://reviews.llvm.org/D151742 commit 5442264744f4e6f925bcb06ae60687ec3c2e9d7f Author: Nikita Popov <[email protected]> Date: Wed May 31 16:39:41 2023 +0200 [InstCombine] Name instructions in test (NFC) commit 66b9e114326462eb4a7b67dccf36cca875b8791b Author: myl <[email protected]> Date: Wed May 31 22:33:07 2023 +0800 Temporarily add explicit '-O2' for Basic/image/image_read*.cpp to avoid GPU hang issue with O0 optimization. 
(#9664) commit 6ef3efc9c46591e94165533f461ac5a17adc527d Author: aelovikov-intel <[email protected]> Date: Wed May 31 07:32:48 2023 -0700 [SYCL][CI] Fuse self-build and no-asserts build (#9655) Co-authored-by: Alexey Bader <[email protected]> commit f9b523ebc367f1535bf61797383471e567b24b75 Author: Kazu Hirata <[email protected]> Date: Wed May 31 07:30:14 2023 -0700 [Analysis] Remove unused class LegacyAARGetter The last use was removed by: commit fa6ea7a419f37befbed04368bcb8af4c718facbb Author: Arthur Eubanks <[email protected]> Date: Mon Mar 20 11:18:35 2023 -0700 Once we remove it, createLegacyPMAAResults and createLegacyPMAAResults become unused, so this patch removes them as well. Differential Revision: https://reviews.llvm.org/D151787 commit 8634b43a03945971c2939833ac686728bee5a760 Author: Fangrui Song <[email protected]> Date: Wed May 31 07:19:44 2023 -0700 [ELF][RISCV] --wrap=foo: Correctly update st_value(foo) With --wrap=foo, we may have `d->file != file` for a defined symbol `foo`. For the object file defining `foo`, its symbol table may not contain `foo` after `redirectSymbols` changed the `foo` entry to `__wrap_foo` (see D50569). Therefore, skipping `foo` with the condition `if (!d || d->file != file)` may cause `__wrap_foo` not to be updated. See `ab.o w.o --wrap=foo` in the new test (originally reported by D150220). We could adjust the condition to `if (!d)`, but that would leave many `anchors` entries if a symbol is referenced by many files. Switch to iterating over `symtab` instead. Note: D149735 (actually not NFC) allowed duplicate `anchors` entries and fixed `a.o bw.o --wrap=foo`. Reviewed By: jobnoorman Differential Revision: https://reviews.llvm.org/D151768 commit e9c9d54cf5959fa020cf76e47ced4575793f6d60 Author: Vyacheslav Klochkov <[email protected]> Date: Wed May 31 09:16:30 2023 -0500 [ESIMD][LIT] Fix usage of -fno-fast-math and -fno-slp-vectorize with cl (#9661) clang-cl driver does not understand -fno-fast-math and -fno-slp-vectorize. 
Usage of those options requires adding "/clang:" before the option. Signed-off-by: Vyacheslav N Klochkov <[email protected]> commit 408f4196ba4ac66328ebfcf41cb372572257c4f6 Author: Tom Eccles <[email protected]> Date: Wed May 17 16:07:41 2023 +0000 [flang] use greedy mlir driver for stack arrays pass In upstream mlir, the dialect conversion infrastructure is used for lowering from one dialect to another: the passes are of the form XToYPass. Whereas, transformations within the same dialect tend to use applyPatternsAndFoldGreedily. In this case, the full complexity of applyPatternsAndFoldGreedily isn't needed so we can get away with the simpler applyOpPatternsAndFold. This change was suggested by @jeanPerier The old differential revision for this patch was https://reviews.llvm.org/D150853 Re-applying here fixing the issue which led to the patch being reverted. The issue was from erasing uses of the allocation operation while still iterating over those uses (leading to a use-after-free). I have added a regression test which catches this bug for -fsanitize=address builds, but it is hard to reliably cause a crash from the use-after-free in normal builds. Differential Revision: https://reviews.llvm.org/D151728 commit 543705641adb1d3533be141947264ca1b7b04479 Author: Paul Robinson <[email protected]> Date: Wed May 31 06:43:27 2023 -0700 [Headers][doc] Fix typo in avx2intrin.h doc commit f6a631d4060c5b539fd51b7221205ee05ec50ee8 Author: Jan Sjodin <[email protected]> Date: Tue May 30 14:28:12 2023 -0500 [MLIR] Remove dependency on omp dialect in LLVM dialect. This fixes a buildbot failure where the dependency on the omp dialect in the LLVM dialect caused error. Instead of accessing the interface defined in the omp dialect we directly access the attributes instead. To make this work the IsDeviceAttr is removed and replaced with a BoolAttr instead. 
Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D151745 commit e5399f1d7cabfca90030ca03f52818e892aa389f Author: Paul Robinson <[email protected]> Date: Tue May 30 13:30:12 2023 -0700 [Headers][doc] Add shuffle-like intrinsic descriptions to avx2intrin.h Differential Revision: https://reviews.llvm.org/D151749 commit 0a3dc73e700b4a37bc435bf7c02213161b27f54a Author: Dmitry Makogon <[email protected]> Date: Wed May 31 20:23:19 2023 +0700 [Test] Move LoopStrengthReduce/pr62563.ll to X86 specific test folder (NFC) The test case is X86 specific. Should unblock buildbots after 253e3e2. commit 6bcbb3af059b05056c7343cafd99004d4cd4cd35 Author: Florian Hahn <[email protected]> Date: Wed May 31 14:22:44 2023 +0100 [ConstraintElim] Move logic to remove stack entry to helper (NFC). Preparation for follow-up patch that uses the logic in a separate place. commit 97f0e7b06e6b76fd85fb81b8c12eba2255ff1742 Author: Nikita Popov <[email protected]> Date: Wed May 31 14:53:44 2023 +0200 [AA] Fix comparison of AliasResults (PR63019) Comparison between two AliasResults implicitly decayed to comparison of AliasResult::Kind. As a result, MergeAliasResults() ended up considering two PartialAlias results with different offsets as equivalent. Fix this by adding an operator== implementation. To stay compatible with extensive use of comparisons between AliasResult and AliasResult::Kind, add an overload for that as well, which will ignore the offset. In the future, it would probably be a good idea to remove these implicit decays to AliasResult::Kind and add dedicated methods to check for specific AliasResult kinds. Fixes https://github.com/llvm/llvm-project/issues/63019. 
commit 4d64ffa94170eadd79954e2a5f13d1f1d16e9e2c Author: Nikita Popov <[email protected]> Date: Wed May 31 14:55:11 2023 +0200 [GVN] Add test for PR63019 (NFC) commit ce97312d109b21acb97d3ea243e214f20bd87cfc Author: Arnaud Bienner <[email protected]> Date: Wed May 31 10:54:27 2023 +0200 Implement BufferOverlap check for sprint/snprintf Differential Revision: https://reviews.llvm.org/D150430 commit 916980317aa18cd55727feae689026d4bd5a23e2 Merge: 606c74d747f2 0000fa6a925e Author: iclsrc <[email protected]> Date: Wed May 31 05:37:05 2023 -0700 Merge from 'sycl' to 'sycl-web' commit 0b42ee46b06fb9fb396eca8b335166d8e92b70cd Author: LLVM GN Syncbot <[email protected]> Date: Wed May 31 12:30:10 2023 +0000 [gn build] Port 26bda9e95a9d commit dd2fea9c23e6dabd83d3f4ee7d000ceb16cace55 Author: Thorsten Schütt <[email protected]> Date: Thu May 25 17:47:00 2023 +0200 [GlobalIsel][X86] Legalize G_CTLZ and G_CTPOP for 32-bit Note that 32-bit support is very limited Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D151459 commit 344e91a6f00840e67fc03bcfeca6c34fa6d34b17 Author: Nico Weber <[email protected]> Date: Wed May 31 08:17:44 2023 -0400 [gn] port 301eb6b68f3 (AttrTokenKinds.inc) commit 64bd5bbb9bbb72de5f59755c74dae4b4881d93d5 Author: rikhuijzer <[email protected]> Date: Wed May 31 14:13:08 2023 +0200 [mlir] Avoid tensor canonicalizer crash on negative dimensions Fixes #59703. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D151611 commit c76a3e795ef6bd5262b5860ebcc902fab3fab607 Author: Guillaume Chatelet <[email protected]> Date: Wed May 31 12:06:45 2023 +0000 [libc][NFC] Fixing various typos commit 0000fa6a925ef8d0fcd97c1765a7f24b85110610 Author: JackAKirk <[email protected]> Date: Wed May 31 13:02:04 2023 +0100 [SYCL][CUDA] opportunistic_group, fixed_size_group, and ballot_group impls. (#9280) This basic cuda support does not include any algorithm support. Algorithm support will follow in a later PR. Since all intel ba…
Some comments below, but LGTM otherwise. Once the SME2 stuff is in, I think we should consolidate the intrinsics that are common between SME2 and SVE2p1, rather than duplicating them. I agree the current form makes sense until then though.
main/acle.md
Outdated
// _u64base_u8, _u64base_u16, _u64base_s16, _u64base_u32, _u64base_s32,
// _u64base _u64, _u64base_s64
// _u64base_bf16, _u64base_f16, _u64base_f32, _u64base_f64
svint8_t svld1q_gather[_u64base_s8](svbool_t pg, svint64_t zn, const void *rm);
I think we should provide the same addressing modes as for LDNT1 gather:
- `svld1q_gather[_u64base]_xx(svbool_t pg, svuint64_t zn)` (note `svuint64_t` rather than `svint64_t`)
- `svld1q_gather[_u64base]_offset_xx(svbool_t pg, svuint64_t zn, int64_t offset)`
- `svld1q_gather[_u64base]_index_xx(svbool_t pg, svuint64_t zn, int64_t index)`
- `svld1q_gather_[u64]offset[_xx](svbool_t pg, const xx_t *base, svuint64_t offset)`
- `svld1q_gather_[u64]index[_xx](svbool_t pg, const xx_t *base, svuint64_t index)` for 16-bit, 32-bit and 64-bit `xx`
I imagine we should do the same for the ST1Q scatter quadword, correct?
Yeah, same thing there.
main/acle.md
Outdated
// Variants are also available for:
// _s8 _u16, _s16, _u32, _s32, _u64, _s64
// _bf16, _f16, _f32, _f64
void svst2q[_u8](svbool_t pg, uint8_t *rn, svuint8x2_t zt);
@CarolineConcatto Is there a reason why the pointers for the structured quad-word stores use `uint8_t *` instead of the `int8_t *` used for svld2q, etc.?
The type is meant to vary with the suffix, so it's `uint8_t *` for the `[_u8]` function shown, and would be `int8_t *` for the `[_s8]` version.
Doh! Of course, silly me. :)
main/acle.md
Outdated
#### LD1Q

Gather Load Quadword.
There is only an unscaled variant of this instruction, so maybe don't have both offset and index?
For the other SVE load and store intrinsics, we tried to provide a consistent interface and set of addressing modes. So the deciding factor wasn't so much whether the call mapped to a single instruction, but whether the underlying instruction could easily emulate the mode. “Single instruction” is a bit of a nebulous concept anyway for loads and stores, since a single C address expression might need several operations to compute.
Since scaling is just a shift left, I think it's worth providing both index and offset variants.
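To make the "just a shift left" point concrete, here is a rough scalar sketch of one lane of an index form being emulated on top of an unscaled offset form. This is only an illustration: `index_to_offset`, `load_u32_at_offset` and `load_u32_at_index` are hypothetical helpers, not ACLE intrinsics, and unsigned 64-bit indices are assumed for simplicity.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an ACLE intrinsic): convert an element index
   into a byte offset by shifting left by log2(element size in bytes). */
static inline uint64_t index_to_offset(uint64_t index, unsigned log2_elem_bytes)
{
    return index << log2_elem_bytes;
}

/* Scalar model of one lane of an "offset" form: base address plus an
   unscaled byte offset. */
static inline uint32_t load_u32_at_offset(const void *base, uint64_t offset)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

/* Scalar model of one lane of an "index" form, emulated with the
   offset form plus a shift (log2(sizeof(uint32_t)) == 2). */
static inline uint32_t load_u32_at_index(const void *base, uint64_t index)
{
    return load_u32_at_offset(base, index_to_offset(index, 2));
}
```

In this model the index form costs one extra shift per lane over the offset form, which is the sense in which an unscaled-only instruction can still "easily emulate" scaled addressing.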
main/acle.md
Outdated
#### ST1Q

Scatter store quadwords.
There is only an unscaled version of this instruction? So maybe don't have both index and offset?
main/acle.md
Outdated
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
svuint8_t svpmov_lane_u8_z(svbool_t pn);
s/ svuint8_t svpmov_lane_u8_z(svbool_t pn);/ svuint8_t svpmov_u8_z(svbool_t pn);/
With the increased use of x4 vectors in 2.1, would it be the right time to introduce svreinterpret variants for x4 types as well?
For data rearrangement, loads and stores, and element-wise bit manipulation, changing the element size can come in quite handy.
As described in: ARM-software/acle#257 Patch by : David Sherwood <[email protected]> Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D150961
As described in: ARM-software/acle#257 Reviewed By: hassnaa-arm Differential Revision: https://reviews.llvm.org/D151081
As described in: ARM-software/acle#257 Patch by : Sander de Smalen<[email protected]> Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D151197
As described in: ARM-software/acle#257 Patch by : Sander de Smalen<[email protected]> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151199
As described in: ARM-software/acle#257 Patch by : David Sherwood <[email protected]> Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D151307
Patch by : David Sherwood <[email protected]> As described in: ARM-software/acle#257 Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D151433
As described in: ARM-software/acle#257 Patch by: David Sherwood <[email protected]> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151439
As described in: ARM-software/acle#257 Patch by: Kerry McLaughlin <[email protected]> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D151461
As described in: ARM-software/acle#257 Patch by: Rosie Sumpter <[email protected]> Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D151709
This patch implements the builtins in Clang and the LLVM-IR intrinsic for the following:

    // Variants are also available for:
    // _s8, _s16, _u16, _s32, _u32, _s64, _u64,
    // _f16, _f32, _f64
    uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

    // Variants are also available for:
    // _s8, _u16, _s16, _u32, _s32, _u64, _s64
    uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
    uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
    uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

    // Variants are also available for:
    // _s8, _u16, _s16, _u32, _s32, _u64, _s64;
    uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
    uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

    // Variants are also available for _f32, _f64
    float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
    float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257 [1], the reduction instruction uses scalable vectors as input and fixed vectors as output, therefore we changed SVEEmitter to emit fixed vector types in case the neon header (arm_neon.h) is not present.

[1] ARM-software/acle#257

Co-author: Dinar Temirbulatov <[email protected]>
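As a rough illustration of why these reductions take a scalable vector and return a fixed 128-bit one, the following is a scalar model in plain C of an ADDQV-style reduction on 8-bit elements. It is a sketch, not the Clang implementation: the name `addqv_u8_model`, the explicit `vl_bytes` parameter and the one-byte-per-element predicate are assumptions made for this example. Each of the 16 result lanes accumulates the same lane position from every 128-bit segment of the input, with inactive elements contributing zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUAD_BYTES 16 /* one 128-bit segment holds 16 uint8_t elements */

/* Scalar model (assumption: vl_bytes is the vector length in bytes and
 * pg holds one 0/1 flag per element). The output always has 16 lanes,
 * regardless of vl_bytes. */
static void addqv_u8_model(const uint8_t *zn, const uint8_t *pg,
                           size_t vl_bytes, uint8_t result[QUAD_BYTES])
{
    assert(vl_bytes % QUAD_BYTES == 0);
    for (size_t lane = 0; lane < QUAD_BYTES; ++lane) {
        uint8_t sum = 0;
        for (size_t seg = 0; seg < vl_bytes / QUAD_BYTES; ++seg) {
            size_t i = seg * QUAD_BYTES + lane;
            if (pg[i])                      /* inactive elements add zero */
                sum = (uint8_t)(sum + zn[i]); /* wraps modulo 256 */
        }
        result[lane] = sum;
    }
}
```

For a 256-bit vector holding the values 0..31 with all elements active, lane 0 of the result is 0 + 16 and lane 15 is 15 + 31.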
This patch changes the following intrinsic
```
svst1uwq[_{d}] replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}] replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
Drops 'u' from the quadword stores because it is simply truncating the quadwords to 32 bits
```
svextq_lane[_{d}] replaced by svextq[_{d}]
```
EXTQ follows the previous defined EXT intrinsics
```
svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change
[1] ARM-software/acle#257
main/acle.md
Outdated
@@ -11829,7 +11829,7 @@ Extract vector segment from each pair of quadword segments.
  // Variants are also available for:
  // _s8, _s16, _u16, _s32, _u32, _s64, _u64
  // _bf16, _f16, _f32, _f64
- svuint8_t svextq_lane[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm);
+ svuint8_t svextq[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm);
Why are we dropping the _lane part here?
Richard pointed out that the other ext intrinsics do not have _lane in their names.
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64
// _bf16, _f16, _f32, _f64
svuint8_t svextq_lane[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm);
Member
@rsandifo-arm rsandifo-arm 3 weeks ago
I'm not sure these should be lane intrinsics. The instructions are really a form of permutation. (FWIW, the corresponding non-Q intrinsics don't have the _lane suffix.)
OK.
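To make the "form of permutation" remark concrete, here is a scalar sketch in plain C of an EXTQ-style operation on 8-bit elements, following the description "extract vector segment from each pair of quadword segments". It is only a model under stated assumptions: the name `extq_u8_model` and the explicit `vl_bytes` parameter are hypothetical, and for 8-bit elements the immediate is treated as a byte position within the segment. No single lane is selected; every 128-bit segment of the result is a window over the concatenation of the matching segments of the two inputs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUAD_BYTES 16 /* bytes per 128-bit segment */

/* For each segment, copy bytes imm..15 of the zdn segment to the bottom
 * of the result segment, then fill the rest from the start of the
 * corresponding zm segment. */
static void extq_u8_model(const uint8_t *zdn, const uint8_t *zm,
                          size_t vl_bytes, unsigned imm, uint8_t *result)
{
    assert(vl_bytes % QUAD_BYTES == 0);
    assert(imm < QUAD_BYTES);
    for (size_t seg = 0; seg < vl_bytes / QUAD_BYTES; ++seg) {
        size_t base = seg * QUAD_BYTES;
        for (size_t i = 0; i < QUAD_BYTES; ++i) {
            size_t src = imm + i;
            result[base + i] = (src < QUAD_BYTES)
                                   ? zdn[base + src]
                                   : zm[base + src - QUAD_BYTES];
        }
    }
}
```

Since the immediate only shifts a window within each segment pair, the operation behaves like the existing EXT permutation applied per quadword, which is consistent with dropping the `_lane` suffix.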
…1] (#76844) This patch changes the following intrinsic ```svst1uwq[_{d}] replaced by svst1wq[_{d}] svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}] svst1udq[_{d}] replaced by svst1dq[_{d}] svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}] ``` Drops 'u' from the quadword stores because it is simply truncating the quadwords to 32 bits ``` svextq_lane[_{d}] replaced by svextq[_{d}] ``` EXTQ follows the previous defined EXT intrinsics ``` svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}] ``` Introduced with the latest SME2 ACLE change [1]ARM-software/acle#257
…1] (llvm#76844) This patch changes the following intrinsic ```svst1uwq[_{d}] replaced by svst1wq[_{d}] svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}] svst1udq[_{d}] replaced by svst1dq[_{d}] svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}] ``` Drops 'u' from the quadword stores because it is simply truncating the quadwords to 32 bits ``` svextq_lane[_{d}] replaced by svextq[_{d}] ``` EXTQ follows the previous defined EXT intrinsics ``` svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}] ``` Introduced with the latest SME2 ACLE change [1]ARM-software/acle#257
Hello @CarolineConcatto, you have forgotten the DUPQ instruction for sve2p1. The prototype will look like this:
This is different from the svdupq_lane intrinsic; the two have different behaviour.
I merged the SVE2.1 and SME2 intrinsics into one section, but I am not sure that is the best approach.
This patch adds new intrinsics and types for supporting SVE2.1.
This version seems to add the shared SVE2.1/SME intrinsics back into the SME section (with __arm_streaming attributes). Is that deliberate?
I think we should only document each intrinsic once, as in the previous version. It's just that the relationship between streaming/non-streaming/streaming-compatible and SME/SME2/SVE2/SVE2.1 can't be expressed directly using attributes (and so needs to be specified in words instead).
No, they should not be in the SME section with streaming attribute.
LGTM apart from the typo below.
main/acle.md
Outdated
@@ -8776,8 +8776,8 @@ The functions in this section are defined by the header file
  Some instructions overlap with the SME and SME2 architecture extensions and
  are additionally available in Streaming SVE mode when __ARM_FEATURE_SME is
  non-zero or __ARM_FEATURE_SME2 are non-zero.
- For convenience, these the intrinsics for these instructions are listed in
- the following section.
+ For convenience, the intrinsics fo these instructions are listed in the
- For convenience, the intrinsics fo these instructions are listed in the
+ For convenience, the intrinsics for these instructions are listed in the
… scatter stores
This patch adds the quadword gather load intrinsics of the form
(1) sv<type>_t svld1q_gather_u64index_<typ>(svbool_t, const <type>_t *, svuint64_t);
(2) sv<type>_t svld1q_gather_u64base_index_<typ>(svbool_t, svuint64_t, int64_t);
and the quadword scatter store intrinsics of the form
(3) void svst1q_scatter_u64index_<typ>(svbool_t, <type>_t *, svuint64_t, sv<type>_t);
(4) void svst1q_scatter_u64base_index_<typ>(svbool, svuint64_t, int64_t, sv<type>_t);
(intrinsics (1) and (3) are currently missing the variants for non 64-bit sized base types, e.g. `int8_t` or `bfloat16_t`, etc).
ACLE spec: ARM-software/acle#257
``` c
// All the intrinsics below are [SVE2.1 or SME2]
// Variants are also available for _u16[_s32]_x2 and _u16[_u32]_x2
svint16_t svqcvtn_s16[_s32_x2](svint32x2_t zn);
```
According to PR#257 [1]
[1] ARM-software/acle#257
…air (#75107) Add intrinsics of the form: svboolx2_t svwhile<cond>_b{8,16,32,64}_[{s,u}64]_x2([u]int64_t, [u]int64_t); and their overloaded variants as specified in ARM-software/acle#257
One very minor comment.
This patch adds new intrinsics and types for supporting SVE2.1. It depends on Pull-Request #217, because some intrinsics in this specification are also in Pull-Request #217.
Depends on: #217