forked from intel/llvm
[clang-sycl-linker][clang-linker-wrapper] Move sycl linking code from linker wrapper to sycl linker #1
Closed. asudarsa wants to merge 51 commits into sycl from move_sycl_linking_code_from_linker_wrapper_to_sycl_linker
Confirmed working in David's PR [here](https://github.com/intel/llvm/actions/runs/14227019977/job/39869445580?pr=17335), other failures there unrelated. We hit this now because CMake now finds the system OpenCL in a different path that happens to contain a space. Signed-off-by: Sarnie, Nick <[email protected]>
Theory from [here](intel#17774 (comment)) is we need to match the version from [here](https://github.com/intel/compute-runtime/blob/25.09.32961.7/manifests/manifest.yml#L50-L56). Signed-off-by: Sarnie, Nick <[email protected]>
Reverts intel#17778 Same problem
Bumps [tar-fs](https://github.com/mafintosh/tar-fs) from 2.1.1 to 2.1.2.

Commits:
- [`d97731b`](https://github.com/mafintosh/tar-fs/commit/d97731b0e1b8a244ab859784b514cfcf5585ad3d) 2.1.2
- [`fd1634e`](https://github.com/mafintosh/tar-fs/commit/fd1634e869e7c5f85948e95eabdaa8451a085de5) symlink tweak from main
- See full diff in [compare view](https://github.com/mafintosh/tar-fs/compare/v2.1.1...v2.1.2)

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Fix Coverity reported auto-copy issues.
…elational.h into libspirv (intel#17793) clc_add_sat.h/clc_sub_sat.h become redundant after recent clc changes. generic/include/relational.h is now only used by libspirv.
We need to skip the check for private memory; otherwise the OpenCL CPU device may generate false positive reports due to stack re-use in different threads. However, the SPIR-V builtin 'ToPrivate' doesn't work as expected on the OpenCL CPU device, so we need to manually clean up the private shadow before each function exit point.
The SYCL RT addImages function may be invoked multiple times for different SYCL binary images, and more than one of these images may depend on the bfloat16 device library. The bfloat16 device library images are provided by the compiler and their implementation is now stable, so we keep only a single copy of the native and fallback versions of the bfloat16 device library in the program manager. These images are not removed until the program manager is destroyed. --------- Signed-off-by: jinge90 <[email protected]>
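The "keep a single copy" idea above can be sketched roughly as follows. This is only an illustration and not the program manager's actual code: `ProgramManager`, `Bf16Kind`, `DeviceImage` and `addBf16Image` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical: the two flavours of the bfloat16 device library.
enum class Bf16Kind { Native, Fallback };

// Hypothetical stand-in for a device binary image.
struct DeviceImage {
  std::string Name;
};

class ProgramManager {
  // At most one image per kind; entries live until the manager is destroyed.
  std::map<Bf16Kind, std::unique_ptr<DeviceImage>> Bf16Images;

public:
  // Returns true if the image was stored, false if a copy was already kept.
  bool addBf16Image(Bf16Kind Kind, const std::string &Name) {
    if (Bf16Images.find(Kind) != Bf16Images.end())
      return false; // repeated addImages call: keep the existing copy
    Bf16Images.emplace(Kind, std::make_unique<DeviceImage>(DeviceImage{Name}));
    return true;
  }

  std::size_t numBf16Images() const { return Bf16Images.size(); }
};
```

Calling `addBf16Image` repeatedly for the same kind leaves the stored count unchanged, which mirrors repeated addImages calls for binaries that all depend on the same library.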
…#17773) Initializes MID like in all the other constructors.
Also align some strings with regular sycl-nightly
…te (intel#17645) When updating a HostTask using whole graph update, we rely on the SYCL Scheduler to execute the HostTask and all its dependencies. The Scheduler executes this work asynchronously using a separate thread. Since exec_graph.update() was non-blocking, this caused issues where the user code could, for example, free resources before the update() command (which depends on HostTask completion) could execute. This commit fixes the issue by making exec_graph.update() blocking when host_tasks are used. Co-authored-by: Ewan Crawford <[email protected]>
getSignalEvent in ur_command_list_manager returned nullptr when the queue was not set. This was the case when initializing the command list manager in the queue ctor called from urQueueCreateWithNativeHandle. Fix this by passing the queue to the command list manager and removing the default nullptr value from the ctor (to avoid this issue in the future). Also, even if the queue is nullptr, return the event instead of nullptr from getSignalEvent. This event is mostly functional; it just cannot be used for profiling.
- Include node type in error message
- Fix unit test triggering topology error instead of unsupported node type error

Example output of new error:
```
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  node_type::memfill nodes are not supported for update. Only kernel, host_task, barrier and empty nodes are supported.
```
Co-authored-by: Ewan Crawford <[email protected]>
To support the SYCL-Graph extension on an OpenCL backend, we currently only require the presence of the `cl_khr_command_buffer` extension. This PR introduces an extra requirement on the [CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_COMMAND_BUFFER_SIMULTANEOUS_USE_KHR) capability being present. This is based on the [graph execution wording](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc#765-new-handler-member-functions) on the definition of `handler::ext_oneapi_graph()` that: > Only one instance of graph will execute at any time. If graph is submitted multiple times, dependencies are automatically added by the runtime to prevent concurrent executions of an identical graph. Such usage results in multiple calls by the SYCL runtime to `urEnqueueCommandBufferExp` with the same UR command-buffer and event dependencies to prevent concurrent execution. Without support for simultaneous use, the OpenCL adapter code cannot guarantee that the first command-buffer submission has finished execution before it makes the following `clEnqueueCommandBufferKHR` calls with the `cl_event` dependencies. If the first submission is still executing, then an error will be reported. Workarounds like adding blocking host waits to the OpenCL UR adapter are possible, but requiring simultaneous use reflects the vendor requirements as they are for the current implementation. I've tried to document this all in the UR spec and SYCL-Graph design docs, which also includes a couple of cleanups I found along the way. Note that the new CTS test fails for the Level-Zero adapter, which I've created intel#17734 to resolve. --------- Co-authored-by: Mikołaj Komar <[email protected]>
This patch simplifies the stream queue constructor by using in-class initialization when appropriate. And uses the constructors to initialize the stream vectors.
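As a rough illustration of the pattern this commit message describes (not the actual stream queue code), members can take their defaults at the point of declaration while the constructors size the stream vectors in their initializer lists. `StreamQueue` and all member names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a queue that owns several stream vectors.
struct StreamQueue {
  // In-class initializers: every constructor gets these defaults for free.
  std::size_t NextComputeIdx = 0;
  std::size_t NextTransferIdx = 0;
  bool Profiling = false;

  // Sized by the constructors' initializer lists rather than in their bodies.
  std::vector<int> ComputeStreams;
  std::vector<int> TransferStreams;

  StreamQueue(std::size_t NumCompute, std::size_t NumTransfer)
      : ComputeStreams(NumCompute), TransferStreams(NumTransfer) {}

  // Only the member that differs from its default is named explicitly.
  StreamQueue(std::size_t NumCompute, std::size_t NumTransfer,
              bool EnableProfiling)
      : Profiling(EnableProfiling), ComputeStreams(NumCompute),
        TransferStreams(NumTransfer) {}
};
```

With this layout, adding another constructor cannot forget to zero the index members, since the defaults travel with the declarations.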
…intel#17821) This removes one dependency on OpenCL library.
Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <[email protected]>
This PR no longer generates thread_local pointers for kernels calling other kernels, which happens, for example, in the work_item loop. Instead of storing the state struct pointer in a thread_local, it is passed directly to the called kernel function, which is duplicated with an additional state struct pointer parameter if it didn't already have one. The state getter functions (native_cpu state and the corresponding mux and nativecpu spirv functions) have been made `__attribute__((pure))` to enable more optimizations (including removal of unused calls to such builtins) before the NativeCPU passes. Pointer parameters of the native_cpu getter functions now point to constant data.
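The difference between the two calling schemes can be sketched in plain C++. This is an illustration only: `State`, the caller/callee functions and `getGlobalId` are hypothetical, the real transformation happens on LLVM IR inside the NativeCPU passes, and the attribute syntax assumes GCC or Clang.

```cpp
#include <cassert>

// Hypothetical per-work-item state; the real NativeCPU state struct differs.
struct State {
  int GlobalId;
};

// Before: the callee reads the state through a thread_local pointer.
thread_local State *CurrentState = nullptr;

int calleeViaThreadLocal() { return CurrentState->GlobalId * 2; }

int callerViaThreadLocal(State *S) {
  CurrentState = S; // stored here, reloaded inside the callee
  return calleeViaThreadLocal();
}

// After: the callee is cloned with an extra state pointer parameter.
int calleeWithState(State *S) { return S->GlobalId * 2; }

int callerWithState(State *S) { return calleeWithState(S); }

// A getter without side effects can be marked pure, so the optimizer may
// drop calls whose result is unused. The parameter points to constant data.
__attribute__((pure)) int getGlobalId(const State *S) { return S->GlobalId; }
```

Both callers compute the same result; the second one simply avoids the thread_local store and load.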
The V2 adapter now has all required functionality.
To fix compilation issue with the newest compute-runtime
by avoiding unnecessary temp shared_ptr creation
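The commit message does not say which call site was changed, so the following is only a generic sketch of how a temporary `shared_ptr` arises and how it can be avoided. `Impl` and the three functions are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Impl {
  int Value = 7;
};

// Taking the shared_ptr by value copies it: an atomic reference-count
// increment on entry and a decrement on exit.
long useCountByValue(std::shared_ptr<Impl> P) { return P.use_count(); }

// Taking it by const reference creates no temporary and leaves the count
// untouched.
long useCountByRef(const std::shared_ptr<Impl> &P) { return P.use_count(); }

// When ownership is not needed at all, a plain reference avoids shared_ptr
// in the signature entirely.
int readValue(const Impl &I) { return I.Value; }
```

With a single owner, the by-value version observes a use count of 2 while the by-reference version observes 1, which makes the extra copy visible.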
It's no longer needed after SYCLPropagateAspectsUsage pass. --------- Signed-off-by: Sidorov, Dmitry <[email protected]>
by using `const auto &` instead of `auto`.
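A small sketch of why `const auto &` matters in a range-based loop. `Tracked` is a hypothetical type that counts copies so the effect is observable; it is not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
  static int Copies;
  std::string Name;
  explicit Tracked(std::string N) : Name(std::move(N)) {}
  Tracked(const Tracked &Other) : Name(Other.Name) { ++Copies; }
};
int Tracked::Copies = 0;

std::size_t totalLengthByValue(const std::vector<Tracked> &Items) {
  std::size_t Total = 0;
  for (auto Item : Items) // copies every element
    Total += Item.Name.size();
  return Total;
}

std::size_t totalLengthByRef(const std::vector<Tracked> &Items) {
  std::size_t Total = 0;
  for (const auto &Item : Items) // binds to the element, no copy
    Total += Item.Name.size();
  return Total;
}
```

Both functions return the same total; only the first one increments `Tracked::Copies`, once per element.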
…el#17757) Our Ubuntu 22.04 container has CUDA 12.1 installed while Ubuntu 24.04 image has CUDA 12.6.1 installed. If a PR changes CUDA adapter, we should test the change with both CUDA versions.
…tel#17834) This test requires a significant amount of host memory. It has been observed that sometimes (very rarely) the test may fail with an OUT_OF_HOST_MEMORY error, especially when run in parallel with other "high-overhead" tests. Refer to CMPLRLLVM-66341. This PR makes the test ignore the failure if that happens. An alternative is to check the available host memory and skip the test if it is too low. However, that approach is still susceptible to race conditions.
Co-authored-by: omarahmed1111 <[email protected]>
image_get_info.cpp previously comprised three types of tests, which have been split into separate test files:
- testing queries of a `oneapi::experimental::image_mem` instance (image_get_info.cpp)
- testing device queries for image requirements (either max image dimensions or pitch alignment) (image_reqs_get_info.cpp)
- testing device queries for bindless images aspects (bindless_aspects.cpp)

This PR splits up these tests, which is useful for diagnosing problems on different backends:
- image_get_info.cpp is currently only unsupported on the HIP backend (known issue)
- image_reqs_get_info.cpp is currently only unsupported on the L0 backend intel#17663
- bindless_aspects.cpp has been fixed on the L0 backend so it now passes on all backends, by only querying `mipmap_max_anisotropy` if `aspect::ext_oneapi_mipmap_anisotropy` returns true

--------- Signed-off-by: JackAKirk <[email protected]>
The driver I have at least has a number of issues, the largest of which is that it lacks an IR compiler. This updates tests so that they are skipped if required.
This patch removes unused handler_impl members.
Address issue intel#16451, where the property `use_root_sync` is not processed properly. Also revise `sycl/test-e2e/GroupAlgorithm/root_group.cpp` to not use the deprecated version of `parallel_for` (which was previously blocked by this `use_root_sync` issue).

Some explanation for the change in `handler.hpp`. The previous code doesn't handle `use_root_sync` correctly because `processLaunchProperties` is called twice: first for the property list returned by the kernel functor's `get(properties_tag)` method, and then for the `Props` passed in as a parameter to `parallel_for`. Therefore, if the `get(properties_tag)` method specifies `use_root_sync` and `Props` is empty or doesn't contain `use_root_sync`, the following happens:
- first, the property list returned by the kernel functor's `get(properties_tag)` method gets processed. Since it contains `use_root_sync`, `setKernelIsCooperative(true)` is called;
- then, the property list `Props` passed in as a parameter to `parallel_for` gets processed. Since it doesn't contain `use_root_sync` (for the non-deprecated variants of `parallel_for`, `Props` should always be an empty property list), `setKernelIsCooperative(**false**)` is called.

Thus, in the end, the `MKernelIsCooperative` flag is set to false when it should be true. Revising the code like this solves the problem. Also, `MKernelIsCooperative` is false by default, so we don't need to worry if `setKernelIsCooperative` is not called.

--------- Signed-off-by: Hu, Peisen <[email protected]>
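The overwrite described above can be reproduced with a stripped-down model. This is a sketch under assumptions: `PropertyList`, `Handler`, `processOld`, `processNew` and the two `launch*` helpers are hypothetical simplifications (the real code in `handler.hpp` is template-based), and `processNew` shows one plausible shape of the fix, namely only ever setting the flag to true.

```cpp
#include <cassert>

// Hypothetical, simplified property list: just records use_root_sync.
struct PropertyList {
  bool HasUseRootSync;
};

struct Handler {
  bool MKernelIsCooperative = false; // false by default, as noted above

  void setKernelIsCooperative(bool V) { MKernelIsCooperative = V; }

  // Old behaviour: always writes the flag, so a later list without the
  // property overwrites an earlier `true` with `false`.
  void processOld(const PropertyList &P) {
    setKernelIsCooperative(P.HasUseRootSync);
  }

  // Assumed revised behaviour: only write when the property is present.
  void processNew(const PropertyList &P) {
    if (P.HasUseRootSync)
      setKernelIsCooperative(true);
  }
};

// Process the kernel functor's properties first, then the parallel_for ones.
bool launchOld(const PropertyList &KernelProps, const PropertyList &Props) {
  Handler H;
  H.processOld(KernelProps);
  H.processOld(Props);
  return H.MKernelIsCooperative;
}

bool launchNew(const PropertyList &KernelProps, const PropertyList &Props) {
  Handler H;
  H.processNew(KernelProps);
  H.processNew(Props);
  return H.MKernelIsCooperative;
}
```

With `use_root_sync` on the kernel functor and an empty `Props`, `launchOld` ends with the flag cleared while `launchNew` keeps it set; with no property anywhere, both leave the default `false`.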
This contains a fix for google/googletest#4036 , which we encounter if we are outputting JSON and the test runner finds no adapters.
This PR disables the flaky fill test for command buffers that was enabled in intel#17709. The issue is connected to a driver bug that is patched in a newer version, but the CI machines still have the old driver, which causes the test to fail sometimes (for example https://github.com/intel/llvm/actions/runs/14250564960/job/39942652796?pr=17836).
* Removes expression trees support * Applies all previous vec fixes to align with recent spec changes
Fix repo_ref value. I probably didn't notice it when I changed the others.
This tests that converting a UR handle to a native handle, then back into a UR handle, preserves/recreates any handles contained within it. The event test is a special case, since urEventCreateWithNativeHandle doesn't accept a queue parameter. For this, we tolerate a different queue being created, as long as both have the same underlying native handle.
This patch adds a step to conditionally pack pure sycl-toolchain for release.
We removed host device support some time ago.
This commit updates OCK to apply a code generation fix that will affect upcoming NativeCPU changes, as well as a forward compatibility fix to prevent issues with the next LLVM pulldown or the one thereafter.
Signed-off-by: Arvind Sudarsanam <[email protected]>
e0e1256 to 94d5235
This is a separate branch that will be maintained to alleviate merge pain later.
Thanks