Skip to content

[SYCL][Graph] Add spec wording for graph-owned memory allocations #384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

Bensuo
Copy link
Collaborator

@Bensuo Bensuo commented Mar 21, 2025

  • Using sycl_ext_codeplay_async_memory_alloc extension
  • Spec wording for graph support of the feature
  • Usage guide examples of explicit and queue recording usage

dependencies are correct. Using a pointer in a graph command ordered after it
has been freed via an `async_free` node results in undefined behavior.

The total amount of physical memory being used by a graph can be queried using

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to have a separate query or have different return value, depending on whether the graph is modifiable or finalised. When it's modifiable, this is before any optimisations are necessarily applied, so it would likely have to be the highest potential memory footprint for the graph. But then once it's finalised and optimisations for memory re-use have been applied, it can tell you the actual memory requirement on the native graph.

ianayl and others added 2 commits April 11, 2025 12:55
In `ProgramManager::removeImages`, we cleanup our KernelName2KernelID
mapping, before using that exact mapping retrieve KernelIDs in order to
clean up our KernelIDs2BinImage mapping. This PR cleans up
`m_KernelID2BinImage` mapping before cleaning up
`m_KernelName2KernelIDs` maping.

This is inteded to fix a hit raised by Coverity.
Copy link

@AerialMantis AerialMantis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, thanks. Left a few additional comments.

pbalcer and others added 6 commits April 14, 2025 09:44
This patch fixes up and enable memory pools for the HIP adapter, it is
based on oneapi-src/unified-runtime#1689 and on
the CUDA adapter implementation.

The initial patch had segmentation faults in the CI that we couldn't
reproduce locally. That happened as well in this patch and I couldn't
reproduce the segfaults locally either.

However I noticed that it failed in `urUSMHostAlloc`, and that entry
point was different from the CUDA adapter version, in that the HIP
adapter was using a "helper" function. It turns out that the helper
function was using a device pool instead of a host pool to do the
allocation, which seemed obviously wrong. Replacing the helper by
similar code used in the CUDA adapter fixes the crash in the CI.
…l#17850)

This ensures that functions have the right linkage.

Several functions are marked as used to prevent them from being removed
as dead code before the work item loop pass and
`PrepareSYCLNativeCPUPass` run.
- Using sycl_ext_codeplay_async_memory_alloc extension
- Spec wording for graph support of the feature
- Usage guide guidance for library authors
- Usage guide examples of explicit and queue recording usage with and without mem pools
@Bensuo Bensuo force-pushed the ben/async-alloc-graphs-spec branch from e0a153d to d204eb0 Compare April 14, 2025 12:46
@Bensuo
Copy link
Collaborator Author

Bensuo commented Apr 14, 2025

Closing in favor of upstream PR.

@Bensuo Bensuo closed this Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.