Skip to content

Conversation

@Bensuo
Copy link
Collaborator

@Bensuo Bensuo commented Mar 21, 2025

  • Using sycl_ext_codeplay_async_memory_alloc extension
  • Spec wording for graph support of the feature
  • Usage guide examples of explicit and queue recording usage

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to have a separate query or have different return value, depending on whether the graph is modifiable or finalised. When it's modifiable, this is before any optimisations are necessarily applied, so it would likely have to be the highest potential memory footprint for the graph. But then once it's finalised and optimisations for memory re-use have been applied, it can tell you the actual memory requirement on the native graph.

ianayl and others added 2 commits April 11, 2025 12:55
In `ProgramManager::removeImages`, we cleanup our KernelName2KernelID
mapping, before using that exact mapping retrieve KernelIDs in order to
clean up our KernelIDs2BinImage mapping. This PR cleans up
`m_KernelID2BinImage` mapping before cleaning up
`m_KernelName2KernelIDs` maping.

This is inteded to fix a hit raised by Coverity.
Copy link

@AerialMantis AerialMantis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, thanks. Left a few additional comments.

pbalcer and others added 6 commits April 14, 2025 09:44
This patch fixes up and enable memory pools for the HIP adapter, it is
based on oneapi-src/unified-runtime#1689 and on
the CUDA adapter implementation.

The initial patch had segmentation faults in the CI that we couldn't
reproduce locally. That happened as well in this patch and I couldn't
reproduce the segfaults locally either.

However I noticed that it failed in `urUSMHostAlloc`, and that entry
point was different from the CUDA adapter version, in that the HIP
adapter was using a "helper" function. It turns out that the helper
function was using a device pool instead of a host pool to do the
allocation, which seemed obviously wrong. Replacing the helper by
similar code used in the CUDA adapter fixes the crash in the CI.
…l#17850)

This ensures that functions have the right linkage.

Several functions are marked as used to prevent them from being removed
as dead code before the work item loop pass and
`PrepareSYCLNativeCPUPass` run.
- Using sycl_ext_codeplay_async_memory_alloc extension
- Spec wording for graph support of the feature
- Usage guide guidance for library authors
- Usage guide examples of explicit and queue recording usage with and without mem pools
@Bensuo Bensuo force-pushed the ben/async-alloc-graphs-spec branch from e0a153d to d204eb0 Compare April 14, 2025 12:46
@Bensuo
Copy link
Collaborator Author

Bensuo commented Apr 14, 2025

Closing in favor of upstream PR.

@Bensuo Bensuo closed this Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.