The Kokkos Lectures: Module 5 Q&A
Daniel Arndt edited this page Aug 14, 2020
Do we need to know whether the scratch memory will fit in the GPU's hardware shared memory, or does Kokkos allocate scratch memory in global device memory?
- On GPUs, level 0 scratch is in hardware shared memory and level 1 scratch is in global device memory.
- If the requested scratch size exceeds what is available, you get a runtime error. You can query the limits from the team policy.
- Yes, complex can be a scalar type.
- Generally, they're not. If their result is in a UVM view on the device, the user needs to fence before reading it on the host.
- It’s possible; `Kokkos::Cuda` execution space instances can do anything that CUDA streams can do.
- No operation on unmanaged views implies a fence.
- The last View referencing the underlying memory is responsible for deallocating. Reference counting is not blocking.
- Yes, you can respawn with multiple dependencies.
- No, there is no cudaEvent concept in Kokkos, but we are working on the equivalent of CUDA Graphs, which we believe will be more useful (and more efficient) for coarse-grained dependency management.
- Yes.
Does running multiple MPI ranks per GPU give a similar effect as using streams for concurrent kernels?
- Yes, that is something a lot of people do.