The Kokkos Lectures: Module 5 Q&A

Daniel Arndt edited this page Aug 14, 2020 · 2 revisions

Kokkos general

Do we need to know that the scratch memory will fit in hardware shared memory on the GPU, or does Kokkos allocate scratch memory in global device memory?

  • Level 0 is in the hardware shared memory, level 1 is in the global memory.

What if one asks for more than the hardware allows?

  • Then you get a runtime error. You can query the limits from the team policy.
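
The two answers above suggest checking a scratch request against the per-level limits before launching. The following is a minimal sketch in plain C++, not Kokkos code: `choose_scratch_level` is a hypothetical helper, and `level0_limit` and `level1_limit` are assumed to hold whatever limits were queried from the team policy at run time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Kokkos API): pick a scratch level for a per-team
// buffer of `count` elements, each `elem_bytes` bytes wide.
// `level0_limit` and `level1_limit` are assumed to be the byte limits queried
// from the team policy for level 0 (hardware shared memory) and level 1
// (global memory).
// Returns 0 or 1 for the chosen level, or -1 if the request fits in neither,
// which is the case that would end in a runtime error.
inline int choose_scratch_level(std::size_t count, std::size_t elem_bytes,
                                std::size_t level0_limit,
                                std::size_t level1_limit) {
  assert(elem_bytes > 0);
  // Compare by division so that count * elem_bytes cannot overflow.
  if (count <= level0_limit / elem_bytes) return 0;  // fits in shared memory
  if (count <= level1_limit / elem_bytes) return 1;  // fall back to global
  return -1;                                         // too large for both
}
```

Preferring level 0 whenever the request fits is a design assumption of this sketch; a real code might still choose level 1 to leave shared memory free for other uses.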

Are complex data types allowed as scalar types in the SIMD implementation?

  • Yes, complex can be a scalar type.

Are Kokkos Kernels BLAS functions blocking?

  • Generally, they are not. If the result is in a UVM view on the device, the user needs to fence before reading it on the host.
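
Why the fence matters can be seen in a toy model. This is a sketch in plain C++ and does not use Kokkos or Kokkos Kernels: `ToyQueue` and `toy_dot` are hypothetical names, and the "asynchronous" work is simply deferred until `fence()` is called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for an asynchronous execution space (hypothetical, not a
// Kokkos type). Work is queued and only runs when fence() is called.
class ToyQueue {
 public:
  // Returns immediately; the work has NOT run when this call returns.
  void enqueue(std::function<void()> work) {
    pending_.push_back(std::move(work));
  }

  // Returns only after all queued work has completed (here: runs it in order).
  void fence() {
    for (auto& work : pending_) work();
    pending_.clear();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical non-blocking dot product: `result` is written only once the
// queued work actually runs, so the caller must fence before reading it.
// x, y, and result must outlive the fence.
inline void toy_dot(ToyQueue& q, const std::vector<double>& x,
                    const std::vector<double>& y, double& result) {
  q.enqueue([&x, &y, &result] {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    result = sum;
  });
}
```

In this model, reading `result` right after `toy_dot` returns still sees the old value; only after `q.fence()` does it hold the dot product.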

Can device instances live/execute on different physical devices?

  • It’s possible; Cuda instances can do anything that Cuda streams can do.

Does deallocation of unmanaged views also imply fence?

  • No. Nothing involving unmanaged views implies a fence.

Regarding the implicit fence on View deallocation: Does this also apply to Subviews?

  • The last View referencing the underlying memory is responsible for deallocating. Reference counting is not blocking.
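
The ownership rule in this answer can be sketched with `std::shared_ptr`. This is an analogy in plain C++, not how Kokkos implements Views: `ToyView`, `make_view`, and `subview` are hypothetical names, and the sketch models only the reference counting, not the fence.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Toy stand-in for a managed view (hypothetical, not Kokkos::View).
// Every copy and every subview shares ownership of the same allocation.
struct ToyView {
  std::shared_ptr<double> data;  // shared, reference-counted allocation
  std::size_t offset;            // first element visible through this view
  std::size_t size;              // number of elements visible
};

// Allocate n zero-initialized doubles. `dealloc_count` is incremented when
// the allocation is released, so the caller can observe when that happens.
// `dealloc_count` must outlive every view created from this allocation.
inline ToyView make_view(std::size_t n, int& dealloc_count) {
  std::shared_ptr<double> p(new double[n](), [&dealloc_count](double* ptr) {
    delete[] ptr;
    ++dealloc_count;
  });
  return ToyView{p, 0, n};
}

// A subview covers [begin, end) of the parent and keeps the allocation alive.
inline ToyView subview(const ToyView& v, std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= v.size);
  return ToyView{v.data, v.offset + begin, end - begin};
}
```

If the parent view goes out of scope while a subview is still alive, nothing is freed; the memory is released only when the last remaining view, here the subview, is destroyed.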

Can you respawn with multiple dependencies?

  • Yes, you can respawn with multiple dependencies.

Is there a cudaEvent concept in Kokkos?

  • No, there is no cudaEvent concept in Kokkos, but we are working on the equivalent of CUDA Graphs, which we believe will be more useful (and more efficient) for coarse-grained dependency management.

Can Kokkos futures be passed around by value?

  • Yes.

Does running multiple MPI ranks per GPU give a similar effect as using streams for concurrent kernels?

  • Yes, that is something a lot of people do.