CUDA Kernel Parameters

CUDA 11.7 [introduced](https://stackoverflow.com/questions/72161164/what-does-the-grid-constant-parameter-qualifier-do) the `__grid_constant__` qualifier, which lets a `const` kernel parameter be read directly from constant memory (where kernel parameters are stored) without the compiler making a per-thread local copy when its address is taken. This makes it much easier to pass the basis set as a kernel argument, similar to GBasis, rather than using the approach taken in cuGBasis. However, threads read constant memory through a cache (see Section 3.4 in ["Dissecting Turing T4 GPU"](https://arxiv.org/pdf/1903.07486)), so the cuGBasis approach still reaches high performance because it exploits spatial locality. Based on Figure 3.9, the latency improvement is 100-200 clock cycles.
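A minimal sketch of what passing a basis set through `__grid_constant__` could look like. The struct `BasisParams`, its capacity `kMaxPrimitives`, the kernel `eval_s_type`, and the launcher are all hypothetical names chosen for illustration; they are not taken from GBasis or cuGBasis. It assumes compilation for compute capability 7.0 or higher (e.g. `-arch=sm_70`).

```cuda
#include <cuda_runtime.h>

// Hypothetical container for a small basis set; the name, fields, and
// capacity are illustrative assumptions.
constexpr int kMaxPrimitives = 64;

struct BasisParams {
    int    n_primitives;
    double exponents[kMaxPrimitives];
    double coefficients[kMaxPrimitives];
};

// The struct is passed by value. `const __grid_constant__` marks it read-only
// for the whole grid, so the compiler does not need a per-thread copy even if
// the parameter's address is taken.
__global__ void eval_s_type(const __grid_constant__ BasisParams basis,
                            const double* r2,  // squared distance per point
                            double* out,
                            int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_points) return;

    double value = 0.0;
    // All threads of a warp read the same basis element in the same iteration.
    for (int p = 0; p < basis.n_primitives; ++p) {
        value += basis.coefficients[p] * exp(-basis.exponents[p] * r2[i]);
    }
    out[i] = value;
}

// Host-side launch: the struct is copied into the kernel parameter space at
// launch time, so no cudaMemcpyToSymbol call is needed for it.
void launch_eval_s_type(const BasisParams& basis, const double* d_r2,
                        double* d_out, int n_points)
{
    const int threads = 256;
    const int blocks  = (n_points + threads - 1) / threads;
    eval_s_type<<<blocks, threads>>>(basis, d_r2, d_out, n_points);
}
```

Here `d_r2` and `d_out` are assumed to be device pointers allocated by the caller. The struct is 1032 bytes, so it fits even under the older parameter limit.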

However, kernel parameters, which reside in the GPU's constant memory, were limited to 4096 bytes in total, which is enough for only 512 double-precision numbers.
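The 512-number budget can be checked at compile time. The following is a sketch only: `SmallBasis` and `contract` are hypothetical, and the guard counts every argument of the kernel, since the limit applies to the whole parameter list rather than to one struct.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

constexpr std::size_t kLegacyParamLimit = 4096;  // bytes, before CUDA 12.1
static_assert(kLegacyParamLimit / sizeof(double) == 512,
              "4096 bytes hold 512 doubles");

// Hypothetical by-value parameter block: 400 doubles = 3200 bytes.
struct SmallBasis {
    double exponents[200];
    double coefficients[200];
};

// Guard for the full argument list of `contract` below.
static_assert(sizeof(SmallBasis) + sizeof(double) + sizeof(double*)
                  <= kLegacyParamLimit,
              "kernel arguments exceed the legacy 4096-byte limit");

// Single-thread contraction, kept trivial on purpose: sum_p c_p * exp(-a_p * r2).
__global__ void contract(const SmallBasis basis, double r2, double* out)
{
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    double value = 0.0;
    for (int p = 0; p < 200; ++p) {
        value += basis.coefficients[p] * exp(-basis.exponents[p] * r2);
    }
    *out = value;
}
```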

Starting with CUDA 12.1 (see [here](https://developer.nvidia.com/blog/cuda-12-1-supports-large-kernel-parameters/)) on the Volta architecture and newer, kernel parameters can total 32,764 bytes (4095 double-precision numbers). This is still less than the 64 kilobytes of constant memory, but it would make coding and storing other kinds of parameters easier.
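As a rough illustration of the larger budget, the sketch below passes a 32,000-byte struct by value. `LargeBasis` and `eval_large` are hypothetical names, and the code assumes a CUDA 12.1 or newer toolkit targeting `sm_70` or higher; with an older toolkit or target the build is expected to fail because the arguments exceed 4096 bytes.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

constexpr std::size_t kLargeParamLimit = 32764;  // bytes, CUDA 12.1+, Volta+
static_assert(kLargeParamLimit / sizeof(double) == 4095,
              "32,764 bytes hold 4095 whole doubles");

// Hypothetical parameter block: 4000 doubles = 32,000 bytes.
struct LargeBasis {
    double exponents[2000];
    double coefficients[2000];
};

// Guard for the full argument list of `eval_large` below.
static_assert(sizeof(LargeBasis) + sizeof(const double*) + sizeof(double*)
                  + sizeof(int) <= kLargeParamLimit,
              "kernel arguments exceed the 32,764-byte limit");

__global__ void eval_large(const __grid_constant__ LargeBasis basis,
                           const double* r2,  // squared distance per point
                           double* out,
                           int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_points) return;

    double value = 0.0;
    for (int p = 0; p < 2000; ++p) {
        value += basis.coefficients[p] * exp(-basis.exponents[p] * r2[i]);
    }
    out[i] = value;
}
```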

It would be beneficial to store the promolecular coefficients and exponents as kernel parameters and to use constant memory for the atomic coordinates, which would make it significantly easier to add more atoms. For example, each atom has roughly 20-27 coefficients for each of the S-type and P-type functions, which, counting the matching exponents, gives a total of 80-108 numbers per atom.
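A possible layout for this split, as a sketch under several assumptions: all names (`ElementParams`, `PromolParams`, `c_coords`, `c_element`, `promol_density`, `upload_atoms`) are hypothetical, parameters are assumed to be shared by atoms of the same element and padded to 27 primitives with zero coefficients, and the S-type and P-type radial forms used in the kernel are placeholders rather than the actual promolecular model. At 108 doubles (864 bytes) per entry, 37 entries fit in the 32,764-byte parameter space.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

constexpr int kPrims       = 27;    // upper end of the 20-27 range
constexpr int kMaxElements = 37;    // hypothetical: floor(32764 / 864)
constexpr int kMaxAtoms    = 2048;  // hypothetical cap for constant memory

struct ElementParams {              // 4 * 27 = 108 doubles = 864 bytes
    double s_coeff[kPrims], s_exp[kPrims];
    double p_coeff[kPrims], p_exp[kPrims];
};
static_assert(sizeof(ElementParams) == 108 * sizeof(double), "108 doubles");

struct PromolParams { ElementParams elements[kMaxElements]; };
static_assert(sizeof(PromolParams) + 2 * sizeof(void*) + 2 * sizeof(int)
                  <= 32764, "kernel arguments exceed the 32,764-byte limit");

// Atomic coordinates and element indices live in __constant__ memory.
__constant__ double3 c_coords[kMaxAtoms];
__constant__ int     c_element[kMaxAtoms];
static_assert(kMaxAtoms * (sizeof(double3) + sizeof(int)) <= 64 * 1024,
              "atom data exceeds 64 KB of constant memory");

__global__ void promol_density(const __grid_constant__ PromolParams params,
                               const double3* points, double* out,
                               int n_atoms, int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_points) return;

    const double3 r = points[i];
    double rho = 0.0;
    for (int a = 0; a < n_atoms; ++a) {
        const double dx = r.x - c_coords[a].x;
        const double dy = r.y - c_coords[a].y;
        const double dz = r.z - c_coords[a].z;
        const double r2 = dx * dx + dy * dy + dz * dz;
        const ElementParams& e = params.elements[c_element[a]];
        for (int k = 0; k < kPrims; ++k) {
            rho += e.s_coeff[k] * exp(-e.s_exp[k] * r2);       // placeholder S form
            rho += e.p_coeff[k] * r2 * exp(-e.p_exp[k] * r2);  // placeholder P form
        }
    }
    out[i] = rho;
}

// Copies atom data into constant memory; adding atoms only changes this call.
cudaError_t upload_atoms(const double3* coords, const int* element, int n_atoms)
{
    if (n_atoms > kMaxAtoms) return cudaErrorInvalidValue;
    cudaError_t err = cudaMemcpyToSymbol(c_coords, coords,
                                         n_atoms * sizeof(double3));
    if (err != cudaSuccess) return err;
    return cudaMemcpyToSymbol(c_element, element, n_atoms * sizeof(int));
}
```

With this split, the number of atoms is bounded by constant memory (2048 in this sketch) rather than by the kernel parameter space, while the parameter space only has to hold one entry per distinct element.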


