CUDA Kernel Parameters #11

@Ali-Tehrani

CUDA 11.7 introduced the `__grid_constant__` annotation for kernel parameters, so that they can be read directly from constant memory. This makes it much easier to pass the basis set, similar to GBasis, rather than using the approach taken in cuGBasis. However, threads reading constant memory go through a cache line (see Section 3.4 in "Dissecting Turing T4 GPU"), so the approach that cuGBasis takes still reaches high performance because it exploits spatial locality. Based on Figure 3.9, the latency improvement is about 100-200 clock cycles.
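As a rough sketch of what passing a small basis-set block as a `__grid_constant__` kernel parameter could look like: the struct `BasisParams`, the kernel `eval_s_type`, the helper `contract_s`, and all numeric values are hypothetical and are not taken from cuGBasis. It assumes CUDA 11.7 or newer and a device of compute capability 7.0 or higher (for example, compile with `nvcc -arch=sm_70`).

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical parameter block; this layout is an assumption for illustration.
struct BasisParams {
    int n_prim;           // number of primitives actually used
    double exponents[8];  // Gaussian exponents
    double coeffs[8];     // contraction coefficients
};

// Takes the block by pointer. With __grid_constant__ on the kernel parameter,
// taking its address does not force a per-thread local copy.
__device__ double contract_s(const BasisParams* p, double r2) {
    double value = 0.0;
    for (int i = 0; i < p->n_prim; ++i)
        value += p->coeffs[i] * exp(-p->exponents[i] * r2);
    return value;
}

// The parameter must be const-qualified and passed by value.
__global__ void eval_s_type(const __grid_constant__ BasisParams params,
                            const double* r2, double* out, int n_pts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pts) out[i] = contract_s(&params, r2[i]);
}

int main() {
    BasisParams params = {};
    params.n_prim = 2;
    params.exponents[0] = 0.5; params.coeffs[0] = 1.0;
    params.exponents[1] = 2.0; params.coeffs[1] = 0.25;

    const int n_pts = 4;
    double h_r2[n_pts] = {0.0, 0.5, 1.0, 2.0};  // squared distances
    double h_out[n_pts] = {};

    double *d_r2 = nullptr, *d_out = nullptr;
    cudaMalloc(&d_r2, n_pts * sizeof(double));
    cudaMalloc(&d_out, n_pts * sizeof(double));
    cudaMemcpy(d_r2, h_r2, sizeof(h_r2), cudaMemcpyHostToDevice);

    eval_s_type<<<1, 32>>>(params, d_r2, d_out, n_pts);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n_pts; ++i)
        std::printf("r2 = %.2f -> %.6f\n", h_r2[i], h_out[i]);

    cudaFree(d_r2);
    cudaFree(d_out);
    return 0;
}
```

The whole struct travels with the launch, so no separate `cudaMemcpyToSymbol` call is needed, but its size counts against the kernel parameter limit.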

However, the kernel function parameters residing in the GPU's constant memory were limited to 4096 bytes, which means they could only hold 512 double-precision numbers.

Starting with CUDA 12.1 (see here), and on Volta architecture and higher, kernel functions can now take 32,764 bytes of parameters (4095 double-precision numbers). This is still less than the 64 kilobytes of constant memory, but it would make coding and storing other kinds of parameters easier.

It would be beneficial to store the promolecular coefficients and exponents as kernel parameters and to use constant memory for the atomic coordinates, which would make it significantly easier to add more atoms. For example, each atom has roughly 20-27 coefficients (with matching exponents) for each of S-type and P-type, for a total of 80-108 numbers per atom.
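To put the sizes above side by side, here is a small host-side sketch that checks the arithmetic at compile time. The helper names `doubles_in` and `atoms_in` are made up for illustration, and the calculation assumes the entire parameter space is spent on these numbers, which overstates the budget slightly since any other kernel arguments also count against the limit.

```cuda
#include <cstddef>

// Kernel parameter limits quoted in this issue, in bytes.
constexpr std::size_t kOldParamLimit = 4096;   // before CUDA 12.1
constexpr std::size_t kNewParamLimit = 32764;  // CUDA 12.1+, Volta and higher

// Number of whole doubles that fit in a byte budget.
constexpr std::size_t doubles_in(std::size_t bytes) {
    return bytes / sizeof(double);
}

// Number of atoms whose parameters fit, given the doubles needed per atom.
constexpr std::size_t atoms_in(std::size_t bytes, std::size_t doubles_per_atom) {
    return doubles_in(bytes) / doubles_per_atom;
}

static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");
static_assert(doubles_in(kOldParamLimit) == 512, "4096 / 8");
static_assert(doubles_in(kNewParamLimit) == 4095, "32764 / 8, rounded down");

// With 80-108 numbers per atom:
static_assert(atoms_in(kOldParamLimit, 108) == 4, "old limit, largest atoms");
static_assert(atoms_in(kOldParamLimit, 80) == 6, "old limit, smallest atoms");
static_assert(atoms_in(kNewParamLimit, 108) == 37, "new limit, largest atoms");
static_assert(atoms_in(kNewParamLimit, 80) == 51, "new limit, smallest atoms");
```

Under these assumptions the old limit covers only about 4-6 atoms' worth of parameters, while the new limit covers roughly 37-51.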
