Skip to content

Higher-level application #45

Open
Open
@pentschev

Description

@pentschev

PyNVML bindings are great to do all GPU information management from Python, but they are almost entirely an identical a copy of the C API. This can be a barrier for Python users who need to find out from the NVML API documentation what the API provides, and then what are the appropriate types that need to be passed, etc. We currently utilize PyNVML in both Distributed and Dask-CUDA, but there's also some overlap that leads to code duplication.

I feel one way to reduce code duplication and make it easier for new users, and thus make things overall better, is to provide a "High-level PyNVML library" that takes care of the basic needs for users. For example, I would imagine something like the following (but not limited to) to be available (implementation omitted for simplicity):

class Handle:
    """A handle to a GPU device.

    Parameters
    ----------
    index: int, optional
        Integer representing the CUDA device index to get a handle to.
    uuid: bytes or str, optional
        UUID of a CUDA device to get a handle to.

    Raises
    ------
    ValueError
        If neither `index` nor `uuid` are specified or if both are specified.
    """
    def __init__(
        self, index: Optional[int] = None, uuid: Optional[Union[bytes, str]] = None
    )

    @property
    def free_memory(self) -> int:
        """
        Free memory of the CUDA device.
        """

    @property
    def total_memory(self) -> int:
        """
        Total memory of the CUDA device.
        """

    @property
    def used_memory(self) -> int:
        """
        Used memory of the CUDA device.
        """

There would be more than the above to be covered, such as getting the number of available GPUs in the system, whether a GPU has a context currently created, if a handle is MIG or physical GPU, etc. Additionally, we would have simple tools that are generally useful, for example a small tool I wrote long ago to measure NVLink bandwidth and peak memory, and whatever else fits in the scope of a "High-level PyNVML library" that can make our users' lives easier.

So to begin this discussion I would like to know how people like @rjzamora and @kenhester feel about this idea. Would this be something that would fit in the scope of this project? Are there any impediments to adding such a library within the scope of this project/repository?

Also cc @quasiben for vis.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions