Switch to use CUDA driver APIs in Device constructor #460

Open
wants to merge 12 commits into
base: main
from

Conversation

leofang
Member

leofang commented Feb 21, 2025

Before this PR:

In [3]: %timeit Device()
660 ns ± 2.01 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit Device(0)
644 ns ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

With this PR:

In [3]: %timeit Device()
396 ns ± 1.78 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit Device(0)
165 ns ± 0.983 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

(Bindings are built from the main branch.)
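
For reference, the speedup implied by these timings can be worked out directly from the reported means. This is only arithmetic on the figures quoted above, not a new measurement; it comes out to roughly 1.7x for `Device()` and 3.9x for `Device(0)`.

```python
# Mean timings in nanoseconds, copied from the %timeit output above.
before = {"Device()": 660.0, "Device(0)": 644.0}
after = {"Device()": 396.0, "Device(0)": 165.0}

# Speedup factor = old time / new time, per call form.
speedup = {call: before[call] / after[call] for call in before}

for call, factor in speedup.items():
    print(f"{call}: {factor:.2f}x faster")
```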

Contributor

copy-pr-bot bot commented Feb 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

leofang self-assigned this on Feb 22, 2025
leofang added the "blocked" label on Feb 22, 2025
leofang added the "enhancement", "P1", and "cuda.core" labels and removed the "blocked" label on Apr 5, 2025
leofang added this to the cuda.core beta 4 milestone on Apr 5, 2025
leofang changed the title from "WIP: Switch to use CUDA driver APIs in Device constructor" to "Switch to use CUDA driver APIs in Device constructor" on Apr 6, 2025
Member Author

leofang commented Apr 6, 2025

/ok to test


github-actions bot commented Apr 6, 2025

leofang requested review from rwgk and ksimpson-work on April 7, 2025 17:39
leofang marked this pull request as ready for review on April 7, 2025 17:39
leofang marked this pull request as draft on April 7, 2025 22:19
leofang marked this pull request as ready for review on May 24, 2025 02:16
Member Author

leofang commented May 24, 2025

/ok to test c9fac0b

Member Author

leofang commented May 28, 2025

This is ready.

rwgk previously approved these changes May 28, 2025
Member Author

leofang commented Jun 6, 2025

/ok to test d70ec24

err, dev = driver.cuCtxGetDevice()
if err == 0:
    device_id = int(dev)
elif err == 201:  # CUDA_ERROR_INVALID_CONTEXT
Collaborator

Can we use the enum value from driver here instead of 201?
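
A minimal sketch of what this suggestion could look like. To stay runnable without a GPU, it uses a stand-in `CUresult` enum and a hypothetical `resolve_device_id` helper with a fake driver object; none of these names come from the PR. In the real code the enum would presumably be `driver.CUresult` from the bindings, and the fallback to device 0 on an invalid context is an assumption, since that branch is not shown in this hunk.

```python
from enum import IntEnum


class CUresult(IntEnum):
    """Stand-in for the bindings' CUresult enum (only the two values used here)."""
    CUDA_SUCCESS = 0
    CUDA_ERROR_INVALID_CONTEXT = 201


class FakeDriver:
    """Hypothetical test double mimicking the (err, dev) return of cuCtxGetDevice."""

    def __init__(self, err, dev=0):
        self._err, self._dev = err, dev

    def cuCtxGetDevice(self):
        return self._err, self._dev


def resolve_device_id(driver):
    """Hypothetical helper: derive a device ID from the current context, if any."""
    err, dev = driver.cuCtxGetDevice()
    if err == CUresult.CUDA_SUCCESS:
        return int(dev)
    if err == CUresult.CUDA_ERROR_INVALID_CONTEXT:
        # Assumption: with no current context, default to device 0.
        return 0
    raise RuntimeError(f"cuCtxGetDevice failed with error code {int(err)}")


# A current context on device 1, then no current context at all.
print(resolve_device_id(FakeDriver(CUresult.CUDA_SUCCESS, dev=1)))
print(resolve_device_id(FakeDriver(CUresult.CUDA_ERROR_INVALID_CONTEXT)))
```

Because the stand-in is an `IntEnum`, the named members still compare equal to the raw integers 0 and 201, so behavior is unchanged while the magic numbers disappear from the comparison.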

@@ -959,7 +960,7 @@ class Device:

     __slots__ = ("_id", "_mr", "_has_inited", "_properties")

-    def __new__(cls, device_id=None):
+    def __new__(cls, device_id: int = None):
Collaborator

I think this should be Optional[int]?
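
A small sketch of the annotation being suggested, using a stand-in class rather than the real `Device`. A parameter that defaults to `None` is conventionally annotated `Optional[int]` (or `int | None` on Python 3.10+), because `int = None` relies on implicit Optional, which PEP 484 no longer recommends. The fallback to device 0 below is an assumption made only so the sketch runs.

```python
from typing import Optional, get_type_hints


class Device:
    """Stand-in for illustration only; not the real cuda.core Device."""

    def __new__(cls, device_id: Optional[int] = None):
        self = super().__new__(cls)
        # Assumption for this sketch: None falls back to device 0.
        self._id = 0 if device_id is None else device_id
        return self


# The resolved hint shows that None is an explicitly accepted value.
hints = get_type_hints(Device.__new__)
print(hints["device_id"])
```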

if not (0 <= device_id < total):
    raise ValueError(f"device_id must be within [0, {total}), got {device_id}")
err, dev = driver.cuCtxGetDevice()
if err == 0:
Collaborator

+1, it would be good to use driver.CUresult.CUDA_SUCCESS here.

Member Author

leofang commented Jun 6, 2025

/ok to test d279e50

Member Author

leofang commented Jun 7, 2025

Blocked by #687.

leofang added the "blocked" label on Jun 7, 2025
Labels
blocked: This task is currently blocked by other tasks
cuda.core: Everything related to the cuda.core module
enhancement: Any code-related improvements
P1: Medium priority - Should do
Projects
Status: Todo
4 participants