Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels #155

Merged

Conversation

brandon-b-miller
Collaborator

Closes #66
Closes #65

WIP: the current code finds nvvm/libdevice, which is enough to launch kernels; nvrtc support is next. The logic is vendored from nvmath-python.

Collaborator

@gmarkall gmarkall left a comment

A few questions on the diff - in addition, do we plan to add a CI config that installs these from wheels so that we know it will continue to work?

@gmarkall added the "4 - Waiting on author" label and removed the "3 - Ready for Review" label on Mar 13, 2025
@brandon-b-miller
Collaborator Author

> A few questions on the diff - in addition, do we plan to add a CI config that installs these from wheels so that we know it will continue to work?

Yes, I'll see about adding a separate CI job for this


# remove cuda-nvvm-12-5, leaving only the libnvvm.so from nvidia-cuda-nvcc-cu12
apt-get update
apt-get remove --purge -y cuda-nvvm-12-5
Collaborator Author

This, combined with the addition of nvidia-cuda-nvcc-cu12, was the easiest way I could think of to get to the relevant test environment, but I'm by no means married to it; this would also have to be dynamic with respect to the minor version.

Collaborator

You can get the installed package name with something like

CUDA_NVVM_PACKAGE=`dpkg --get-selections | grep cuda-nvvm | awk '{print $1}'`
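For instance, the discovered name could feed straight into the purge step (a sketch, assuming exactly one matching package is installed):

    # Discover the installed cuda-nvvm package dynamically instead of
    # hard-coding the minor version, then remove it.
    CUDA_NVVM_PACKAGE=$(dpkg --get-selections | grep cuda-nvvm | awk '{print $1}')
    apt-get remove --purge -y "$CUDA_NVVM_PACKAGE"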

@gmarkall added the "4 - Waiting on reviewer" label and removed the "4 - Waiting on author" label on Mar 19, 2025
rwgk added a commit to rwgk/cuda-python that referenced this pull request Mar 19, 2025
rwgk added a commit to rwgk/cuda-python that referenced this pull request Mar 19, 2025
@ZzEeKkAa
Collaborator

ZzEeKkAa commented Mar 21, 2025

I've merged this branch with main (fbbc040) and tested on nvmath-python. I was able to successfully get rid of this patch:

    # our device apis only support cuda 12+
    _utils.force_loading_nvrtc("12")
    nvrtc.NVRTC.__new__ = __nvrtc_new__

But I can't get rid of:

    # Patch Numba to support wheels
    _utils.patch_numba_nvvm(nvvm)

I'm getting the error:

> python ./examples/device/cublasdx_simple_gemm_fp32.py
/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py:663: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
Traceback (most recent call last):
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/./examples/device/cublasdx_simple_gemm_fp32.py", line 78, in <module>
    main()
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/./examples/device/cublasdx_simple_gemm_fp32.py", line 68, in main
    f[1, block_dim](a_d, b_d, c_d, alpha, beta, o_d)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 666, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 808, in call
    kernel = _dispatcher.Dispatcher._cuda_call(self, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 816, in _compile_for_args
    return self.compile(tuple(argtypes))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 1065, in compile
    kernel = _Kernel(self.py_func, argtypes, **self.targetoptions)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 156, in __init__
    cres = compile_cuda(self.py_func, types.void, self.argtypes,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/compiler.py", line 290, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 739, in compile_extra
    return pipeline.compile_extra(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 439, in compile_extra
    return self._compile_bytecode()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 505, in _compile_bytecode
    return self._compile_core()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 481, in _compile_core
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 473, in _compile_core
    pm.run(self.state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 363, in run
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 272, in check
    mangled = func(compiler_state)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typed_passes.py", line 112, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
                                            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typed_passes.py", line 93, in type_inference_stage
    errs = infer.propagate(raise_errors=raise_errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 1066, in propagate
    errors = self.constraints.propagate(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 160, in propagate
    constraint(typeinfer)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 566, in __call__
    self.resolve(typeinfer, typevars, fnty)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 589, in resolve
    sig = typeinfer.resolve_call(fnty, pos_args, kw_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typeinfer.py", line 1560, in resolve_call
    return self.context.resolve_function_type(fnty, pos_args, kw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typing/context.py", line 195, in resolve_function_type
    res = self._resolve_user_function_type(func, args, kws)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typing/context.py", line 247, in _resolve_user_function_type
    return func.get_call_type(self, args, kws)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/types/functions.py", line 538, in get_call_type
    self.dispatcher.get_call_template(args, kws)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 979, in get_call_template
    self.compile_device(tuple(args))
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/dispatcher.py", line 1016, in compile_device
    cres = compile_cuda(self.py_func, return_type, args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/compiler.py", line 290, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 739, in compile_extra
    return pipeline.compile_extra(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 439, in compile_extra
    return self._compile_bytecode()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 505, in _compile_bytecode
    return self._compile_core()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 481, in _compile_core
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler.py", line 473, in _compile_core
    pm.run(self.state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 363, in run
    raise e
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/compiler_machinery.py", line 272, in check
    mangled = func(compiler_state)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/typed_passes.py", line 466, in run_pass
    lower = self.lowering_class(targetctx, library, fndesc, interp,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/lowering.py", line 40, in __init__
    self.module = self.library.create_ir_module(self.fndesc.unique_name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/nvmath-python/venv/lib/python3.12/site-packages/numba/core/codegen.py", line 574, in create_ir_module
    ir_module = self._codegen._create_empty_module(name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/codegen.py", line 399, in _create_empty_module
    ir_module.data_layout = nvvm.NVVM().data_layout
                            ^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cudadrv/nvvm.py", line 139, in __new__
    inst.driver = open_cudalib('nvvm')
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cudadrv/libs.py", line 83, in open_cudalib
    path = get_cudalib(lib)
           ^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cudadrv/libs.py", line 54, in get_cudalib
    return get_cuda_paths()['nvvm'].info or _dllnamepattern % 'nvvm'
           ^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cuda_paths.py", line 290, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
            ^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cuda_paths.py", line 263, in _get_nvvm_path
    by, path = _get_nvvm_path_decision()
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.yhavrylko_ent/Projects/nvidia/clean_test/numba-cuda/numba_cuda/numba/cuda/cuda_paths.py", line 60, in _get_nvvm_path_decision
    if os.path.exists(nvvm_ctk_dir):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 19, in exists
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Context:
pynvjitlink is enabled and lto is set to True.
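For what it's worth, the crash itself is `_get_nvvm_path_decision()` handing back `None` for `nvvm_ctk_dir`; a minimal guard would look something like this (a sketch only, `_existing_dir` is a hypothetical helper, not necessarily how this PR resolves it):

    import os

    def _existing_dir(candidate):
        # os.path.exists(None) raises TypeError, so treat a missing search
        # directory as "not found" instead of probing the filesystem.
        return candidate if candidate is not None and os.path.exists(candidate) else None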

@brandon-b-miller
Collaborator Author

Hi @ZzEeKkAa, there are a couple of pieces of this that are still WIP, so I think you'll probably run into bugs right now. I'm working on this PR over the next few days, so hopefully there will be more updates soon.

@brandon-b-miller
Collaborator Author

The logs from the wheel-deps test (https://github.com/NVIDIA/numba-cuda/actions/runs/14470658131/job/40584109630?pr=155) show cudart and cudadevrt being found from the system instead of the wheels - I presume this is not expected?

Finding cudart from System
	Located at /usr/local/cuda/lib64/libcudart.so.12.8.90
	Trying to open library...	ok
Finding cudadevrt from System
	Located at /usr/local/cuda/lib64/libcudadevrt.a
	Checking library...	ok

c14644a

@rwgk

rwgk commented Apr 17, 2025

@brandon-b-miller Suggested change to PR title:

-Locate nvvm, libdevice and nvrtc from nvidia-cuda-nvcc-cu12 wheels
+Locate nvvm, libdevice and nvrtc from nvidia-*-cu12 wheels

Because:

  • nvvm, libdevice → nvidia-cuda-nvcc-cu12
  • nvrtc → nvidia-cuda-nvrtc-cu12

try:
    return SEARCH_PRIORITY.index(label)
except ValueError:
    return float("inf")

This can easily mask bugs that are troublesome to track down (e.g. surprising behavior if there is a typo in the label, or a new label is introduced elsewhere without updating SEARCH_PRIORITY). — I realize this PR is meant to be a stop-gap. Just pointing out. Generally, I'd try to not use such a brittle approach. If there is an easy way to avoid this, that'd be better.

E.g., what happens if you simply remove the try-except? Do all tests pass?
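For illustration, a stricter variant could validate the label up front (a sketch only; `_label_priority` is a hypothetical name, and `SEARCH_PRIORITY` is the list from this PR):

    def _label_priority(label):
        # Fail loudly on unknown labels instead of silently sorting them last.
        if label not in SEARCH_PRIORITY:
            raise ValueError(f"Unknown search label: {label!r}")
        return SEARCH_PRIORITY.index(label)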

Collaborator Author

This is a great observation. What do you think of the changes in 8cc37d7?

@gmarkall
Collaborator

The CI runs are still finding the runtime from the system installation: https://github.com/NVIDIA/numba-cuda/actions/runs/14518427298/job/40733627061?pr=155

Finding cudart from System
	Located at /usr/local/cuda/lib64/libcudart.so.12.8.90
	Trying to open library...	ok
Finding cudadevrt from System
	Located at /usr/local/cuda/lib64/libcudadevrt.a
	Checking library...	ok

The nvidia-cuda-runtime-cu12 wheel also needs installing in the test script, I think.
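If it helps, the wheel set for the test environment would then be something like this (a sketch; the exact list depends on what the script already installs):

    pip install nvidia-cuda-nvcc-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12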

Collaborator

The following changes lead to success on Windows:

diff --git a/numba_cuda/numba/cuda/cuda_paths.py b/numba_cuda/numba/cuda/cuda_paths.py
index 9b38a86..bc3822a 100644
--- a/numba_cuda/numba/cuda/cuda_paths.py
+++ b/numba_cuda/numba/cuda/cuda_paths.py
@@ -224,8 +224,9 @@ def _cuda_home_static_cudalib_path():
 def _get_cudalib_wheel():
     """Get the cudalib path from the NVCC wheel."""
     site_paths = [site.getusersitepackages()] + site.getsitepackages()
+    libdir = IS_LINUX and "lib" or "bin"
     for sp in filter(None, site_paths):
-        cudalib_path = Path(sp, "nvidia", "cuda_runtime", "lib")
+        cudalib_path = Path(sp, "nvidia", "cuda_runtime", libdir)
         if cudalib_path.exists():
             return str(cudalib_path)
     return None
@@ -373,8 +374,20 @@ def get_cuda_home(*subdirs):

 def _get_nvvm_path():
     by, path = _get_nvvm_path_decision()
+
     if by == "NVIDIA NVCC Wheel":
-        path = os.path.join(path, "libnvvm.so")
+        platform_map = {
+            "linux": "libnvvm.so",
+            "win32": "nvvm64_40_0.dll",
+        }
+
+        for plat, dso_name in platform_map.items():
+            if sys.platform.startswith(plat):
+                break
+        else:
+            raise NotImplementedError("Unsupported platform")
+
+        path = os.path.join(path, dso_name)
     else:
         candidates = find_lib("nvvm", path)
         path = max(candidates) if candidates else None

Library test output:

(test-cuda-wheels) PS C:\Users\gmarkall\numbadev\numba-cuda> python -c "from numba import cuda; cuda.cudadrv.libs.test()"
Finding driver from candidates:
        nvcuda.dll
        \windows\system32\nvcuda.dll
Using loader <class 'ctypes.WinDLL'>
        Trying to load driver...        ok
                Loaded from nvcuda.dll
Finding nvvm from NVIDIA NVCC Wheel
        Located at D:\miniforge\envs\test-cuda-wheels\Lib\site-packages\nvidia\cuda_nvcc\nvvm\bin\nvvm64_40_0.dll
        Trying to open library...       ok
Finding nvrtc from NVIDIA NVCC Wheel
        Located at D:\miniforge\envs\test-cuda-wheels\Lib\site-packages\nvidia\cuda_nvrtc\bin\nvrtc64_120_0.dll
        Trying to open library...       ok
Finding cudart from NVIDIA NVCC Wheel
        Located at D:\miniforge\envs\test-cuda-wheels\Lib\site-packages\nvidia\cuda_runtime\bin\cudart64_12.dll
        Trying to open library...       ok
Finding cudadevrt from <unknown>
        Located at cudadevrt.lib
        Checking library...     ERROR: failed to find cudadevrt:
cudadevrt.lib not found
Finding libdevice from NVIDIA NVCC Wheel
        Located at D:\miniforge\envs\test-cuda-wheels\Lib\site-packages\nvidia\cuda_nvcc\nvvm\libdevice\libdevice.10.bc
        Checking library...     ok
Include directory configuration variable:
        CUDA_INCLUDE_PATH=cuda_include_not_found
Finding include directory from CUDA_INCLUDE_PATH Config Entry
        Located at cuda_include_not_found
        Checking include directory...   ERROR: failed to find cuda include directory:

We will just have to ignore that the includes and cudadevrt don't seem to be available in wheels, though.

Collaborator Author

patch added in d5b68a9

@brandon-b-miller changed the title from "Locate nvvm, libdevice and nvrtc from nvidia-cuda-nvcc-cu12 wheels" to "Locate nvvm, libdevice and nvrtc from nvidia-*-cu12 wheels" on Apr 17, 2025
@brandon-b-miller changed the title from "Locate nvvm, libdevice and nvrtc from nvidia-*-cu12 wheels" to "Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels" on Apr 17, 2025
@brandon-b-miller
Collaborator Author

> The CI runs are still finding the runtime from the system installation: https://github.com/NVIDIA/numba-cuda/actions/runs/14518427298/job/40733627061?pr=155
>
> Finding cudart from System
> 	Located at /usr/local/cuda/lib64/libcudart.so.12.8.90
> 	Trying to open library...	ok
> Finding cudadevrt from System
> 	Located at /usr/local/cuda/lib64/libcudadevrt.a
> 	Checking library...	ok
>
> The nvidia-cuda-runtime-cu12 wheel also needs installing in the test script, I think.

This has been updated; now I see:

Finding cudart from NVIDIA NVCC Wheel
	Located at /pyenv/versions/3.13.3/lib/python3.13/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
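For reference, this output comes from the same library discovery check used in the Windows run above:

    python -c "from numba import cuda; cuda.cudadrv.libs.test()"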


@rwgk rwgk left a comment

Just a couple nits.

Comment on lines 156 to 165
        cu_ver = "11.2" if not IS_WIN32 else "112"
    elif major == 12:
        cu_ver = "12" if not IS_WIN32 else "120"
    else:
        raise NotImplementedError(f"CUDA {major} is not supported")

    return os.path.join(
        lib_dir,
        f"libnvrtc.so.{cu_ver}"
        if not IS_WIN32

To make this neat, I'd flip these around, e.g. cu_ver = "112" if IS_WIN32 else "11.2"

Member

I think the logic here is not for `cu_ver` alone, but for the OS-dependent DSO name.

def _get_cudalib_wheel():
    """Get the cudalib path from the NVCC wheel."""
    site_paths = [site.getusersitepackages()] + site.getsitepackages()
    libdir = not IS_WIN32 and "lib" or "bin"

This one too (libdir = "bin" if IS_WIN32 else "lib")

(What you have right now is a real brain teaser!)
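For anyone puzzling over the idiom, the two forms are equivalent here (a quick illustration, not part of the PR):

    import sys

    IS_WIN32 = sys.platform == "win32"
    # The and/or chain only works because "lib" is truthy; a falsy middle
    # value would silently fall through to "bin".
    libdir = not IS_WIN32 and "lib" or "bin"  # old-style conditional idiom
    libdir = "bin" if IS_WIN32 else "lib"     # equivalent, and much clearer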

Collaborator

@gmarkall gmarkall left a comment

As of the latest commit, dfe25c8, this is still working for me for both Conda packages and pip wheels on Windows and Linux.

I think the only caveat is that cudadevrt.lib and the headers are not pip-installable, but that's not an issue we can solve inside this PR and that shouldn't block it.

Comment on lines +16 to +17
"Conda environment",
"Conda environment (NVIDIA package)",
Member

Leaving a note here, no action needed for this PR.

I understand this is just a refactoring of existing code, but still having this distinction (conda-forge/nvidia) is a bit nerve-wracking, especially after CUDA 12.0, where both channels started unifying the layouts (different from cudatoolkit for 11.x and before) and eventually became interchangeable around 12.5. cc @jakirkham for comments.

Collaborator

This distinction is here because there is a real distinction in the packages, though; we can't unify it here because it's not unified in the wider landscape of CUDA toolkit conda packaging.

Member

I assume you're referring to the layout differences between CUDA 11/12?

@kkraus14
Collaborator

@brandon-b-miller looks like there are a couple of comments from @rwgk to address, and then this is ready to merge?

@ZzEeKkAa
Collaborator

I've just run tests on nvmath-python using this branch and removing:

    # Patch Numba to support wheels
    _utils.patch_numba_nvvm(nvvm)

    # our device apis only support cuda 12+
    _utils.force_loading_nvrtc("12")
    nvrtc.NVRTC.__new__ = __nvrtc_new__

And everything works well so far, thank you!

@brandon-b-miller brandon-b-miller merged commit 2da85d0 into NVIDIA:main Apr 21, 2025
37 checks passed
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Apr 22, 2025
- Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels (NVIDIA#155)
- reinstate test (NVIDIA#226)
- Restore PR NVIDIA#185 (Stop Certain Driver API Discovery for "v2") (NVIDIA#223)
- Report NVRTC builtin operation failures to the user (NVIDIA#196)
- Add Module Setup and Teardown Callback to Linkable Code Interface (NVIDIA#145)
- Test CUDA 12.8. (NVIDIA#187)
- Ensure RTC Bindings Clamp to the Maximum Supported CC (NVIDIA#189)
- Migrate code style to ruff (NVIDIA#170)
- Use less GPU memory in test_managed_alloc_driver_undersubscribe. (NVIDIA#188)
- Update workflows to always use proxy cache. (NVIDIA#191)
@gmarkall gmarkall mentioned this pull request Apr 22, 2025
gmarkall added a commit that referenced this pull request Apr 22, 2025
- Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels (#155)
- reinstate test (#226)
- Restore PR #185 (Stop Certain Driver API Discovery for "v2") (#223)
- Report NVRTC builtin operation failures to the user (#196)
- Add Module Setup and Teardown Callback to Linkable Code Interface (#145)
- Test CUDA 12.8. (#187)
- Ensure RTC Bindings Clamp to the Maximum Supported CC (#189)
- Migrate code style to ruff (#170)
- Use less GPU memory in test_managed_alloc_driver_undersubscribe. (#188)
- Update workflows to always use proxy cache. (#191)
leofang pushed a commit to NVIDIA/cuda-python that referenced this pull request May 6, 2025
* First version of `cuda.bindings.path_finder` (#447)

* Unmodified copies of:

* https://github.com/NVIDIA/numba-cuda/blob/bf487d78a40eea87f009d636882a5000a7524c95/numba_cuda/numba/cuda/cuda_paths.py

* https://github.com/numba/numba/blob/f0d24824fcd6a454827e3c108882395d00befc04/numba/misc/findlib.py

* Add Forked from URLs.

* Strip down cuda_paths.py to minimum required for `_get_nvvm_path()`

Tested interactively with:
```
import cuda_paths
nvvm_path = cuda_paths._get_nvvm_path()
print(f"{nvvm_path=}")
```

* ruff auto-fixes (NO manual changes)

* Make `get_nvvm_path()` a public API (i.e. remove leading underscore).

* Fetch numba-cuda/numba_cuda/numba/cuda/cuda_paths.py from NVIDIA/numba-cuda#155 AS-IS

* ruff format NO MANUAL CHANGES

* Minimal changes to adapt numba-cuda/numba_cuda/numba/cuda/cuda_paths.py from NVIDIA/numba-cuda#155

* Rename ecosystem/cuda_paths.py -> path_finder.py

* Plug cuda.bindings.path_finder into cuda/bindings/_internal/nvvm_linux.pyx

* Plug cuda.bindings.path_finder into cuda/bindings/_internal/nvjitlink_linux.pyx

* Fix `os.path.exists(None)` issue:

```
______________________ ERROR collecting test_nvjitlink.py ______________________
tests/test_nvjitlink.py:62: in <module>
    not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
tests/test_nvjitlink.py:58: in check_nvjitlink_usable
    return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
cuda/bindings/_internal/nvjitlink.pyx:257: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:260: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:208: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
    ???
cuda/bindings/_internal/nvjitlink.pyx:102: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
    ???
cuda/bindings/_internal/nvjitlink.pyx:59: in cuda.bindings._internal.nvjitlink.load_library
    ???
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:312: in get_cuda_paths
    "nvvm": _get_nvvm_path(),
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:285: in _get_nvvm_path
    by, path = _get_nvvm_path_decision()
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:96: in _get_nvvm_path_decision
    if os.path.exists(nvvm_ctk_dir):
<frozen genericpath>:19: in exists
    ???
E   TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```

* Fix another `os.path.exists(None)` issue:

```
______________________ ERROR collecting test_nvjitlink.py ______________________
tests/test_nvjitlink.py:62: in <module>
    not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
tests/test_nvjitlink.py:58: in check_nvjitlink_usable
    return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
cuda/bindings/_internal/nvjitlink.pyx:257: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:260: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda/bindings/_internal/nvjitlink.pyx:208: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
    ???
cuda/bindings/_internal/nvjitlink.pyx:102: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
    ???
cuda/bindings/_internal/nvjitlink.pyx:59: in cuda.bindings._internal.nvjitlink.load_library
    ???
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:313: in get_cuda_paths
    "libdevice": _get_libdevice_paths(),
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:126: in _get_libdevice_paths
    by, libdir = _get_libdevice_path_decision()
/opt/hostedtoolcache/Python/3.13.2/x64/lib/python3.13/site-packages/cuda/bindings/path_finder.py:73: in _get_libdevice_path_decision
    if os.path.exists(libdevice_ctk_dir):
<frozen genericpath>:19: in exists
    ???
E   TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
```

* Change "/lib64/" → "/lib/" in nvjitlink_linux.pyx

* nvjitlink_linux.pyx load_library() enhancements, mainly to avoid os.path.join(None, "libnvJitLink.so")

* Add missing f-string f

* Add back get_nvjitlink_dso_version_suffix() call.

* pytest -ra -s -v

* Rewrite nvjitlink_linux.pyx load_library() to produce detailed error messages.

* Attach listdir output to "Unable to load" exception message.

* Guard os.listdir() call with os.path.isdir()

* Fix logic error in nvjitlink_linux.pyx load_library()

* Move path_finder.py to _path_finder_utils/cuda_paths.py, import only public functions from new path_finder.py

* Add find_nvidia_dynamic_library() and use from nvjitlink_linux.pyx, nvvm_linux.pyx

* Fix oversight in _find_using_lib_dir()

* Also look for versioned library in _find_using_nvidia_lib_dirs()

* glob.glob() Python 3.9 compatibility

* Reduce build-and-test.yml to Windows-only, Python 3.12 only.

* Comment out `if: ${{ github.repository_owner == nvidia }}`

* Revert "Comment out `if: ${{ github.repository_owner == nvidia }}`"

This reverts commit b0db24f.

* Add back `linux-64` `host-platform`

* Rewrite load_library() in nvjitlink_windows.pyx to use path_finder.find_nvidia_dynamic_library()

* Revert "Rewrite load_library() in nvjitlink_windows.pyx to use path_finder.find_nvidia_dynamic_library()"

This reverts commit 1bb7151.

* Add _inspect_environment() in find_nvidia_dynamic_library.py, call from nvjitlink_windows.pyx, nvvm_windows.pyx

* Add & use _find_dll_using_nvidia_bin_dirs(), _find_dll_using_cudalib_dir()

* Fix silly oversight: forgot to undo experimental change.

* Also reduce test test-linux matrix.

* Reimplement load_library() functions in nvjitlink_windows.pyx, nvvm_windows.pyx to actively use path_finder.find_nvidia_dynamic_library()

* Factor out load_nvidia_dynamic_library() from _internal/nvjitlink_linux.pyx, nvvm_linux.pyx

* Generalize load_nvidia_dynamic_library.py to also work under Windows.

* Add `void*` return type to load_library() implementations in _internal/nvjitlink_windows.pyx, nvvm_windows.pyx

* Resolve cython error: object handle vs `void*` handle

```
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
            err = (<int (*)(int*) nogil>__cuDriverGetVersion)(&driver_ver)
            if err != 0:
                raise RuntimeError('something went wrong')
            # Load library
            handle = load_library(driver_ver)
                                 ^
    ------------------------------------------------------------
    cuda\bindings\_internal\nvjitlink.pyx:72:29: Cannot convert 'void *' to Python object
```

* Resolve another cython error: `void*` handle vs `intptr_t` handle

```
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
            handle = load_library(driver_ver)

            # Load function
            global __nvJitLinkCreate
            try:
                __nvJitLinkCreate = <void*><intptr_t>win32api.GetProcAddress(handle, 'nvJitLinkCreate')
                                                                             ^
    ------------------------------------------------------------

    cuda\bindings\_internal\nvjitlink.pyx:78:73: Cannot convert 'void *' to Python object
```

* Resolve signed/unsigned runtime error. Use uintptr_t consistently.

https://github.com/NVIDIA/cuda-python/actions/runs/14224673173/job/39861750852?pr=447#logs

```
=================================== ERRORS ====================================
_____________________ ERROR collecting test_nvjitlink.py ______________________
tests\test_nvjitlink.py:62: in <module>
    not check_nvjitlink_usable(), reason="nvJitLink not usable, maybe not installed or too old (<12.3)"
tests\test_nvjitlink.py:58: in check_nvjitlink_usable
    return inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") != 0
cuda\\bindings\\_internal\\nvjitlink.pyx:221: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:224: in cuda.bindings._internal.nvjitlink._inspect_function_pointer
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:172: in cuda.bindings._internal.nvjitlink._inspect_function_pointers
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:73: in cuda.bindings._internal.nvjitlink._check_or_init_nvjitlink
    ???
cuda\\bindings\\_internal\\nvjitlink.pyx:46: in cuda.bindings._internal.nvjitlink.load_library
    ???
E   OverflowError: can't convert negative value to size_t
```

* Change `<void*><uintptr_t>win32api.GetProcAddress` back to `intptr_t`. Changing load_nvidia_dynamic_library() to also use to-`intptr_t` conversion, for compatibility with win32api.GetProcAddress. Document that CDLL behaves differently (it uses to-`uintptr_t`).

* Use win32api.LoadLibrary() instead of ctypes.windll.kernel32.LoadLibraryW(), to be more similar to original (and working) cython code.

Hoping to resolve this kind of error:

```
_ ERROR at setup of test_c_or_v_program_fail_bad_option[txt-compile_program] __

request = <SubRequest 'minimal_nvvmir' for <Function test_c_or_v_program_fail_bad_option[txt-compile_program]>>

    @pytest.fixture(params=MINIMAL_NVVMIR_FIXTURE_PARAMS)
    def minimal_nvvmir(request):
        for pass_counter in range(2):
            nvvmir = MINIMAL_NVVMIR_CACHE.get(request.param, -1)
            if nvvmir != -1:
                if nvvmir is None:
                    pytest.skip(f"UNAVAILABLE: {request.param}")
                return nvvmir
            if pass_counter:
                raise AssertionError("This code path is meant to be unreachable.")
            # Build cache entries, then try again (above).
>           major, minor, debug_major, debug_minor = nvvm.ir_version()

tests\test_nvvm.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda\bindings\nvvm.pyx:95: in cuda.bindings.nvvm.ir_version
    cpdef tuple ir_version():
cuda\bindings\nvvm.pyx:113: in cuda.bindings.nvvm.ir_version
    status = nvvmIRVersion(&major_ir, &minor_ir, &major_dbg, &minor_dbg)
cuda\bindings\cynvvm.pyx:19: in cuda.bindings.cynvvm.nvvmIRVersion
    return _nvvm._nvvmIRVersion(majorIR, minorIR, majorDbg, minorDbg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   cuda.bindings._internal.utils.FunctionNotFoundError: function nvvmIRVersion is not found
```

* Remove debug print statements.

* Remove some cruft.

* Trivial renaming of variables. No functional changes.

* Revert debug changes under .github/workflows

* Rename _path_finder_utils → _path_finder

* Remove LD_LIBRARY_PATH in fetch_ctk/action.yml

* Linux: First try using the platform-specific dynamic loader search mechanisms

* Add _windows_load_with_dll_basename()

* Revert "Revert debug changes under .github/workflows"

This reverts commit cc6113c.

* Add debug prints in load_nvidia_dynamic_library()

* Report dlopen error for libnvrtc.so.12

* print("\nLOOOK dlfcn.dlopen('libnvrtc.so.12', dlfcn.RTLD_NOW)", flush=True)

* Revert "Remove LD_LIBRARY_PATH in fetch_ctk/action.yml"

This reverts commit 1b1139c.

* Only remove ${CUDA_PATH}/nvvm/lib64 from LD_LIBRARY_PATH

* Use path_finder.load_nvidia_dynamic_library("nvrtc") from cuda/bindings/_bindings/cynvrtc.pyx.in

* Somewhat ad hoc heuristics for nvidia_cuda_nvrtc wheels.

* Remove LD_LIBRARY_PATH entirely from .github/actions/fetch_ctk/action.yml

* Remove CUDA_PATH\nvvm\bin in .github/workflows/test-wheel-windows.yml

* Revert "Remove LD_LIBRARY_PATH entirely from .github/actions/fetch_ctk/action.yml"

This reverts commit bff8cf0.

* Revert "Somewhat ad hoc heuristics for nvidia_cuda_nvrtc wheels."

This reverts commit 43abec8.

* Restore cuda/bindings/_bindings/cynvrtc.pyx.in as-is on main

* Remove debug print from load_nvidia_dynamic_library.py

* Reapply "Revert debug changes under .github/workflows"

This reverts commit aaa6aff.

* Make `path_finder` work for `"nvrtc"` (#553)

* Revert "Restore cuda/bindings/_bindings/cynvrtc.pyx.in as-is on main"

This reverts commit ba093f5.

* Revert "Reapply "Revert debug changes under .github/workflows""

This reverts commit 8f69f83.

* Also load nvrtc from cuda_bindings/tests/path_finder.py

* Add heuristics for nvidia_cuda_nvrtc Windows wheels.

Also fix a couple bugs discovered by ChatGPT:

* `glob.glob()` in this code returns absolute paths.

* stray `error_messages = []`

* Add debug prints, mostly for `os.add_dll_directory(bin_dir)`

* Fix unfortunate silly oversight (import os missing under Windows)

* Use `win32api.LoadLibraryEx()` with suitable `flags`; also update `os.environ["PATH"]`

* Hard-wire WinBase.h constants (they are not exposed by win32con)

* Remove debug prints

* Reapply "Reapply "Revert debug changes under .github/workflows""

This reverts commit b002ff6.

* Add `path_finder.SUPPORTED_LIBNAMES` (#558)

* Revert "Reapply "Revert debug changes under .github/workflows""

This reverts commit 8f69f83.

* Add names of all CTK 12.8.1 x86_64-linux libraries (.so) as `path_finder.SUPPORTED_LIBNAMES`

https://chatgpt.com/share/67f98d0b-148c-8008-9951-9995cf5d860c

* Add `SUPPORTED_WINDOWS_DLLS`

* Add copyright notice

* Move SUPPORTED_LIBNAMES, SUPPORTED_WINDOWS_DLLS to _path_finder/supported_libs.py

* Use SUPPORTED_WINDOWS_DLLS in _windows_load_with_dll_basename()

* Change "Set up mini CTK" to use `method: local`, remove `sub-packages` line.

* Use Jimver/[email protected] also under Linux, `method: local`, no `sub-packages`.

* Add more `nvidia-*-cu12` wheels to get as many of the supported shared libraries as possible.

* Revert "Use Jimver/[email protected] also under Linux, `method: local`, no `sub-packages`."

This reverts commit d499806.

Problem observed:

```
/usr/bin/docker exec  1b42cd4ea3149ac3f2448eae830190ee62289b7304a73f8001e90cead5005102 sh -c "cat /etc/*release | grep ^ID"
Warning: Failed to restore: Cache service responded with 422
/usr/bin/tar --posix -cf cache.tgz --exclude cache.tgz -P -C /__w/cuda-python/cuda-python --files-from manifest.txt -z
Failed to save: Unable to reserve cache with key cuda_installer-linux-5.15.0-135-generic-x64-12.8.0, another job may be creating this cache. More details: This legacy service is shutting down, effective April 15, 2025. Migrate to the new service ASAP. For more information: https://gh.io/gha-cache-sunset
Warning: Error during installation: Error: Unable to locate executable file: sudo. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.
Error: Error: Unable to locate executable file: sudo. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.
```

* Change test_path_finder::test_find_and_load() to skip cufile on Windows, and report exceptions as failures, except for cudart

* Add nvidia-cuda-runtime-cu12 to pyproject.toml (for libname cudart)

* test_path_finder.py: before loading cusolver, load nvJitLink, cusparse, cublas (experiment to see if that resolves the only Windows failure)

Test (win-64, Python 3.12, CUDA 12.8.0, Runner default, CTK wheels) / test

```
================================== FAILURES ===================================
________________________ test_find_and_load[cusolver] _________________________

libname = 'cusolver'

    @pytest.mark.parametrize("libname", path_finder.SUPPORTED_LIBNAMES)
    def test_find_and_load(libname):
        if sys.platform == "win32" and libname == "cufile":
            pytest.skip(f'test_find_and_load("{libname}") not supported on this platform')
        print(f'\ntest_find_and_load("{libname}")')
        failures = []
        for algo, func in (
            ("find", path_finder.find_nvidia_dynamic_library),
            ("load", path_finder.load_nvidia_dynamic_library),
        ):
            try:
                out = func(libname)
            except Exception as e:
                out = f"EXCEPTION: {type(e)} {str(e)}"
                failures.append(algo)
            print(out)
        print()
>       assert not failures
E       AssertionError: assert not ['load']

tests\test_path_finder.py:29: AssertionError
```

* test_path_finder.py: load *only* nvJitLink before loading cusolver

* Run each test_find_or_load_nvidia_dynamic_library() subtest in a subprocess

* Add cublasLt to supported_libs.py and load deps for cusolver, cusolverMg, cusparse in test_path_finder.py. Also restrict test_path_finder.py to test load only for now.

* Add supported_libs.DIRECT_DEPENDENCIES

* Remove cufile_rdma from supported libs (comment out).

https://chatgpt.com/share/68033a33-385c-8008-a293-4c8cc3ea23ae

* Split out `PARTIALLY_SUPPORTED_LIBNAMES`. Fix up test code.

* Reduce public API to only load_nvidia_dynamic_library, SUPPORTED_LIBNAMES

* Set CUDA_BINDINGS_PATH_FINDER_TEST_ALL_LIBNAMES=1 to match expected availability of nvidia shared libraries.

* Refactor as `class _find_nvidia_dynamic_library`

* Strict wheel, conda, system rule: try using the platform-specific dynamic loader search mechanisms only last

* Introduce _load_and_report_path_linux(), add supported_libs.EXPECTED_LIB_SYMBOLS

* Plug in ctypes.windll.kernel32.GetModuleFileNameW()

* Keep track of nvrtc-related GitHub comment

* Factor out `_find_dll_under_dir(dirpath, file_wild)` and reuse from `_find_dll_using_nvidia_bin_dirs()`, `_find_dll_using_cudalib_dir()` (to fix loading nvrtc64_120_0.dll from local CTK)

* Minimal "is already loaded" code.

* Add THIS FILE NEEDS TO BE REVIEWED/UPDATED FOR EACH CTK RELEASE comment in _path_finder/supported_libs.py

* Add SUPPORTED_LINUX_SONAMES in _path_finder/supported_libs.py

* Update SUPPORTED_WINDOWS_DLLS in _path_finder/supported_libs.py based on DLLs found in cuda_*win*.exe files.

* Remove `os.add_dll_directory()` and `os.environ["PATH"]` manipulations from find_nvidia_dynamic_library.py. Add `supported_libs.LIBNAMES_REQUIRING_OS_ADD_DLL_DIRECTORY` and use from `load_nvidia_dynamic_library()`.

* Move nvrtc-specific code from find_nvidia_dynamic_library.py to `supported_libs.is_suppressed_dll_file()`

* Introduce dataclass LoadedDL as return type for load_nvidia_dynamic_library()

* Factor out _abs_path_for_dynamic_library_* and use on handle obtained through "is already loaded" checks

* Factor out _load_nvidia_dynamic_library_no_cache() and use for exercising LoadedDL.was_already_loaded_from_elsewhere

* _check_nvjitlink_usable() in test_path_finder.py

* Undo changes in .github/workflows/ and cuda_bindings/pyproject.toml

* Move cuda_bindings/tests/path_finder.py -> toolshed/run_cuda_bindings_path_finder.py

* Add bandit suppressions in test_path_finder.py

* Add pytest info_summary_append fixture and use from test_path_finder.py to report the absolute paths of the loaded libraries.

* Fix tiny accident: a line in pyproject.toml got lost somehow.

* Undo changes under .github (LD_LIBRARY_PATH, PATH manipulations for nvvm).

* 2025-05-01 version of `cuda.bindings.path_finder` (#578)

* Undo changes to the nvJitLink, nvrtc, nvvm bindings

* Undo changes under .github, specific to nvvm, manipulating LD_LIBRARY_PATH or PATH

* PARTIALLY_SUPPORTED_LIBNAMES_LINUX, PARTIALLY_SUPPORTED_LIBNAMES_WINDOWS

* Update EXPECTED_LIB_SYMBOLS for nvJitLink to cleanly support CTK versions 12.0, 12.1, 12.2

* Save result of factoring out load_dl_common.py, load_dl_linux.py, load_dl_windows.py with the help of Cursor.

* Fix an auto-generated docstring

* first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes

* Revert "first round of Cursor refactoring (about 4 iterations until all tests passed), followed by ruff auto-fixes"

This reverts commit 001a6a2.

There were many GitHub Actions jobs that failed (all tests with 12.x):

https://github.com/NVIDIA/cuda-python/actions/runs/14677553387

This is not worth spending time debugging.
Especially because

* Cursor has been unresponsive for at least half an hour:
    We're having trouble connecting to the model provider. This might be temporary - please try again in a moment.

* The refactored code does not seem easier to read.

* A couple trivial tweaks

* Prefix the public API (just two items) with underscores for now.

* Add SPDX-License-Identifier to all files under toolshed/ that don't have it already

* Add SPDX-License-Identifier under cuda_bindings/tests/

* Respond to "Do these need to be run as subprocesses?" review question (#578 (comment))

* Respond to "dead code?" review questions (e.g. #578 (comment))

* Respond to "Do we need to implement a cache separately ..." review question (#578 (comment))

* Remove cuDriverGetVersion() function for now.

* Move add_dll_directory() from load_dl_common.py to load_dl_windows.py (response to review question #578 (comment))

* Add SPDX-License-Identifier and # Forked from: URL in cuda_paths.py

* Add Add SPDX-License-Identifier and Original LICENSE in findlib.py

* Very first draft of README.md

* Update README.md, mostly as revised by perplexity, with various manual edits.

* Refork cuda_paths.py AS-IS: https://github.com/NVIDIA/numba-cuda/blob/8c9c9d0cb901c06774a9abea6d12b6a4b0287e5e/numba_cuda/numba/cuda/cuda_paths.py

* ruff format cuda_paths.py (NO manual changes)

* Add back _get_numba_CUDA_INCLUDE_PATH from 2279bda (i.e. cuda_paths.py as it was right before re-forking)

* Remove cuda_paths.py dependency on numba.cuda.cudadrv.runtime

* Add Forked from URLs, two SPDX-License-Identifier, Original Numba LICENSE

* Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage

* Restore cuda_path.py AS-IT-WAS at commit 2279bda

* Revert "Restore cuda_path.py AS-IT-WAS at commit 2279bda"

This reverts commit 1b88ec2.

* Force compute-sanitizer off unconditionally

* Revert "Force compute-sanitizer off unconditionally"

This reverts commit 2bc7ef6.

* Add timeout=10 seconds to test_path_finder.py subprocess.run() invocations.

* Increase test_path_finder.py subprocess.run() timeout to 30 seconds:

Under Windows, loading cublas or cusolver may exceed the 10 second timeout:

#578 (comment)

* Revert "Temporarily restore debug changes under .github/workflows, for expanded path_finder test coverage"

This reverts commit 47ad79f.

* Force compute-sanitizer off unconditionally

* Add: Note that the search is done on a per-library basis.

* Add Note for CUDA_HOME / CUDA_PATH

* Add 0. **Check if a library was loaded into the process already by some other means.**

* _find_dll_using_nvidia_bin_dirs(): reuse lib_searched_for in place of file_wild

* Systematically replace all relative imports with absolute imports.

* handle: int → ctypes.CDLL fix

* Make load_dl_windows.py abs_path_for_dynamic_library() implementation maximally robust.

* Change argument name → libname for self-consistency

* Systematically replace previously overlooked relative imports with absolute imports.

* Simplify code (also for self-consistency)

* Expand the 3. **System Installations** section with information produced by perplexity

* Pull out `**Environment variables**` into an added section, after manual inspection of cuda_paths.py. Minor additional edits.

* Revert "Force compute-sanitizer off unconditionally"

This reverts commit aeaf4f0.

* Move _path_finder/sys_path_find_sub_dirs.py → find_sub_dirs.py, use find_sub_dirs_all_sitepackages() from find_nvidia_dynamic_library.py

* WIP (search priority updated in README.md but not in code)

* Revert "WIP (search priority updated in README.md but not in code)"

This reverts commit bf9734c.

* WIP (search priority updated in README.md but not in code)

* Completely replace cuda_paths.py to achieve the desired Search Priority (see updated README.md).

* Define `IS_WINDOWS = sys.platform == "win32"` in supported_libs.py

* Use os.path.samefile() to resolve issues with doubled backslashes.

* `load_in_subprocess()`: Pass current environment

* Add run_python_code_safely.py as generated by perplexity, plus ruff format, bandit nosec

* Replace subprocess.run with run_python_code_safely

* Factor out `class Worker` to fix pickle issue.

* ChatGPT revisions based on Deep research:

https://chatgpt.com/share/681914ce-f274-8008-9e9f-4538716b4ed7

* Fix race condition in result queue handling by using timeout-based get()

The previous implementation checked result_queue.empty() before calling get(),
which introduces a classic race condition: the queue may become non-empty
immediately after the check, resulting in missed results or misleading errors.

This patch replaces the empty() check with result_queue.get(timeout=1.0),
allowing the parent process to robustly wait for results with a bounded delay.
Also switches from ctx.SimpleQueue() to ctx.Queue() for compatibility with
timeout-based get(), which SimpleQueue does not support on Python ≤3.12.

Note: The race condition was discovered by Gemini 2.5
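A minimal illustration of the pattern described above (hypothetical names, not the actual code):

    import queue
    from multiprocessing import get_context

    ctx = get_context("spawn")
    result_queue = ctx.Queue()  # Queue, not SimpleQueue: get() supports a timeout

    # Racy pattern (removed): result_queue.empty() may flip to non-empty
    # right after the check, so a result can be missed.
    # Robust pattern (used instead): block with a bounded delay.
    try:
        result = result_queue.get(timeout=1.0)
    except queue.Empty:
        result = None  # nothing arrived within the timeout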

* Resolve SIM108

* Change to "nppc" as ANCHOR_LIBNAME

* Implement CUDA_PYTHON_CUDA_HOME_PRIORITY first, last, with default first

* Remove retry_with_anchor_abs_path() and make retry_with_cuda_home_priority_last() the default.

* Update README.md to reflect new search priority

* SUPPORTED_LINUX_SONAMES does not need updates for CTK 12.9.0

* The only addition to SUPPORTED_WINDOWS_DLLS for CTK 12.9.0 is nvvm70.dll

* Make OSError in load_dl_windows.py abs_path_for_dynamic_library() more informative.

* run_cuda_bindings_path_finder.py: optionally use args as libnames (to aid debugging)

* Bug fix in load_dl_windows.py: ctypes.windll.kernel32.LoadLibraryW() returns an incompatible `handle`. Use win32api.LoadLibraryEx() instead to ensure self-consistency.

* Remove _find_nvidia_dynamic_library.retry_with_anchor_abs_path() method. Move run_python_code_safely.py to test/ directory.

* Add missing SPDX-License-Identifier
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request May 12, 2025
With the merge of NVIDIA/numba-cuda#155 we need to depend on these two wheels if we want `numba-cuda` to be able to find the runtime libraries it needs in the final cuDF environment.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #18686
Successfully merging this pull request may close these issues:

- [FEA] Support for loading NVRTC from a wheel
- [FEA] Support for loading NVVM from a wheel