fix: undefined symbol cudaGetDriverEntryPointByVersion with CUDA >= 12.5 (flashinfer-ai#928)

zobinHuang · web-flow · commit 29be596d8ce4 · 2025-03-10T15:00:16.000-07:00
## Problem

When flashinfer is ① built with CUDA >= 12.5 (using the system-wide CUDA toolkit under `/usr/local/cuda`) and ② run with CUDA < 12.5 (using the `libcudart.so` from the Python environment, `/usr/local/lib/python3.10/dist-packages/nvidia/cuda_runtime/lib/libcudart.so.12`), loading fails with an undefined symbol error for `cudaGetDriverEntryPointByVersion`, which was introduced in CUDA 12.5.

<img width="824" alt="image" src="https://github.com/user-attachments/assets/30322352-2cdc-45b5-adc3-2eb82fbac45e" />

This issue has been reported and fixed in other projects:

- cutlass: NVIDIA/cutlass#2086
- sglang: sgl-project/sglang#3372

## Fix

This fix is a workaround that forces flashinfer to use the system-wide CUDA toolkit: at import time, `flashinfer/jit/__init__.py` preloads `libcudart.so.12` from the system CUDA library directory with `RTLD_GLOBAL`, so its symbols are visible to the flashinfer kernels loaded afterwards. The directory defaults to `/usr/local/cuda/targets/x86_64-linux/lib/` and can be overridden with the `CUDA_LIB_PATH` environment variable. The approach follows the fix in [sglang](sgl-project/sglang#3372).

cc @zhyncs
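For reference, the preload can be sketched as a standalone helper that also checks whether the loaded runtime exports the missing symbol. This is a minimal sketch, not the code in this PR: the names `preload_cudart` and `has_symbol` are hypothetical, and the default directory simply mirrors the one used in the diff.

```python
import ctypes
import os
from typing import Optional

# Symbol that the description above reports as introduced in CUDA 12.5.
REQUIRED_SYMBOL = "cudaGetDriverEntryPointByVersion"


def preload_cudart(lib_dir: Optional[str] = None,
                   soname: str = "libcudart.so.12") -> Optional[ctypes.CDLL]:
    """Hypothetical helper: load libcudart with RTLD_GLOBAL.

    Returns the library handle, or None if the file is missing or
    cannot be loaded, so the caller falls back to default resolution.
    """
    if lib_dir is None:
        # Same environment variable and default as in the diff.
        lib_dir = os.environ.get(
            "CUDA_LIB_PATH", "/usr/local/cuda/targets/x86_64-linux/lib/"
        )
    path = os.path.join(lib_dir, soname)
    if not os.path.exists(path):
        return None
    try:
        # RTLD_GLOBAL exposes the library's symbols to extension
        # modules that are loaded later in the same process.
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None


def has_symbol(lib: object, name: str = REQUIRED_SYMBOL) -> bool:
    """Return True if the loaded library exports `name`.

    ctypes raises AttributeError for unknown symbols, so hasattr works.
    """
    return hasattr(lib, name)
```

A caller could run `lib = preload_cudart()` before importing the compiled kernels and, if `lib` is not `None` but `has_symbol(lib)` is false, report that the preloaded runtime is older than the one used at build time.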
diff --git a/flashinfer/jit/__init__.py b/flashinfer/jit/__init__.py
@@ -52,6 +52,14 @@
 from .env import *
 from .utils import parallel_load_modules as parallel_load_modules
 
+
+import os
+import ctypes
+cuda_lib_path = os.environ.get('CUDA_LIB_PATH', '/usr/local/cuda/targets/x86_64-linux/lib/')
+if os.path.exists(f"{cuda_lib_path}/libcudart.so.12"):
+    ctypes.CDLL(f"{cuda_lib_path}/libcudart.so.12", mode=ctypes.RTLD_GLOBAL)
+
+
 try:
     from .. import flashinfer_kernels, flashinfer_kernels_sm90  # noqa: F401
     from .aot_config import prebuilt_ops_uri as prebuilt_ops_uri