CCCL (CUDA Core Compute Libraries, https://github.com/NVIDIA/cccl) is a header-only library that is heavily used by XLA and dependencies like RAPIDS. It includes widely-used components for CUDA programming like Thrust, CUB, and libcudacxx.
I am filing this issue with a proposal to improve XLA's compatibility with multiple CUDA Toolkit versions.
Today, XLA appears to get its CCCL from the CUDA Toolkit. CCCL is an unusual dependency: it is header-only and ships with the CUDA Toolkit, but it is also available from GitHub, and CCCL recommends fetching its sources from GitHub for maximum compatibility. This came up because dependencies on RAPIDS libraries like RMM and RAFT are difficult to satisfy when using the CUDA Toolkit's CCCL: RAPIDS typically requires a recent CCCL version, so older CUDA Toolkits that bundle older CCCL versions are incompatible. For example, RAPIDS 25.12 requires CCCL 3.1, which is publicly available on GitHub but has not yet shipped in a public CUDA Toolkit. A newer CCCL fetched from source remains backward compatible with older CUDA Toolkits, so getting CCCL from source would allow XLA to compile against CUDA Toolkits from both the 12.x and 13.x major versions. (See the compatibility rules: https://github.com/NVIDIA/cccl?tab=readme-ov-file#cuda-toolkit-ctk-compatibility)
I propose that XLA add CCCL as an explicit third-party dependency fetched from GitHub, rather than implicitly using the CUDA Toolkit's copy. This would let the project build against a wider range of CUDA Toolkit versions and would decouple the CUDA Toolkit version from the set of supported RAPIDS versions, making it easier to keep dependencies up to date across the board and to migrate CUDA-related components independently. I help with CCCL packaging and maintain several libraries that use CCCL, and this is how we advocate that most projects using CCCL be structured.
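As a rough sketch of what this could look like in a Bazel-based build like XLA's (the repository name, version tag, checksum placeholder, paths, and BUILD-file layout below are all illustrative assumptions on my part, not a tested configuration):

```starlark
# Hypothetical sketch: fetch CCCL sources from GitHub instead of relying on
# the copy bundled with the CUDA Toolkit. Version and sha256 are placeholders.
http_archive(
    name = "cccl",
    urls = ["https://github.com/NVIDIA/cccl/archive/refs/tags/v3.1.0.tar.gz"],
    strip_prefix = "cccl-3.1.0",
    sha256 = "<fill in>",
    build_file = "//third_party/cccl:cccl.BUILD",
)

# third_party/cccl/cccl.BUILD (sketch): expose Thrust, CUB, and libcudacxx as
# a single header-only target; exact glob paths depend on the CCCL repo layout.
cc_library(
    name = "cccl_headers",
    hdrs = glob([
        "thrust/thrust/**",
        "cub/cub/**",
        "libcudacxx/include/**",
    ]),
    includes = ["thrust", "cub", "libcudacxx/include"],
)
```

CUDA-dependent targets would then depend on this target so that the GitHub headers shadow the Toolkit's bundled CCCL on the include path; bumping CCCL becomes a one-line version change that is independent of the CUDA Toolkit in use.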
@Artem-B noted in a Google Chat discussion:

> For external builds, switching to upstream CCCL should be feasible, as long as we keep OpenXLA buildable with v2.8, too, at least until we figure out how to transition internal builds to v3.1.
I think this proposal should help with that migration, because using CCCL from GitHub means CCCL versions can be matched to the supported RAPIDS versions, and XLA can then build with any compatible CUDA Toolkit rather than pinning to the exact Toolkit that shipped the desired CCCL version. With the current approach, the set of compatible RAPIDS versions appears to be tightly constrained (probably to a single version) by the chosen CUDA Toolkit.
See also: NVIDIA/cccl#6540