fix candle-kernels build for CC < 700 #3300

haricot · 2026-01-12T11:02:32Z

I used the cudarc driver API to detect compute capabilities automatically at build time, which seems more practical than relying on the CUDA_COMPUTE_CAP environment variable:

Works out-of-the-box without user configuration
Automatically detects multi-GPU setups
Falls back to CUDA_COMPUTE_CAP env var if driver init fails (e.g., in CI)
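For reviewers, the fallback order described above can be sketched roughly as follows. This is not the code in this PR and it does not call cudarc: the driver query is represented by the `detected` parameter, and the names `parse_compute_cap`, `resolve_compute_cap`, and `compute_cap_from_build_env` are hypothetical. Picking the lowest capability across GPUs is also an assumption made for the sketch, on the reasoning that kernels built for the lowest capability can still be loaded on the newer devices.

```rust
use std::env;

/// Parse the integer form used by CUDA_COMPUTE_CAP, e.g. "61" for CC 6.1.
fn parse_compute_cap(raw: &str) -> Option<u32> {
    raw.trim().parse().ok()
}

/// Decide which compute capability to build for.
///
/// `detected` stands in for the result of a driver query (one entry per GPU);
/// it is `None` when driver initialization failed, for example in CI.
/// `env_value` is the content of CUDA_COMPUTE_CAP, if the variable is set.
fn resolve_compute_cap(detected: Option<&[u32]>, env_value: Option<&str>) -> Option<u32> {
    detected
        // Assumption: with several GPUs, target the lowest capability found.
        .and_then(|caps| caps.iter().copied().min())
        // No usable driver result: fall back to the environment variable.
        .or_else(|| env_value.and_then(parse_compute_cap))
}

/// Convenience wrapper that reads CUDA_COMPUTE_CAP from the process environment.
fn compute_cap_from_build_env(detected: Option<&[u32]>) -> Option<u32> {
    let env_value = env::var("CUDA_COMPUTE_CAP").ok();
    resolve_compute_cap(detected, env_value.as_deref())
}
```

In a build script, the `detected` argument would be filled from the driver query, and a `None` result from `compute_cap_from_build_env` would be the point at which to report that no compute capability could be determined.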

If you prefer a different approach (e.g., nvidia-smi or env var only), I'm happy to adjust.

Currently, the generator's compute_cap method depends on Narsil/bindgen_cuda#18 being merged. If Narsil/bindgen_cuda#16 is also merged, CUBIN generation could be extended to multiple architectures to accelerate startup and optimization.

haricot added 4 commits January 12, 2026 09:42

fix candle-kernels build for CC < 700

016166e

fix candle-kernels related bindgen_cuda/pull/18

0bf4169

candle-kernels/Cargo.toml delete unneeded comments

8966e59

candle-kernels/Cargo.toml restore formatting

ba56fc9

haricot changed the title ~~fix candle-kernels build for CC < 700 (depends merging Narsil/bindgen_cuda#18)~~ fix candle-kernels build for CC < 700 Jan 12, 2026
