@haricot (Contributor) commented Jan 12, 2026

I used the cudarc driver API to detect compute capabilities automatically at build time, which seems more practical than relying only on the CUDA_COMPUTE_CAP environment variable:

  • Works out-of-the-box without user configuration
  • Automatically detects multi-GPU setups
  • Falls back to CUDA_COMPUTE_CAP env var if driver init fails (e.g., in CI)
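
The fallback order described in the list above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `resolve_compute_caps` and `parse_compute_cap` are hypothetical names, the driver query is passed in as a closure standing in for the cudarc calls, and accepting a dotted value such as `6.1` is an assumption made for convenience.

```rust
/// Parse a compute capability such as "61" or "6.1" into a number like 61.
/// Accepting the dotted form is an assumption of this sketch.
fn parse_compute_cap(raw: &str) -> Option<u32> {
    let digits: String = raw.trim().chars().filter(|c| *c != '.').collect();
    digits.parse().ok()
}

/// Resolve the compute capabilities to build for.
///
/// `query_driver` is a hypothetical stand-in for the driver-API query; it
/// returns one capability per visible GPU. `env_value` is the content of
/// CUDA_COMPUTE_CAP, if the variable is set.
fn resolve_compute_caps<F>(query_driver: F, env_value: Option<&str>) -> Result<Vec<u32>, String>
where
    F: FnOnce() -> Result<Vec<u32>, String>,
{
    let driver_error = match query_driver() {
        // Driver worked: keep each distinct capability once (multi-GPU case).
        Ok(mut caps) if !caps.is_empty() => {
            caps.sort_unstable();
            caps.dedup();
            return Ok(caps);
        }
        Ok(_) => String::from("driver reported no CUDA devices"),
        Err(e) => e,
    };

    // Driver unavailable (for example in CI): fall back to the env var.
    match env_value.and_then(parse_compute_cap) {
        Some(cap) => Ok(vec![cap]),
        None => Err(format!(
            "driver query failed ({}) and CUDA_COMPUTE_CAP is unset or invalid",
            driver_error
        )),
    }
}
```

In a build script, the second argument would typically come from `std::env::var("CUDA_COMPUTE_CAP").ok()`, passed as `as_deref()`.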

If you prefer a different approach (e.g., nvidia-smi or env var only), I'm happy to adjust.

Currently, the generator's compute_cap method depends on Narsil/bindgen_cuda#18 being merged. If Narsil/bindgen_cuda#16 is also merged, CUBIN generation could be extended to multiple architectures, which would speed up startup and optimization.
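
As a rough idea of what multi-architecture CUBIN generation could look like, the sketch below builds one nvcc argument list per detected capability. It assumes one nvcc invocation per architecture, since `--cubin` targets a single architecture at a time. The function names and the output file naming scheme are hypothetical and are not taken from bindgen_cuda.

```rust
/// Build the nvcc arguments that compile `<stem>.cu` to a CUBIN for one
/// compute capability. The `<stem>_sm<cap>.cubin` naming is hypothetical.
fn cubin_args(cap: u32, stem: &str) -> Vec<String> {
    vec![
        String::from("--cubin"),
        format!("--gpu-architecture=sm_{}", cap),
        String::from("-o"),
        format!("{}_sm{}.cubin", stem, cap),
        format!("{}.cu", stem),
    ]
}

/// One argument list per target architecture, in the order given.
fn cubin_jobs(caps: &[u32], stem: &str) -> Vec<Vec<String>> {
    caps.iter().map(|&cap| cubin_args(cap, stem)).collect()
}
```

Each returned list could then be handed to `std::process::Command::new("nvcc").args(...)`, one process per architecture.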

@haricot changed the title from "fix candle-kernels build for CC < 700 (depends merging Narsil/bindgen_cuda#18)" to "fix candle-kernels build for CC < 700" on Jan 12, 2026