Releases: huggingface/kernel-builder
v0.6.2
New Features
Intel XPU support
This release of kernel-builder adds XPU support. Many thanks to @sywangyi for implementing this! You can use the xpu backend type in build.toml for XPU kernels. For example:
[kernel.activation_xpu]
backend = "xpu"
depends = ["torch"]
src = ["relu_xpu/relu.cpp"]
The ReLU example kernel shows how you can make a kernel that supports the CUDA, ROCm and XPU backends.
kernel-abi-check Python binding
kernel-abi-check now also has a Python binding. This will be used by the upcoming kernels check subcommand.
API changes
Prior to this version, a kernel had to provide the Git revision to genFlakeOutputs in its flake.nix. For example:
kernel-builder.lib.genFlakeOutputs {
  path = ./.;
  rev = self.shortRev or self.dirtyShortRev or self.lastModifiedDate;
};
Starting with version 0.6.2, kernel-builder determines the revision itself. Instead of passing rev, a kernel has to pass through the flake itself (self):
kernel-builder.lib.genFlakeOutputs {
  inherit self;
  path = ./.;
};
The old invocation of genFlakeOutputs still works with a warning, but will be deprecated in the future.
What's Changed
- Add XPU support by @danieldk in #210
- Add xpu to the docs by @danieldk in #211
- Cache build2cmake and kernel-abi-check by @danieldk in #213
- Improve cutlass-sycl support by @danieldk in #214
- Fix a regression in test shells by @danieldk in #217
- build-and-copy: correctly get variant by @danieldk in #218
- build-and-copy: copy build from the bundle output, not result/ by @danieldk in #219
- Add a license by @danieldk in #220
- add dnnl to the link library, some kernels need onednn by @sywangyi in #224
- Add initial kernel building security guidelines by @danieldk in #216
- Let kernel-builder determine the kernel revision by @danieldk in #221
- hotfix: add onednn-xpu to the build inputs by @danieldk in #226
- Add a Python binding for kernel-abi-check by @danieldk in #225
- Add kernel-abi-check-python release workflow by @danieldk in #228
Full Changelog: v0.6.1...v0.6.2
v0.6.1
New Features
build-and-copy command
Before this release, one had to build a kernel with nix build first and then copy the build variants from result to build. This can now be done in a single step with build-and-copy:
$ nix run .#build-and-copy -L
Automatic virtual environment for nix develop
Running nix develop in a kernel will now automatically create a virtual environment in .venv (if it does not exist) and activate it.
Docs
examples/relu-backprop-compile provides an example of how to make a kernel with backprop and torch.compile support.
What's Changed
- Disable cachix pushes, sandboxing is not enabled by @danieldk in #198
- [XPU]Add support for cutlass-sycl by @danieldk in #200
- Fix handling of the 9.0a and 12.0a capabilities by @danieldk in #202
- Update hf-nix and remove sanitiseHeaderPathsHook workaround by @danieldk in #201
- kernel devshell: automatically create venv by @danieldk in #205
- Move kernel flake outputs generation to a separate file by @danieldk in #207
- Add a full ReLU example with backprop and torch.compile support by @danieldk in #206
- Add build-and-copy package by @danieldk in #208
Full Changelog: v0.6.0...v0.6.1
v0.6.0
New features
PyTorch 2.8 support
kernel-builder now supports PyTorch 2.8 in the following (upstream) build configurations:
- CUDA 12.6, 12.8, and 12.9 on aarch64-linux and x86_64-linux.
- ROCm 6.3 and 6.4 on x86_64-linux.
- Metal on aarch64-darwin (macOS).
Following the kernel-builder support policy, support for Torch 2.6 is removed.
Additional compliance testing
Besides the ABI checks (manylinux and abi3 compliance), kernel-builder now also checks whether the kernel can be loaded by the kernels package. This ensures, among other things, that imports are relative. This check can be an issue with some Triton kernels that use the autotune decorator, since the build sandbox does not have access to GPUs. In this case, the check can be disabled by passing doGetKernelCheck = false to genFlakeOutputs.
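To make the relative-import requirement concrete, here is a small sketch that flags absolute imports of a kernel's own package. It only illustrates the kind of import that tends to break when a kernel is loaded from an arbitrary location; it is not how the kernel-builder check works (the check actually loads the kernel), and the function name and the relu package name are hypothetical.

```python
import ast


def absolute_self_imports(source: str, package: str) -> list[int]:
    """Return line numbers where `package` is imported absolutely."""

    def is_self(module: str) -> bool:
        return module == package or module.startswith(package + ".")

    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import relu.utils` is always an absolute import.
            if any(is_self(alias.name) for alias in node.names):
                lines.append(node.lineno)
        elif isinstance(node, ast.ImportFrom):
            # level == 0 is absolute; level >= 1 is relative (`from . import x`).
            if node.level == 0 and is_self(node.module or ""):
                lines.append(node.lineno)
    return sorted(lines)


EXAMPLE = """\
from ._ops import ops
import torch
from relu.layers import ReLU
import relu.utils
"""

if __name__ == "__main__":
    for lineno in absolute_self_imports(EXAMPLE, "relu"):
        print(f"line {lineno}: absolute import of the kernel's own package")
```

In this example the first import is relative and the second refers to an external package, so only the last two lines are reported.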
Support for generating PTX
When defining CUDA capabilities, it is now possible to add the +PTX suffix to generate PTX code. For example:
cuda-capabilities = [ "7.0", "8.0+PTX"]
When no CUDA capabilities are specified for a kernel, PTX is generated for capability 9.0 (and 12.0 on CUDA >= 12.8).
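To illustrate what the +PTX suffix means, the following minimal sketch maps capability strings to nvcc -gencode flags. The function name and flag layout are assumptions for illustration, not how kernel-builder generates its build files. A plain capability emits only device code for that architecture, while +PTX additionally embeds PTX that newer GPUs can JIT-compile.

```python
def gencode_flags(capabilities: list[str]) -> list[str]:
    """Translate capability strings such as "8.0+PTX" into nvcc -gencode flags."""
    flags = []
    for capability in capabilities:
        want_ptx = capability.endswith("+PTX")
        version = capability.removesuffix("+PTX")  # requires Python 3.9+
        arch = version.replace(".", "")  # "8.0" -> "80", "9.0a" -> "90a"
        # Native device code (SASS) for exactly this architecture.
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
        if want_ptx:
            # Also embed PTX so that later architectures can JIT-compile it.
            flags.append(f"-gencode arch=compute_{arch},code=compute_{arch}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags(["7.0", "8.0+PTX"]):
        print(flag)
```

For the capabilities in the example above, this yields one SASS target each for 7.0 and 8.0, plus a PTX target for 8.0.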
What's Changed
- feat: include cachix instructions in readme by @drbh in #178
- Add check that imports the kernel with kernels by @danieldk in #179
- Make get-kernel-check work on macOS by @danieldk in #180
- Embed compiled metal kernels into binary by @EricLBuehler in #181
- Make hipify_sources_target work with multiple include dirs by @danieldk in #183
- Add doGetKernelCheck option to genFlakeOutputs by @danieldk in #188
- Set HOME in get-kernel-check-hook by @danieldk in #189
- CI: try to build a macOS kernel by @danieldk in #184
- kernel-abi-check: improve description by @danieldk in #190
- add xpu build support by @sywangyi in #185
- feat: Docker build with buildx for cross compilation by @drbh in #191
- Pass default versions set torchVersions by @danieldk in #193
- Support +PTX in cuda capabilities by @danieldk in #196
- Add Torch 2.8, remove Torch 2.6 by @danieldk in #182
Full Changelog: v0.5.2...v0.6.0
v0.5.2
This release contains changes for handling more complex kernels:
- Support minimum CUDA versions for 'subkernels' (kernel.<name>) for when a kernel has specializations for e.g. Blackwell.
- Support passing custom flags to the C++ compiler.
- Add support for building kernels for a subset of CUDA versions. Using this leads to non-compliant kernels, so it should only be used as a last resort.
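The minimum-version idea can be sketched as follows: given the CUDA version of a build variant, only subkernels whose cuda-minver is satisfied are included. The kernel names, version numbers, and helper functions are hypothetical, and this is not kernel-builder's actual selection logic.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn "12.8" into (12, 8) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def enabled_kernels(kernels: dict[str, dict], cuda_version: str) -> list[str]:
    """Return the kernel sections that can be built with `cuda_version`."""
    current = parse_version(cuda_version)
    enabled = []
    for name, section in kernels.items():
        minver = section.get("cuda-minver")
        # Sections without a minimum version are always built.
        if minver is None or parse_version(minver) <= current:
            enabled.append(name)
    return enabled


# Hypothetical kernel sections, as they might appear after parsing build.toml.
KERNELS = {
    "gemm": {"backend": "cuda"},
    "gemm_blackwell": {"backend": "cuda", "cuda-minver": "12.8"},
}

if __name__ == "__main__":
    for cuda in ("12.6", "12.8"):
        print(cuda, enabled_kernels(KERNELS, cuda))
```

Comparing parsed tuples rather than raw strings matters here, since a string comparison would order "12.10" before "12.8".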
What's Changed
- Add cuda-maxver option to the general section by @danieldk in #170
- build2cmake: cxx-flags option for C++ compile flags for kernels by @danieldk in #171
- hotfix: cuda-maxver by @danieldk in #172
- hotfix: cuda-maxver nit in Nix by @danieldk in #173
- Add support for building for a custom set of Torch versions by @danieldk in #174
- Add cuda-minver option for CUDA kernels by @danieldk in #176
- Set build2cmake and kernel-abi-check to 0.5.2 for release prep by @danieldk in #177
Full Changelog: v0.5.1...v0.5.2
v0.5.1
This release contains various bugfixes.
What's Changed
- Add cuda-minver option to the general section by @danieldk in #165
- Darwin: rewrite Nix store paths by @danieldk in #167
- build2cmake: remove metallib install by @EricLBuehler in #168
- Dockerfile improve local path by @drbh in #166
- Set build2cmake and kernel-abi-check to 0.5.1 for release prep by @danieldk in #169
Full Changelog: v0.5.0...v0.5.1
v0.5.0
This release adds support for building Metal kernels for Apple Silicon Macs. To accommodate non-CUDA/ROCm kernels, the build.toml format has been updated. You can update an existing build.toml using build2cmake:
$ build2cmake update-build /path/to/build.toml
You can also directly run this command with Nix:
$ nix run github:huggingface/kernel-builder/v0.5.0#update-build /path/to/build.toml
What's Changed
- Update the build.toml format in preparation for Metal by @danieldk in #144
- Provide better errors when deserializing build.toml by @danieldk in #145
- feat: built root and user docker image variants by @drbh in #139
- Add basic support for building Metal 🤘 kernels by @danieldk in #146
- Add support for building macOS Metal kernels by @danieldk in #147
- fix: adjust the update build command in the container by @drbh in #149
- Enable Metal as part of bundle builds by @danieldk in #151
- Propagate ABI check errors and fix on macOS by @danieldk in #154
- feat: allow precompilation for metal kernels by @EricLBuehler in #152
- build2cmake: add clean subcommand by @EricLBuehler in #156
- kernel-abi-check: check macOS minimum version by @danieldk in #157
- Add cutlass 3.9 as a dependency by @danieldk in #159
- build2cmake: cuda_flags option for compile flags for CUDA kernels by @danieldk in #160
- Update build.toml docs by @danieldk in #161
- Accept checkInputs and nativeCheckInputs in genFlakeOutputs by @danieldk in #155
- Small Nix documentation improvements by @danieldk in #162
- Hotfix: append CUDA flags by @danieldk in #163
- Set build2cmake and kernel-abi-check to 0.5.0 for release prep by @danieldk in #164
New Contributors
- @EricLBuehler made their first contribution in #152
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- Add a CUDA 12.9 build variant for Torch 2.7 by @danieldk in #136
- feat: update docker for remote build and push by @drbh in #115
- build2cmake: attempt to get shorthash-based ops id using git by @danieldk in #137
- Standardizing torch_binding.cpp and torch_binding.h in the doc by @MekkCyber in #138
- Bump nixpkgs to version with cuDNN sbsa by @danieldk in #140
- Switch to the to-be hf-nix repo by @danieldk in #141
Full Changelog: v0.3.0...v0.4.0