Releases: huggingface/kernel-builder

v0.6.2

24 Sep 15:30

New Features

Intel XPU support

This release of kernel-builder adds XPU support. Many thanks to @sywangyi for implementing this! You can use the xpu backend type in build.toml for XPU kernels. For example:

[kernel.activation_xpu]
backend = "xpu"
depends = ["torch"]
src = ["relu_xpu/relu.cpp"]

The ReLU example kernel shows how you can make a kernel that supports the CUDA, ROCm, and XPU backends.
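
As a rough illustration of such a multi-backend layout, the sketch below places one kernel section per backend in build.toml. Only the XPU section is taken from this release note; the CUDA and ROCm section names and source paths are hypothetical placeholders, and the actual ReLU example may be organized differently. Other sections that a complete build.toml needs are omitted.

```toml
# Sketch only: one [kernel.<name>] section per backend.
# The CUDA and ROCm section names and source paths are hypothetical.
[kernel.activation_cuda]
backend = "cuda"
depends = ["torch"]
src = ["relu_cuda/relu.cu"]

[kernel.activation_rocm]
backend = "rocm"
depends = ["torch"]
src = ["relu_rocm/relu.cu"]

# XPU section as given in this release note.
[kernel.activation_xpu]
backend = "xpu"
depends = ["torch"]
src = ["relu_xpu/relu.cpp"]
```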

kernel-abi-check Python binding

kernel-abi-check now also has a Python binding. This will be used by the upcoming kernels check subcommand.

API changes

Prior to this version, a kernel had to pass the Git revision to genFlakeOutputs in its flake.nix. For example:

kernel-builder.lib.genFlakeOutputs {
  path = ./.;
  rev = self.shortRev or self.dirtyShortRev or self.lastModifiedDate;
};

Starting with version 0.6.2, kernel-builder determines the revision on its own. Instead of the revision, a kernel now has to pass the flake itself (self):

kernel-builder.lib.genFlakeOutputs {
  inherit self;
  path = ./.;
};

The old invocation of genFlakeOutputs still works with a warning, but will be deprecated in the future.

What's Changed

Full Changelog: v0.6.1...v0.6.2

v0.6.1

05 Sep 08:34

New Features

build-and-copy command

Before this release, you had to build a kernel with nix build first and then copy the build variants from result to build. This can now be done in a single step with build-and-copy:

$ nix run .#build-and-copy -L

Automatic virtual environment for nix develop

Running nix develop in a kernel will now automatically create a virtual environment in .venv (if it does not exist) and activate it.

Docs

examples/relu-backprop-compile provides an example of how to make a kernel with backprop and torch.compile support.

What's Changed

  • Disable cachix pushes, sandboxing is not enabled by @danieldk in #198
  • [XPU]Add support for cutlass-sycl by @danieldk in #200
  • Fix handling of the 9.0a and 12.0a capabilities by @danieldk in #202
  • Update hf-nix and remove sanitiseHeaderPathsHook workaround by @danieldk in #201
  • kernel devshell: automatically create venv by @danieldk in #205
  • Move kernel flake outputs generation to a separate file by @danieldk in #207
  • Add a full ReLU example with backprop and torch.compile support by @danieldk in #206
  • Add build-and-copy package by @danieldk in #208

Full Changelog: v0.6.0...v0.6.1

v0.6.0

06 Aug 20:07

New features

PyTorch 2.8 support

kernel-builder now supports PyTorch 2.8 in the following (upstream) build configurations:

  • CUDA 12.6, 12.8, and 12.9 on aarch64-linux and x86_64-linux.
  • ROCm 6.3 and 6.4 on x86_64-linux.
  • Metal on aarch64-darwin (macOS).

Following the kernel-builder support policy, support for Torch 2.6 is removed.

Additional compliance testing

Besides the ABI checks (manylinux and abi3 compliance), kernel-builder now also checks whether the kernel can be loaded by the kernels package. This ensures, among other things, that imports are relative. The check can fail for some Triton kernels that use the autotune decorator, since the build sandbox does not have access to GPUs. In that case, the check can be disabled by passing doGetKernelCheck = false.

Support for generating PTX

When defining CUDA capabilities, it is now possible to add the +PTX suffix to generate PTX code. For example:

cuda-capabilities = [ "7.0", "8.0+PTX"]

When no CUDA capabilities are specified for a kernel, PTX is generated for capability 9.0 (and 12.0 on CUDA >= 12.8).

What's Changed

Full Changelog: v0.5.2...v0.6.0

v0.5.2

04 Jul 12:11
99306a9

This release contains changes for handling more complex kernels:

  • Support minimum CUDA versions for 'subkernels' (kernel.<name>), for cases where a kernel has specializations for, e.g., Blackwell.
  • Support passing custom flags to the C++ compiler.
  • Add support for building kernels for a subset of CUDA versions. Using this option leads to non-compliant kernels, so it should only be used as a last resort.
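
The change list for this release names the options behind these items: cuda-minver, cuda-maxver, and cxx-flags. A minimal sketch of how they might appear in build.toml follows. The kernel name, subkernel name, source path, version numbers, and compiler flag are hypothetical, and the value formats (version strings, a list of flags) are assumptions rather than confirmed syntax.

```toml
[general]
name = "example"             # hypothetical kernel name
# Assumed format: do not build for CUDA versions newer than this one.
# Restricting CUDA versions leads to non-compliant kernels, so treat
# this as a last resort.
cuda-maxver = "12.9"

[kernel.example_blackwell]   # hypothetical subkernel
backend = "cuda"
# Assumed format: only build this subkernel for CUDA 12.8 and newer.
cuda-minver = "12.8"
# Assumed format: extra flags passed to the C++ compiler.
cxx-flags = ["-O2"]
depends = ["torch"]
src = ["example_blackwell/example.cu"]
```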

What's Changed

  • Add cuda-maxver option to the general section by @danieldk in #170
  • build2cmake: cxx-flags option for C++ compile flags for kernels by @danieldk in #171
  • hotfix: cuda-maxver by @danieldk in #172
  • hotfix: cuda-maxver nit in Nix by @danieldk in #173
  • Add support for building for a custom set of Torch versions by @danieldk in #174
  • Add cuda-minver option for CUDA kernels by @danieldk in #176
  • Set build2cmake and kernel-abi-check to 0.5.2 for release prep by @danieldk in #177

Full Changelog: v0.5.1...v0.5.2

v0.5.1

25 Jun 08:07
965a356

This release contains various bugfixes.

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

23 Jun 08:16
6704ae8

This release adds support for building Metal kernels for Apple Silicon Macs. To accommodate non-CUDA/ROCm kernels, the build.toml format has been updated. You can update an existing build.toml using build2cmake:

$ build2cmake update-build /path/to/build.toml

You can also directly run this command with Nix:

$ nix run github:huggingface/kernel-builder/v0.5.0#update-build /path/to/build.toml

What's Changed

  • Update the build.toml format in preparation for Metal by @danieldk in #144
  • Provide better errors when deserializing build.toml by @danieldk in #145
  • feat: built root and user docker image variants by @drbh in #139
  • Add basic support for building Metal 🤘 kernels by @danieldk in #146
  • Add support for building macOS Metal kernels by @danieldk in #147
  • fix: adjust the update build command in the container by @drbh in #149
  • Enable Metal as part of bundle builds by @danieldk in #151
  • Propagate ABI check errors and fix on macOS by @danieldk in #154
  • feat: allow precompilation for metal kernels by @EricLBuehler in #152
  • build2cmake: add clean subcommand by @EricLBuehler in #156
  • kernel-abi-check: check macOS minimum version by @danieldk in #157
  • Add cutlass 3.9 as a dependency by @danieldk in #159
  • build2cmake: cuda_flags option for compile flags for CUDA kernels by @danieldk in #160
  • Update build.toml docs by @danieldk in #161
  • Accept checkInputs and nativeCheckInputs in genFlakeOutputs by @danieldk in #155
  • Small Nix documentation improvements by @danieldk in #162
  • Hotfix: append CUDA flags by @danieldk in #163
  • Set build2cmake and kernel-abi-check to 0.5.0 for release prep by @danieldk in #164

Full Changelog: v0.4.0...v0.5.0

v0.4.0

28 May 08:13
fd0376f

What's Changed

  • Add a CUDA 12.9 build variant for Torch 2.7 by @danieldk in #136
  • feat: update docker for remote build and push by @drbh in #115
  • build2cmake: attempt to get shorthash-based ops id using git by @danieldk in #137
  • Standardizing torch_binding.cpp and torch_binding.h in the doc by @MekkCyber in #138
  • Bump nixpkgs to version with cuDNN sbsa by @danieldk in #140
  • Switch to the to-be hf-nix repo by @danieldk in #141

Full Changelog: v0.3.0...v0.4.0