Releases · huggingface/kernel-builder

This repository was archived by the owner on Jan 27, 2026. It is now read-only.

26 Nov 13:59

danieldk

v0.8.0 (Latest)

6c55ede

New features

Support Metal 4 on macOS

Starting with this release, kernel-builder builds Metal kernels with Metal 4. The minimum required SDK and macOS versions are both 26. For more information on how to set up a development environment, see our Metal docs.

Experimental support for Python dependencies

This version adds support for kernel Python dependencies. So far, we have mostly considered kernels to be either pure PyTorch + Triton or compiled CUDA/ROCm/XPU code with a small Torch wrapper. This assumption made kernels easy to deploy everywhere, since they have no external dependencies. However, DSLs for writing kernels, such as the CUTLASS DSL, are becoming increasingly popular.

To accommodate such DSLs without reintroducing the problems that come with dependencies, we allow only a small, curated set of dependencies. Currently, the only allowed dependencies are einops and nvidia-cutlass-dsl. Dependencies can be added using the new python-depends option in the general section of build.toml:

[general]
name = "my-kernel"
# ...
python-depends = ["nvidia-cutlass-dsl"]

The kernels package also validates these dependencies when it downloads a kernel that uses them.

`build-and-upload`

This release adds a build-and-upload command that builds and uploads a kernel in one step. If the kernel is not in kernels-community, you can specify the upload location in general.hub:

[general.hub]
repo-id = "my-org/my-kernel"

Flattened build directories

Until now, kernels were stored in build/<variant>/<module_name>. This version of kernel-builder changes this to build/<variant>. This fixes an issue where a kernel could not be loaded when module_name did not match the repository name (e.g. after a rename). For the next few releases, kernel-builder will also place a compatibility module at build/<variant>/<module_name>, so that the kernel can still be loaded with older versions of kernels.
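The difference between the two layouts can be sketched with pathlib. The helper kernel_module_dir and the module names are hypothetical; the variant name is taken from the v0.7.0 changelog.

```python
from pathlib import Path


def kernel_module_dir(build_dir: Path, variant: str, module_name: str,
                      flattened: bool = True) -> Path:
    """Directory containing a kernel's Python module for one build variant.

    flattened=True:  build/<variant>                 (v0.8.0 layout)
    flattened=False: build/<variant>/<module_name>   (older layout, kept as a
                     compatibility module for older versions of kernels)
    """
    variant_dir = build_dir / variant
    return variant_dir if flattened else variant_dir / module_name


variant = "torch28-cxx11-rocm64-x86_64-linux"
# The flattened path does not depend on the module name, so a renamed
# module still ends up in the same place.
print(kernel_module_dir(Path("build"), variant, "old_name"))
print(kernel_module_dir(Path("build"), variant, "old_name", flattened=False))
```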

What's Changed

misc(builder): enable detection of ARM64 arch on Windows and turn on correct VS / CMake environments by @mfuntowicz in #272
fix(windows): always define _WIN32 preprocessor macro to prevent PyTorch compiling unsupported code by @mfuntowicz in #275
bug(windows): fix invalid generated build name by @mfuntowicz in #274
feat(windows): allow detecting Python executable by @mfuntowicz in #276
Missing Windows knobs to make it compatible with kernels by @mfuntowicz in #277
do not use ONEDNN_XPU_INCLUDE_DIR since it's only needed for torch2_7. by @sywangyi in #273
Add build-and-upload command by @danieldk in #278
Remove examples/activation by @danieldk in #261
fix(build2cmake): ignore untracked files when looking for modified files to suffix with _dirty by @mfuntowicz in #280
feat(windows): do not include cxx11 ABI flag when generating names by @mfuntowicz in #281
Remove duplicate build variant name code by @danieldk in #285
Add support for building CPU-only kernels by @danieldk in #284
Remove Python bytecode after checks by @danieldk in #286
Use correct Python interpreter for metallib_to_header by @danieldk in #288
Fix metal kernels support by @MekkCyber in #287
Switch to binary Torch wheels by @danieldk in #289
Update to macOS SDK 26 and Metal 4 by @danieldk in #290
Add doc on the required environment for Metal by @danieldk in #292
Also remove bytecode from universal builds by @danieldk in #294
Include CPU kernels in CI builds by @danieldk in #296
Flatten build variants to build/<variant> by @danieldk in #293
Allow dashes in kernel names by @danieldk in #297
extensionName -> moduleName by @danieldk in #298
feat: support metal cpp by @drbh in #295
Add support for (limited) Python dependencies: nvidia-cutlass-dsl and einops by @danieldk in #302
Copy over Torch from hf-nix and fix the AArch64 build by @danieldk in #304
Remove dependency on hf-nix by @danieldk in #305
fix(windows): force USE_CUDA/USE_ROCM definitions to ensure PyTorch guards are not bypassed by @mfuntowicz in #303
Fix typos by @omahs in #309
Extend cutlass to bmg by @sywangyi in #307
Update tracing-subscriber to solve dependabot issue by @danieldk in #310
gen-flake-outputs: add backendBundle output by @danieldk in #312
Add a Discord link by @danieldk in #313
Set version to 0.8.0-dev0 by @danieldk in #315

New Contributors

@omahs made their first contribution in #309

Full Changelog: v0.7.0...v0.8.0

Contributors

danieldk, mfuntowicz, and 4 other contributors

19 Oct 10:49

danieldk

v0.7.0

fb48e4f

New features

PyTorch 2.9.0 support

kernel-builder now builds kernels for PyTorch 2.8.0 and 2.9.0 by default. Support for PyTorch 2.7.0 has been removed, in line with our policy of supporting the two latest releases.
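The support policy boils down to keeping the two most recent releases. A hypothetical helper illustrating that selection, assuming each entry in the input list represents one release:

```python
def supported_torch_versions(releases: list[str], count: int = 2) -> list[str]:
    """Return the `count` most recent releases, oldest first.

    Versions are compared numerically, so "2.10.0" sorts after "2.9.0".
    """
    def as_tuple(version: str) -> tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))

    return sorted(releases, key=as_tuple)[-count:]


print(supported_torch_versions(["2.7.0", "2.9.0", "2.8.0"]))  # ['2.8.0', '2.9.0']
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where "2.10.0" would sort before "2.9.0".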

Windows builder

This release contains experimental support for building Windows kernels. Since Nix is not supported on Windows, a separate PowerShell script, scripts/windows/builder.ps1, is provided for building kernels on Windows.

Binary Torch wheels

kernel-builder now supports building against binary Torch wheels. This speeds up the roll-out of support for new Torch versions or vendor-specific Torch builds. Build variants that do not set sourceBuild = true use a Torch binary wheel. We will soon switch to using binary wheels by default.

What's Changed

Add cutlass_4_0 as a dependency by @danieldk in #229
Add onednn support for XPU by @sywangyi in #227
README: remove Torch 2.7 note by @danieldk in #231
Append sycl flags for cutlass in 2025.2.x by @sywangyi in #233
ROCm: add rocwmma-devel by @danieldk in #236
misc(cmake): rely on the more robust Python3 module to handle Python detection by @mfuntowicz in #238
Add non-bundle torch28-cxx11-rocm64-x86_64-linux variant by @danieldk in #240
Add Torch 2.9 build variants by @danieldk in #242
fix: update docker files and readme command by @drbh in #239
ROCm: apply specified archs to sources by @danieldk in #237
Add target tiers by @drbh in #241
Update to Torch 2.9.0-rc6 by @danieldk in #245
Make separate forCache and forCacheNonBundle packages by @danieldk in #244
Update hf-nix input by @danieldk in #248
Select default shells from available variants by @danieldk in #247
Add kernels package/command to generated flake output by @danieldk in #251
Remove kernel-compliance-check by @danieldk in #253
Add ci output for generated flakes by @danieldk in #254
do not use ENV(DPCPP_VERSION) which is not friendly to local build(ne… by @sywangyi in #255
Restructure extension to cache extra ROCm and XPU dependencies by @danieldk in #257
Bump hf-nix for kernels 0.10.3 by @danieldk in #259
Fix Nix invocations in the README and Dockerfiles by @danieldk in #260
Use binary wheels for Torch by @danieldk in #252
Small comment/doc fixes by @danieldk in #263
Introducing a way to builder kernels for Windows platforms by @mfuntowicz in #250
Update hf-nix for kernels 0.10.4 by @danieldk in #265
Check that cuda-flags/sycl-flags are properly passed by @danieldk in #264
feat: Add support for ROCm/HIP flags by @shadeMe in #262
CI: sandboxed builds and enable Cachix pushes by @danieldk in #266
Hotfix CUDA devshell by switching back to source build by @danieldk in #268
Hotfix ROCm and XPU devshells by @danieldk in #269
Update to final Torch 2.9.0 by @danieldk in #270
Set version to 0.7.0-dev0 by @danieldk in #271

New Contributors

@mfuntowicz made their first contribution in #238
@shadeMe made their first contribution in #262

Full Changelog: v0.6.2...v0.7.0

Contributors

danieldk, shadeMe, and 3 other contributors

24 Sep 15:30

danieldk

v0.6.2

fc91c8d

New Features

Intel XPU support

This release of kernel-builder adds XPU support. Many thanks to @sywangyi for implementing this! You can use the xpu backend type in build.toml for XPU kernels. For example:

[kernel.activation_xpu]
backend = "xpu"
depends = ["torch"]
src = ["relu_xpu/relu.cpp"]

The ReLU example kernel shows how to make a kernel that supports the CUDA, ROCm, and XPU backends.

`kernel-abi-check` Python binding

kernel-abi-check now also has a Python binding. This will be used by the upcoming kernels check subcommand.

API changes

Prior to this version, a kernel had to provide its Git revision to genFlakeOutputs in its flake.nix. For example:

kernel-builder.lib.genFlakeOutputs {
  path = ./.;
  rev = self.shortRev or self.dirtyShortRev or self.lastModifiedDate;
};

Starting with version 0.6.2, kernel-builder determines the revision itself. Instead of passing rev, a kernel has to pass through the flake itself (self):

kernel-builder.lib.genFlakeOutputs {
  inherit self;
  path = ./.;
};

The old invocation of genFlakeOutputs still works with a warning, but will be deprecated in the future.
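The old rev expression relies on Nix's `or` operator, which falls back when an attribute is missing. The Python function below only mirrors that fallback order for illustration; how kernel-builder derives the revision from self since 0.6.2 is not described here, and flake_revision and the sample values are made up.

```python
def flake_revision(flake: dict[str, str]) -> str:
    """Mirror `self.shortRev or self.dirtyShortRev or self.lastModifiedDate`.

    In Nix, `a.b or fallback` falls back only when the attribute is missing,
    so the first attribute that exists wins.
    """
    for attr in ("shortRev", "dirtyShortRev", "lastModifiedDate"):
        if attr in flake:
            return flake[attr]
    raise KeyError("flake exposes no revision attribute")


clean = {"shortRev": "abc1234", "lastModifiedDate": "20250924153000"}
dirty = {"dirtyShortRev": "abc1234-dirty", "lastModifiedDate": "20250924153000"}
print(flake_revision(clean))  # abc1234
print(flake_revision(dirty))  # abc1234-dirty
```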

What's Changed

Add XPU support by @danieldk in #210
Add xpu to the docs by @danieldk in #211
Cache build2cmake and kernel-abi-check by @danieldk in #213
Improve cutlass-sycl support by @danieldk in #214
Fix a regression in test shells by @danieldk in #217
build-and-copy: correctly get variant by @danieldk in #218
build-and-copy: copy build from the bundle output, not result/ by @danieldk in #219
Add a license by @danieldk in #220
add dnnl to the link library, some kernels need onednn by @sywangyi in #224
Add initial kernel building security guidelines by @danieldk in #216
Let kernel-builder determine the kernel revision by @danieldk in #221
hotfix: add onednn-xpu to the build inputs by @danieldk in #226
Add a Python binding for kernel-abi-check by @danieldk in #225
Add kernel-abi-check-python release workflow by @danieldk in #228

Full Changelog: v0.6.1...v0.6.2

Contributors

danieldk and sywangyi

05 Sep 08:34

danieldk

v0.6.1

08fcbf3

New Features

`build-and-copy` command

Before this release, one had to build a kernel with nix build first and then copy the build variants from result/ to build/. This can now be done in a single step with build-and-copy:

$ nix run .#build-and-copy -L
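What the copy step amounts to can be sketched in Python. This assumes result/ contains one directory per build variant; copy_build_variants is a hypothetical helper, not the implementation behind build-and-copy. Write permission is restored on the copies because files copied out of the Nix store keep its read-only modes.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def _make_writable(root: Path) -> None:
    """Add user write permission to root and everything below it."""
    for path in [root, *root.rglob("*")]:
        path.chmod(path.stat().st_mode | stat.S_IWUSR)


def copy_build_variants(result_dir: Path, build_dir: Path) -> list[str]:
    """Copy each variant directory in result_dir to build_dir, replacing old copies."""
    build_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for variant in sorted(p for p in result_dir.iterdir() if p.is_dir()):
        target = build_dir / variant.name
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(variant, target)
        _make_writable(target)  # so the next run can replace this copy
        copied.append(variant.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp, "result", "torch28-cxx11-rocm64-x86_64-linux")
    demo.mkdir(parents=True)
    (demo / "__init__.py").write_text("")
    print(copy_build_variants(Path(tmp, "result"), Path(tmp, "build")))
    # ['torch28-cxx11-rocm64-x86_64-linux']
```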

Automatic virtual environment for `nix develop`

Running nix develop in a kernel will now automatically create a virtual environment in .venv (if it does not exist) and activate it.
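The create-if-missing part can be sketched with Python's venv module (ensure_venv is a hypothetical name). Activation happens in the shell, for example by sourcing .venv/bin/activate, so it is not part of the sketch.

```python
import tempfile
import venv
from pathlib import Path


def ensure_venv(project_dir: Path) -> Path:
    """Create project_dir/.venv if it does not exist yet and return its path."""
    env_dir = project_dir / ".venv"
    if not env_dir.exists():
        # with_pip=False skips installing pip to keep the sketch fast.
        venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env = ensure_venv(Path(tmp))
    print((env / "pyvenv.cfg").exists())  # True
    print(ensure_venv(Path(tmp)) == env)  # True: the existing venv is reused
```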

Docs

examples/relu-backprop-compile provides an example of how to make a kernel with backprop and torch.compile support.

What's Changed

Disable cachix pushes, sandboxing is not enabled by @danieldk in #198
[XPU]Add support for cutlass-sycl by @danieldk in #200
Fix handling of the 9.0a and 12.0a capabilities by @danieldk in #202
Update hf-nix and remove sanitiseHeaderPathsHook workaround by @danieldk in #201
kernel devshell: automatically create venv by @danieldk in #205
Move kernel flake outputs generation to a separate file by @danieldk in #207
Add a full ReLU example with backprop and torch.compile support by @danieldk in #206
Add build-and-copy package by @danieldk in #208

Full Changelog: v0.6.0...v0.6.1

Contributors

danieldk

06 Aug 20:07

danieldk

v0.6.0

3b10038

New features

PyTorch 2.8 support

kernel-builder now supports PyTorch 2.8 in the following (upstream) build configurations:

CUDA 12.6, 12.8, and 12.9 on aarch64-linux and x86_64-linux.
ROCm 6.3 and 6.4 on x86_64-linux.
Metal on aarch64-darwin (macOS).

Following the kernel-builder support policy, support for Torch 2.6 is removed.

Additional compliance testing

Besides the ABI checks (manylinux and abi3 compliance), kernel-builder now also checks whether the kernel can be loaded by the kernels package. Among other things, this ensures that imports are relative. The check can fail for some Triton kernels that use the autotune decorator, since the build sandbox does not have access to GPUs. In that case, the check can be disabled by passing doGetKernelCheck = false to genFlakeOutputs.
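One property the check enforces, relative imports, can be approximated statically with Python's ast module. This is a narrower, hypothetical stand-in for illustration only: the real check loads the kernel with the kernels package, and absolute_self_imports and my_kernel are made-up names.

```python
import ast


def absolute_self_imports(source: str, module_name: str) -> list[int]:
    """Return line numbers where `source` imports its own package absolutely.

    Relative imports (`from . import x`) are accepted; `import my_kernel.x`
    and `from my_kernel import x` are flagged.
    """
    offending = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and all other statements
        if any(n == module_name or n.startswith(module_name + ".") for n in names):
            offending.append(node.lineno)
    return sorted(offending)


source = """import torch
from my_kernel import ops
from . import layers
import my_kernel.utils
"""
print(absolute_self_imports(source, "my_kernel"))  # [2, 4]
```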

Support for generating PTX

When defining CUDA capabilities, it is now possible to add the +PTX suffix to generate PTX code. For example:

cuda-capabilities = ["7.0", "8.0+PTX"]

When no CUDA capabilities are specified for a kernel, PTX is generated for capability 9.0 (and 12.0 on CUDA >= 12.8).
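The +PTX suffix corresponds to nvcc's ability to embed PTX alongside compiled SASS, so the code can be JIT-compiled for newer GPUs. The sketch below shows one common way to turn capability strings into nvcc -gencode flags (the same convention PyTorch's C++ extension builder uses); whether kernel-builder emits exactly these flags is an assumption, and gencode_flags is a hypothetical helper.

```python
def gencode_flags(capabilities: list[str]) -> list[str]:
    """Translate cuda-capabilities entries into nvcc -gencode flags.

    "8.0"     -> SASS for sm_80
    "8.0+PTX" -> SASS for sm_80 plus embedded PTX for compute_80
    """
    flags = []
    for cap in capabilities:
        base, _, suffix = cap.partition("+")
        arch = base.replace(".", "")  # "9.0a" -> "90a"
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
        if suffix == "PTX":
            flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
    return flags


for flag in gencode_flags(["7.0", "8.0+PTX"]):
    print(flag)
# -gencode=arch=compute_70,code=sm_70
# -gencode=arch=compute_80,code=sm_80
# -gencode=arch=compute_80,code=compute_80
```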

What's Changed

feat: include cachix instructions in readme by @drbh in #178
Add check that imports the kernel with kernels by @danieldk in #179
Make get-kernel-check work on macOS by @danieldk in #180
Embed compiled metal kernels into binary by @EricLBuehler in #181
Make hipify_sources_target work with multiple include dirs by @danieldk in #183
Add doGetKernelCheck option to genFlakeOutputs by @danieldk in #188
Set HOME in get-kernel-check-hook by @danieldk in #189
CI: try to build a macOS kernel by @danieldk in #184
kernel-abi-check: improve description by @danieldk in #190
add xpu build support by @sywangyi in #185
feat: Docker build with buildx for cross compilation by @drbh in #191
Pass default versions set torchVersions by @danieldk in #193
Support +PTX in cuda capabilities by @danieldk in #196
Add Torch 2.8, remove Torch 2.6 by @danieldk in #182

New Contributors

@sywangyi made their first contribution in #185

Full Changelog: v0.5.2...v0.6.0

Contributors

danieldk, drbh, and 2 other contributors

04 Jul 12:11

danieldk

v0.5.2

99306a9

This release contains changes for handling more complex kernels:

Support minimum CUDA versions for 'subkernels' (kernel.<name>), for when a kernel has specializations for e.g. Blackwell.
Support passing custom flags to the C++ compiler.
Add support for building kernels for a subset of CUDA versions. Using this leads to non-compliant kernels, so it should only be used as a last resort.

What's Changed

Add cuda-maxver option to the general section by @danieldk in #170
build2cmake: cxx-flags option for C++ compile flags for kernels by @danieldk in #171
hotfix: cuda-maxver by @danieldk in #172
hotfix: cuda-maxver nit in Nix by @danieldk in #173
Add support for building for a custom set of Torch versions by @danieldk in #174
Add cuda-minver option for CUDA kernels by @danieldk in #176
Set build2cmake and kernel-abi-check to 0.5.2 for release prep by @danieldk in #177

Full Changelog: v0.5.1...v0.5.2

Contributors

danieldk

25 Jun 08:07

danieldk

v0.5.1

965a356

This release contains various bugfixes.

What's Changed

Add cuda-minver option to the general section by @danieldk in #165
Darwin: rewrite Nix store paths by @danieldk in #167
build2cmake: remove metallib install by @EricLBuehler in #168
Dockerfile improve local path by @drbh in #166
Set build2cmake and kernel-abi-check to 0.5.1 for release prep by @danieldk in #169

Full Changelog: v0.5.0...v0.5.1

Contributors

danieldk, drbh, and EricLBuehler

23 Jun 08:16

danieldk

v0.5.0

6704ae8

This release adds support for building Metal kernels for Apple Silicon Macs. To accommodate non-CUDA/ROCm kernels, the build.toml format has been updated. You can update an existing build.toml using build2cmake:

$ build2cmake update-build /path/to/build.toml

You can also directly run this command with Nix:

$ nix run github:huggingface/kernel-builder/v0.5.0#update-build /path/to/build.toml

What's Changed

Update the build.toml format in preparation for Metal by @danieldk in #144
Provide better errors when deserializing build.toml by @danieldk in #145
feat: built root and user docker image variants by @drbh in #139
Add basic support for building Metal 🤘 kernels by @danieldk in #146
Add support for building macOS Metal kernels by @danieldk in #147
fix: adjust the update build command in the container by @drbh in #149
Enable Metal as part of bundle builds by @danieldk in #151
Propagate ABI check errors and fix on macOS by @danieldk in #154
feat: allow precompilation for metal kernels by @EricLBuehler in #152
build2cmake: add clean subcommand by @EricLBuehler in #156
kernel-abi-check: check macOS minimum version by @danieldk in #157
Add cutlass 3.9 as a dependency by @danieldk in #159
build2cmake: cuda_flags option for compile flags for CUDA kernels by @danieldk in #160
Update build.toml docs by @danieldk in #161
Accept checkInputs and nativeCheckInputs in genFlakeOutputs by @danieldk in #155
Small Nix documentation improvements by @danieldk in #162
Hotfix: append CUDA flags by @danieldk in #163
Set build2cmake and kernel-abi-check to 0.5.0 for release prep by @danieldk in #164

New Contributors

@EricLBuehler made their first contribution in #152

Full Changelog: v0.4.0...v0.5.0

Contributors

danieldk, drbh, and EricLBuehler

28 May 08:13

danieldk

v0.4.0

fd0376f

What's Changed

Add a CUDA 12.9 build variant for Torch 2.7 by @danieldk in #136
feat: update docker for remote build and push by @drbh in #115
build2cmake: attempt to get shorthash-based ops id using git by @danieldk in #137
Standardizing torch_binding.cpp and torch_binding.h in the doc by @MekkCyber in #138
Bump nixpkgs to version with cuDNN sbsa by @danieldk in #140
Switch to the to-be hf-nix repo by @danieldk in #141

Full Changelog: v0.3.0...v0.4.0

Contributors

danieldk, drbh, and MekkCyber
