
Releases: woct0rdho/triton-windows

v3.4.0-windows.post20

31 Jul 02:22

Note again that Triton 3.4 only works with PyTorch >= 2.8.

To install Triton 3.4 and avoid breaking your installed PyTorch when a new version of Triton is released in the future, pin the version to < 3.5:

pip install -U "triton-windows<3.5"

v3.3.1-windows.post19

30 May 15:17

This is identical to triton-windows 3.3.0.post19; I only bumped the version number to match the official one.

The only difference between the official triton 3.3.0 and 3.3.1 is triton-lang#6771, which affects RTX 50xx GPUs. I've included this patch since triton-windows 3.3.0.post14.

v3.3.0-windows.post19

24 Apr 07:04
  • Fix JIT compilation using Clang

Note again that Triton 3.3 only works with PyTorch >= 2.7, and Triton 3.2 only works with PyTorch >= 2.6.

To install Triton 3.3 and avoid breaking your installed PyTorch when a new version of Triton is released in the future, pin the version to < 3.4:

pip install -U "triton-windows<3.4"

v3.2.0-windows.post18

18 Apr 02:45
  • Find MSVC and Windows SDK from environment variables set by Launch-VsDevShell.ps1 or VsDevCmd.bat, see #106
  • Print cc_cmd for debugging when compilation fails

empty

25 Mar 13:41
b689470
Pre-release

Here are some empty wheels named triton. If your build system complains that some package requires triton rather than triton-windows, you can add one of these empty wheels to it, along with triton-windows.

You may use transient-package to create such packages.

v3.2.0-windows.post17

20 Mar 12:26

Fix a race when multiple processes create __triton_launcher.pyd in parallel, see intel/intel-xpu-backend-for-triton#3270 . Now torch.compile autotune should work in general.
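The general pattern behind this kind of fix (a sketch of the technique, not necessarily Triton's exact implementation) is to have each process write the compiled artifact to its own uniquely named temp file and then atomically rename it into place, so no process ever observes a half-written file:

```python
import os
import tempfile

def atomic_write(path, data: bytes):
    """Write `data` to `path` so concurrent writers never leave a partial file.

    Each process writes to its own uniquely named temp file in the same
    directory, then os.replace() atomically moves it into place. The last
    writer wins, but every reader always sees a complete file.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed
        raise
```

Writing to the same directory as the target matters: os.replace() is only atomic when source and destination are on the same filesystem.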

Note that ComfyUI enables cudaMalloc by default, but cudaMalloc does not work with CUDA graphs. Also, many models and nodes in ComfyUI are not compatible with CUDA graphs. You may use mode='max-autotune-no-cudagraphs' and see whether it gives a speedup.

v3.2.0-windows.post16

20 Mar 05:55

Ensure temp files are closed when calling ptxas in parallel. I still need to investigate some bugs in PyTorch to make torch.compile autotune fully work, see unslothai/unsloth#1999 .
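On Windows, a temp file that is still open usually cannot be opened again by another process, so the usual pattern when handing a temp file to a tool like ptxas is: create it with delete=False, close it before launching the subprocess, and remove it afterwards. A minimal sketch of that pattern (the child command here is a stand-in that just reads the file back, not Triton's actual ptxas invocation):

```python
import os
import subprocess
import sys
import tempfile

def run_on_temp_file(data: bytes, cmd_for_path):
    """Write `data` to a temp file, close it, then let a subprocess read it.

    delete=False plus an explicit close() is required on Windows: a file
    that is still open there cannot be opened again by the child process.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".ptx", delete=False)
    try:
        tmp.write(data)
        tmp.close()  # close BEFORE the subprocess opens it
        return subprocess.run(cmd_for_path(tmp.name),
                              capture_output=True, check=True)
    finally:
        os.unlink(tmp.name)  # always remove the temp file

# Stand-in child: a Python process that prints the file's contents.
result = run_on_temp_file(
    b"hello",
    lambda path: [sys.executable, "-c",
                  f"print(open({path!r}).read())"],
)
```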

v3.2.0-windows.post15

16 Mar 15:35

Define Py_LIMITED_API and exclude newer Python C APIs that TinyCC cannot compile, see #92

v3.3.0-windows.post14

15 Mar 12:38
Pre-release

Fix getMMAVersionSafe for RTX 50xx (sm120), see #83 (comment)

v3.2.0-windows.post13

12 Mar 13:07

TinyCC is bundled in the wheels, so you don't need to install MSVC to use Triton. Packages that directly call triton.jit, such as SageAttention, will just work.

You still need to install a C++ compiler if you use torch.compile targeting the CPU. This can happen when you use nodes like 'CompileModel' in ComfyUI. Triton does not affect how PyTorch configures the C++ compiler in this case.
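A quick way to check whether a C++ compiler is available for torch.compile's CPU path is to look for common compiler executables on PATH. This is a hypothetical diagnostic sketch, not a helper that Triton or PyTorch provides:

```python
import shutil

def find_cpp_compiler():
    """Return the path of the first C++ compiler found on PATH, or None.

    'cl' is MSVC (typically only visible inside a Visual Studio
    developer shell); clang++ and g++ are common alternatives.
    """
    for name in ("cl", "clang++", "g++"):
        path = shutil.which(name)
        if path:
            return path
    return None

compiler = find_cpp_compiler()
print("C++ compiler:", compiler or "not found -- install MSVC Build Tools")
```

Running this inside a shell launched by Launch-VsDevShell.ps1 or VsDevCmd.bat should find cl.exe, since those scripts put MSVC on PATH.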