Releases: woct0rdho/triton-windows
v3.4.0-windows.post20
Note again that Triton 3.4 only works with PyTorch >= 2.8.
To install Triton 3.4, and to avoid breaking your installed PyTorch when a newer Triton is released in the future, pin the version to < 3.5:
pip install -U "triton-windows<3.5"
v3.3.1-windows.post19
This is identical to `triton-windows` 3.3.0.post19; I only bumped the version number to match the official one.
The only difference between the official Triton 3.3.0 and 3.3.1 is triton-lang#6771, which affects RTX 50xx GPUs. That patch has already been included since `triton-windows` 3.3.0.post14.
v3.3.0-windows.post19
- Fix JIT compilation using Clang
Note again that Triton 3.3 only works with PyTorch >= 2.7, and Triton 3.2 only works with PyTorch >= 2.6.
To install Triton 3.3, and to avoid breaking your installed PyTorch when a newer Triton is released in the future, pin the version to < 3.4:
pip install -U "triton-windows<3.4"
v3.2.0-windows.post18
- Find MSVC and the Windows SDK from environment variables set by `Launch-VsDevShell.ps1` or `VsDevCmd.bat`, see #106
- Print `cc_cmd` for debugging when compilation fails
empty
Here are some empty wheels named `triton`. You can add them to your build system if it tells you that some package requires `triton` rather than `triton-windows`, and also add `triton-windows` to the build system.
You may use transient-package to create such packages.
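As a sketch, the pairing in a requirements file could look like this. The wheel filename below is hypothetical; use the actual empty-wheel asset attached to this release:

```
# requirements.txt sketch
triton-windows          # the real package with the Windows binaries
# plus the empty "triton" wheel from this release, installed from a local path, e.g.:
# ./triton-3.2.0-py3-none-any.whl
```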
v3.2.0-windows.post17
Fix the case where multiple processes create `__triton_launcher.pyd` in parallel, see intel/intel-xpu-backend-for-triton#3270. Now `torch.compile` autotune will work in general.
Note that ComfyUI enables cudaMalloc by default, but cudaMalloc does not work with CUDA graphs. Also, many models and nodes in ComfyUI are not compatible with CUDA graphs. You may use `mode='max-autotune-no-cudagraphs'` and see whether it gives a speedup.
v3.2.0-windows.post16
Ensure temp files are closed when calling `ptxas` in parallel. I still need to investigate some bugs in PyTorch to make `torch.compile` autotune fully work, see unslothai/unsloth#1999
v3.2.0-windows.post15
Define `Py_LIMITED_API` and exclude newer Python C APIs that TinyCC cannot compile, see #92
v3.3.0-windows.post14
Fix `getMMAVersionSafe` for RTX 50xx (sm120), see #83 (comment)
v3.2.0-windows.post13
TinyCC is bundled in the wheels, so you don't need to install MSVC to use Triton. Packages that directly call `triton.jit`, such as SageAttention, will just work.
You still need to install a C++ compiler if you use `torch.compile` targeting the CPU. This may happen when you use nodes like 'CompileModel' in ComfyUI. Triton does not affect how PyTorch configures the C++ compiler in this case.
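A small way to check whether a suitable compiler is already on your PATH (the executable names below are common defaults on Windows, not an exhaustive list):

```python
import shutil

# Common C++ compiler executables: MSVC's cl, LLVM's clang++, MinGW's g++.
candidates = ["cl", "clang++", "g++"]
found = [name for name in candidates if shutil.which(name)]

if found:
    print(f"C++ compiler(s) available: {', '.join(found)}")
else:
    print("No C++ compiler found; torch.compile targeting CPU may fail")
```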