### Describe the issue
Following PR #26002, certain GPU architectures were removed because the overall wheel size exceeded the GitHub and PyPI limits. However, newer versions of `nvcc` (since CUDA 12.8) support fatbin compression modes (`-Xfatbin=-compress-all -compress-mode=MODE`), which can significantly reduce binary size without affecting functionality.

Below are my test results comparing the different compression modes (`size`, `balance`, `speed`) under both CUDA 12.8 and CUDA 13.0; "Default" marks the mode that the corresponding `nvcc` version applies automatically.

Note: enabling `--compress-mode` requires a driver version at least as new as the one shipped with CUDA 12.4, which is why PyTorch only enables it for wheels built with CUDA 13.0 and later.
| CUDA Ver | Mode | Wheel Size (MB) | Compile Time |
|---|---|---|---|
| 13.0 | Speed | 923.9 | 187m51.531s |
| 13.0 | Balance (Default) | 516.1 | 179m6.947s |
| 13.0 | Size | 360.0 | 191m42.614s |
| 12.8 | Speed (Default) | 689.5 | 159m1.095s |
| 12.8 | Balance | 435.5 | 185m53.233s |
| 12.8 | Size | 309.3 | 182m14.164s |
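For quick comparison, the relative savings implied by the measurements above can be computed directly (a small illustrative script; all numbers are copied from the table, and the choice of `speed` as the baseline is mine):

```python
# Wheel sizes (MB) from the table above, keyed by (CUDA version, compress mode).
sizes = {
    ("13.0", "speed"): 923.9,
    ("13.0", "balance"): 516.1,
    ("13.0", "size"): 360.0,
    ("12.8", "speed"): 689.5,
    ("12.8", "balance"): 435.5,
    ("12.8", "size"): 309.3,
}

def reduction(cuda: str, mode: str, baseline: str = "speed") -> float:
    """Percent wheel-size reduction of `mode` relative to `baseline`."""
    base = sizes[(cuda, baseline)]
    return round((base - sizes[(cuda, mode)]) / base * 100, 1)

for cuda in ("13.0", "12.8"):
    for mode in ("balance", "size"):
        print(f"CUDA {cuda} {mode}: {reduction(cuda, mode)}% smaller than speed")
```

For example, `size` mode cuts the CUDA 13.0 wheel by about 61% relative to `speed` mode, at a modest compile-time cost.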
### Build Script

First, replace

```cmake
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xfatbin=-compress-all")
```

with

```cmake
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xfatbin=-compress-all -compress-mode=YOUR_MODE")
```
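The replacement can also be applied non-interactively with `sed`; the snippet below demonstrates the substitution on a copy of the line (the real file path in the onnxruntime tree is not assumed here — locate it first with `grep -rn -- '-Xfatbin=-compress-all' cmake/` and point `sed` at that file, with `YOUR_MODE` swapped in for `size` as desired):

```shell
# Demo on a sample file; in a real checkout, run the sed on the file that
# grep points at instead of cuda_flags.cmake.
echo 'set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xfatbin=-compress-all")' > cuda_flags.cmake
sed -i 's/-Xfatbin=-compress-all/-Xfatbin=-compress-all -compress-mode=size/' cuda_flags.cmake
cat cuda_flags.cmake
```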
```bash
docker run -it -v ~/onnxruntime:/root/onnxruntime --name ort continuumio/miniconda3
cd ~/onnxruntime
conda create -n onnx python=3.12 -y && conda activate onnx
pip install -r requirements.txt
# Pick 13.0 or 12.8 here, and use the same version in --cuda_version below
conda install cuda-nvcc=13.0/12.8 cuda-toolkit cudnn -c nvidia -y
conda install gcc gxx cmake ninja -c conda-forge
ln -s /opt/conda/envs/onnx/targets/sbsa-linux/include/cccl/cuda /opt/conda/envs/onnx/targets/sbsa-linux/include/cuda
apt update && apt-get install -y patch
export CC=gcc CXX=g++
bash build.sh \
  --config Release \
  --build_shared_lib \
  --cmake_generator Ninja \
  --parallel 6 \
  --nvcc_threads 1 \
  --use_cuda \
  --cuda_version 13.0/12.8 \
  --cuda_home $CONDA_PREFIX \
  --cudnn_home $CONDA_PREFIX \
  --build_wheel \
  --skip_tests \
  --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF \
  --allow_running_as_root \
  --compile_no_warning_as_error
```
### Platform

Linux

### ONNX Runtime Installation

Built from Source

### ONNX Runtime API

Python

### Architecture

ARM64

### Execution Provider

CUDA