diff --git a/CHANGELOG.md b/CHANGELOG.md index 5264c94e7..ebb29fa9f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,18 @@ # TensorRT OSS Release Changelog +## 10.2.0 GA - 2024-07-10 + +Key Features and Updates: + + - Demo changes + - Added [Stable Diffusion 3 demo](demo/Diffusion). + - Plugin changes + - Version 3 of the [InstanceNormalization plugin](plugin/instanceNormalizationPlugin/) (`InstanceNormalization_TRT`) has been added. This version is based on the `IPluginV3` interface and is used by the TensorRT ONNX parser when native `InstanceNormalization` is disabled. + - Tooling changes + - Pytorch Quantization development has transitioned to [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer). All developers are encouraged to use TensorRT Model Optimizer to benefit from the latest advancements on quantization and compression. + - Build containers + - Updated default cuda versions to `12.5.0`. + ## 10.1.0 GA - 2024-06-17 Key Features and Updates: diff --git a/README.md b/README.md index f7835016a..991048b96 100644 --- a/README.md +++ b/README.md @@ -26,13 +26,13 @@ You can skip the **Build** section to enjoy TensorRT with Python. To build the TensorRT-OSS components, you will first need the following software packages. **TensorRT GA build** -* TensorRT v10.1.0.27 +* TensorRT v10.2.0.19 * Available from direct download links listed below **System Packages** * [CUDA](https://developer.nvidia.com/cuda-toolkit) * Recommended versions: - * cuda-12.4.0 + cuDNN-8.9 + * cuda-12.5.0 + cuDNN-8.9 * cuda-11.8.0 + cuDNN-8.9 * [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1 * [cmake](https://github.com/Kitware/CMake/releases) >= v3.13 @@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step. 
Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below: - - [TensorRT 10.1.0.27 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz) - - [TensorRT 10.1.0.27 for CUDA 12.4, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz) - - [TensorRT 10.1.0.27 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/zip/TensorRT-10.1.0.27.Windows.win10.cuda-11.8.zip) - - [TensorRT 10.1.0.27 for CUDA 12.4, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/zip/TensorRT-10.1.0.27.Windows.win10.cuda-12.4.zip) + - [TensorRT 10.2.0.19 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz) + - [TensorRT 10.2.0.19 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz) + - [TensorRT 10.2.0.19 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-11.8.zip) + - [TensorRT 10.2.0.19 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip) - **Example: Ubuntu 20.04 on x86-64 with cuda-12.4** + **Example: Ubuntu 20.04 on x86-64 with cuda-12.5** ```bash cd ~/Downloads - tar -xvzf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz - export TRT_LIBPATH=`pwd`/TensorRT-10.1.0.27 + tar -xvzf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz + export TRT_LIBPATH=`pwd`/TensorRT-10.2.0.19 ``` - **Example: Windows on x86-64 with cuda-12.4** + **Example: Windows on x86-64 with cuda-12.5** ```powershell - Expand-Archive -Path TensorRT-10.1.0.27.Windows.win10.cuda-12.4.zip - $env:TRT_LIBPATH="$pwd\TensorRT-10.1.0.27\lib" + Expand-Archive -Path TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip + $env:TRT_LIBPATH="$pwd\TensorRT-10.2.0.19\lib" ``` ## Setting Up The Build Environment @@ -101,27 +101,27 @@ For Linux platforms, we recommend that you generate a docker container for build 1. #### Generate the TensorRT-OSS build container. The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box. 
- **Example: Ubuntu 20.04 on x86-64 with cuda-12.4 (default)** + **Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)** ```bash - ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.4 + ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5 ``` - **Example: Rockylinux8 on x86-64 with cuda-12.4** + **Example: Rockylinux8 on x86-64 with cuda-12.5** ```bash - ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.4 + ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.5 ``` - **Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.4 (JetPack SDK)** + **Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.5 (JetPack SDK)** ```bash - ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.4 + ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.5 ``` - **Example: Ubuntu 22.04 on aarch64 with cuda-12.4** + **Example: Ubuntu 22.04 on aarch64 with cuda-12.5** ```bash - ./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.4 + ./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.5 ``` 2. #### Launch the TensorRT-OSS build container. **Example: Ubuntu 20.04 build container** ```bash - ./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.4 --gpus all + ./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all ``` > NOTE:
1. Use the `--tag` corresponding to build container generated in Step 1. @@ -132,38 +132,38 @@ For Linux platforms, we recommend that you generate a docker container for build ## Building TensorRT-OSS * Generate Makefiles and build. - **Example: Linux (x86-64) build with default cuda-12.4** + **Example: Linux (x86-64) build with default cuda-12.5** ```bash cd $TRT_OSSPATH mkdir -p build && cd build cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out make -j$(nproc) ``` - **Example: Linux (aarch64) build with default cuda-12.4** + **Example: Linux (aarch64) build with default cuda-12.5** ```bash cd $TRT_OSSPATH mkdir -p build && cd build cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain make -j$(nproc) ``` - **Example: Native build on Jetson (aarch64) with cuda-12.4** + **Example: Native build on Jetson (aarch64) with cuda-12.5** ```bash cd $TRT_OSSPATH mkdir -p build && cd build - cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.4 + cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.5 CC=/usr/bin/gcc make -j$(nproc) ``` > NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf. - **Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.4 (JetPack)** + **Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.5 (JetPack)** ```bash cd $TRT_OSSPATH mkdir -p build && cd build - cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.4 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.4/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.4/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib + cmake .. 
-DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.5 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib make -j$(nproc) ``` - **Example: Native builds on Windows (x86) with cuda-12.4** + **Example: Native builds on Windows (x86) with cuda-12.5** ```powershell cd $TRT_OSSPATH mkdir -p build diff --git a/VERSION b/VERSION index 1bf602d7b..0afb8eb4b 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -10.1.0.27 +10.2.0.19 diff --git a/demo/BERT/README.md b/demo/BERT/README.md index a814652d7..e2c53c197 100755 --- a/demo/BERT/README.md +++ b/demo/BERT/README.md @@ -30,7 +30,8 @@ This subfolder of the BERT TensorFlow repository, tested and maintained by NVIDI * [TensorRT inference benchmark](#tensorrt-inference-benchmark) * [Results](#results) * [Inference performance: NVIDIA A100](#inference-performance-nvidia-a100-40gb) - * [Inference performance: NVIDIA A30](#inference-performance-nvidia-a30) + * [Inference performance: NVIDIA L4](#inference-performance-nvidia-l4) + * [Inference performance: NVIDIA L40S](#inference-performance-nvidia-l40s) ## Model overview @@ -74,8 +75,8 @@ The following software version configuration has been tested: |Software|Version| |--------|-------| |Python|>=3.8| -|TensorRT|10.1.0.27| -|CUDA|12.4| +|TensorRT|10.2.0.19| +|CUDA|12.5| ## Setup diff --git a/demo/DeBERTa/README.md b/demo/DeBERTa/README.md index 202ba16f0..462084a45 100644 --- a/demo/DeBERTa/README.md +++ b/demo/DeBERTa/README.md @@ -75,7 +75,7 @@ Note that the performance gap between BERT's self-attention and DeBERTa's disent ## Environment Setup It is recommended to use docker for reproducing the following steps. 
Follow the setup steps in TensorRT OSS [README](https://github.com/NVIDIA/TensorRT#setting-up-the-build-environment) to build and launch the container and build OSS: -**Example: Ubuntu 20.04 on x86-64 with cuda-12.4 (default)** +**Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)** ```bash # Download this TensorRT OSS repo git clone -b main https://github.com/nvidia/TensorRT TensorRT @@ -84,10 +84,10 @@ git submodule update --init --recursive ## at root of TensorRT OSS # build container -./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.4 +./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5 # launch container -./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.4 --gpus all +./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all ## now inside container # build OSS (only required for pre-8.4.3 TensorRT versions) diff --git a/demo/Diffusion/README.md b/demo/Diffusion/README.md index 0da43474b..5016a12e3 100644 --- a/demo/Diffusion/README.md +++ b/demo/Diffusion/README.md @@ -7,7 +7,7 @@ This demo application ("demoDiffusion") showcases the acceleration of Stable Dif ### Clone the TensorRT OSS repository ```bash -git clone git@github.com:NVIDIA/TensorRT.git -b release/10.1 --single-branch +git clone git@github.com:NVIDIA/TensorRT.git -b release/10.2 --single-branch cd TensorRT ``` @@ -48,7 +48,7 @@ onnx 1.15.0 onnx-graphsurgeon 0.5.2 onnxruntime 1.16.3 polygraphy 0.49.9 -tensorrt 10.1.0.27 +tensorrt 10.2.0.19 tokenizers 0.13.3 torch 2.2.0 transformers 4.33.1 diff --git a/docker/rockylinux8.Dockerfile b/docker/rockylinux8.Dockerfile index 707c419b2..1ff359e2d 100644 --- a/docker/rockylinux8.Dockerfile +++ b/docker/rockylinux8.Dockerfile @@ -15,7 +15,7 @@ # limitations under the License. 
# -ARG CUDA_VERSION=12.4.0 +ARG CUDA_VERSION=12.5.0 FROM nvidia/cuda:${CUDA_VERSION}-devel-rockylinux8 LABEL maintainer="NVIDIA CORPORATION" @@ -25,7 +25,7 @@ ENV NV_CUDNN_VERSION 8.9.6.50-1 ENV NV_CUDNN_PACKAGE libcudnn8-${NV_CUDNN_VERSION}.cuda12.2 ENV NV_CUDNN_PACKAGE_DEV libcudnn8-devel-${NV_CUDNN_VERSION}.cuda12.2 -ENV TRT_VERSION 10.1.0.27 +ENV TRT_VERSION 10.2.0.19 SHELL ["/bin/bash", "-c"] RUN dnf install -y \ @@ -62,15 +62,15 @@ RUN dnf install -y python38 python38-devel &&\ # Install TensorRT RUN if [ "${CUDA_VERSION:0:2}" = "11" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib64 \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp38-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib64 \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp38-none-linux_x86_64.whl ;\ elif [ "${CUDA_VERSION:0:2}" = "12" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib64 \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp38-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib64 \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp38-none-linux_x86_64.whl ;\ else \ echo "Invalid CUDA_VERSION"; \ exit 1; \ diff --git a/docker/rockylinux9.Dockerfile b/docker/rockylinux9.Dockerfile index c62aa2144..815b608e1 100644 --- a/docker/rockylinux9.Dockerfile +++ b/docker/rockylinux9.Dockerfile @@ -15,7 +15,7 @@ # limitations under the License. 
# -ARG CUDA_VERSION=12.4.0 +ARG CUDA_VERSION=12.5.0 FROM nvidia/cuda:${CUDA_VERSION}-devel-rockylinux9 LABEL maintainer="NVIDIA CORPORATION" @@ -25,7 +25,7 @@ ENV NV_CUDNN_VERSION 8.9.6.50-1 ENV NV_CUDNN_PACKAGE libcudnn8-${NV_CUDNN_VERSION}.cuda12.2 ENV NV_CUDNN_PACKAGE_DEV libcudnn8-devel-${NV_CUDNN_VERSION}.cuda12.2 -ENV TRT_VERSION 10.1.0.27 +ENV TRT_VERSION 10.2.0.19 SHELL ["/bin/bash", "-c"] RUN dnf install -y \ @@ -67,15 +67,15 @@ RUN dnf -y install \ # Install TensorRT RUN if [ "${CUDA_VERSION:0:2}" = "11" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib64 \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp39-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib64 \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp39-none-linux_x86_64.whl ;\ elif [ "${CUDA_VERSION:0:2}" = "12" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib64 \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp39-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib64 \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp39-none-linux_x86_64.whl ;\ else \ echo "Invalid CUDA_VERSION"; \ exit 1; \ diff --git a/docker/ubuntu-20.04.Dockerfile b/docker/ubuntu-20.04.Dockerfile index da587d251..881139e3a 100644 --- a/docker/ubuntu-20.04.Dockerfile +++ b/docker/ubuntu-20.04.Dockerfile @@ -15,7 +15,7 @@ # limitations under the License. 
# -ARG CUDA_VERSION=12.4.0 +ARG CUDA_VERSION=12.5.0 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04 LABEL maintainer="NVIDIA CORPORATION" @@ -28,7 +28,7 @@ ENV CUDA_VERSION_MAJOR_MINOR=12.2 ENV NV_CUDNN_PACKAGE "libcudnn8=$NV_CUDNN_VERSION-1+cuda${CUDA_VERSION_MAJOR_MINOR}" ENV NV_CUDNN_PACKAGE_DEV "libcudnn8-dev=$NV_CUDNN_VERSION-1+cuda${CUDA_VERSION_MAJOR_MINOR}" -ENV TRT_VERSION 10.1.0.27 +ENV TRT_VERSION 10.2.0.19 SHELL ["/bin/bash", "-c"] RUN apt-get update && apt-get install -y --no-install-recommends \ @@ -84,15 +84,15 @@ RUN apt-get install -y --no-install-recommends \ # Install TensorRT RUN if [ "${CUDA_VERSION:0:2}" = "11" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib/x86_64-linux-gnu \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp38-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib/x86_64-linux-gnu \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp38-none-linux_x86_64.whl ;\ elif [ "${CUDA_VERSION:0:2}" = "12" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib/x86_64-linux-gnu \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp38-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib/x86_64-linux-gnu \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp38-none-linux_x86_64.whl ;\ else \ echo "Invalid CUDA_VERSION"; \ exit 1; \ diff --git a/docker/ubuntu-22.04-aarch64.Dockerfile b/docker/ubuntu-22.04-aarch64.Dockerfile index cd09108de..783193d24 100644 --- a/docker/ubuntu-22.04-aarch64.Dockerfile +++ b/docker/ubuntu-22.04-aarch64.Dockerfile @@ -15,12 +15,12 @@ # limitations under the License. # -ARG CUDA_VERSION=12.4.0 +ARG CUDA_VERSION=12.5.0 # Multi-arch container support available in non-cudnn containers. FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04 -ENV TRT_VERSION 10.1.0.27 +ENV TRT_VERSION 10.2.0.19 SHELL ["/bin/bash", "-c"] # Setup user account @@ -71,7 +71,7 @@ RUN apt-get install -y --no-install-recommends \ # Install TensorRT. This will also pull in CUDNN RUN ver="${CUDA_VERSION%.*}" &&\ if [ "${ver%.*}" = "12" ] ; then \ - ver="12.4"; \ + ver="12.5"; \ fi &&\ v="${TRT_VERSION}-1+cuda${ver}" &&\ apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/sbsa/3bf863cc.pub &&\ diff --git a/docker/ubuntu-22.04.Dockerfile b/docker/ubuntu-22.04.Dockerfile index 28686c6fa..924f3c02d 100644 --- a/docker/ubuntu-22.04.Dockerfile +++ b/docker/ubuntu-22.04.Dockerfile @@ -15,7 +15,7 @@ # limitations under the License. 
# -ARG CUDA_VERSION=12.4.0 +ARG CUDA_VERSION=12.5.0 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04 LABEL maintainer="NVIDIA CORPORATION" @@ -28,7 +28,7 @@ ENV CUDA_VERSION_MAJOR_MINOR=12.2 ENV NV_CUDNN_PACKAGE "libcudnn8=$NV_CUDNN_VERSION-1+cuda${CUDA_VERSION_MAJOR_MINOR}" ENV NV_CUDNN_PACKAGE_DEV "libcudnn8-dev=$NV_CUDNN_VERSION-1+cuda${CUDA_VERSION_MAJOR_MINOR}" -ENV TRT_VERSION 10.1.0.27 +ENV TRT_VERSION 10.2.0.19 SHELL ["/bin/bash", "-c"] RUN apt-get update && apt-get install -y --no-install-recommends \ @@ -84,15 +84,15 @@ RUN apt-get install -y --no-install-recommends \ # Install TensorRT RUN if [ "${CUDA_VERSION:0:2}" = "11" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-11.8.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib/x86_64-linux-gnu \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp310-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib/x86_64-linux-gnu \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp310-none-linux_x86_64.whl ;\ elif [ "${CUDA_VERSION:0:2}" = "12" ]; then \ - wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.1.0/tars/TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && tar -xf TensorRT-10.1.0.27.Linux.x86_64-gnu.cuda-12.4.tar.gz \ - && cp -a TensorRT-10.1.0.27/lib/*.so* /usr/lib/x86_64-linux-gnu \ - && pip install TensorRT-10.1.0.27/python/tensorrt-10.1.0-cp310-none-linux_x86_64.whl ;\ + wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && tar -xf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz \ + && cp -a TensorRT-10.2.0.19/lib/*.so* /usr/lib/x86_64-linux-gnu \ + && pip install TensorRT-10.2.0.19/python/tensorrt-10.2.0-cp310-none-linux_x86_64.whl ;\ else \ echo "Invalid CUDA_VERSION"; \ exit 1; \ diff --git a/docker/ubuntu-cross-aarch64.Dockerfile b/docker/ubuntu-cross-aarch64.Dockerfile index ae07ab922..253dd0421 100644 --- a/docker/ubuntu-cross-aarch64.Dockerfile +++ b/docker/ubuntu-cross-aarch64.Dockerfile @@ -15,13 +15,13 @@ # limitations under the License. # -ARG CUDA_VERSION=12.4.0 +ARG CUDA_VERSION=12.5.0 ARG OS_VERSION=22.04 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${OS_VERSION} LABEL maintainer="NVIDIA CORPORATION" -ENV TRT_VERSION 10.1.0.27 +ENV TRT_VERSION 10.2.0.19 ENV DEBIAN_FRONTEND=noninteractive ARG uid=1000 diff --git a/include/NvInfer.h b/include/NvInfer.h index a04209739..11c6994f3 100644 --- a/include/NvInfer.h +++ b/include/NvInfer.h @@ -5620,17 +5620,18 @@ constexpr inline int32_t EnumMax() noexcept //! Output, and a scatter mode. When kELEMENT mode is used an optional axis parameter is available. //! * Data is a tensor of rank r >= 1 that stores the values to be duplicated in Output. //! * Indices is a tensor of rank q that determines which locations in Output to write new -//! values to. Constraints on the rank of q depend on the mode: +//! values to. Constraints on the rank q depend on the mode: //! ScatterMode::kND: q >= 1 //! ScatterMode::kELEMENT: q must be the same as r -//! * Updates is atensor of rank s >=1 that provides the data -//! 
to write to Output specified by its corresponding location in Index. Constraints the rank of Updates depend on the -//! mode: +//! * Updates is a tensor of rank s >= 1 that provides the data +//! to write to Output specified by its corresponding location in Indices. +//! Constraints on the rank of Updates depend on the mode: //! ScatterMode::kND: s = r + q - shape(Indices)[-1] - 1 //! Scattermode::kELEMENT: s = q = r //! * Output is a tensor with the same dimensions as Data that stores the resulting values of the //! transformation. It must not be a shape tensor. -//! The types of Data, Update, and Output shall be the same, and Indices shall be DataType::kINT32 or DataType::kINT64. +//! The types of Data, Update, and Output shall be the same, and Indices shall be of type DataType::kINT32 or +//! DataType::kINT64. //! //! The output is computed by copying the data, and then updating elements of it based on indices. //! How Indices are interpreted depends upon the ScatterMode. @@ -5643,7 +5644,7 @@ constexpr inline int32_t EnumMax() noexcept //! Given that data dims are {d_0,...,d_{r-1}} and indices dims are {i_0,...,i_{q-1}}, //! define k = indices[q-1], it follows that updates dims are {i_0,...,i_{q-2},d_k,...,d_{r-1}} //! The updating can be computed by: -//! foreach slice in indices[i_0,...i_{q-2}] +//! foreach slice in indices[i_0,...,i_{q-2}] //! output[indices[slice]] = updates[slice] //! //! ScatterMode::kELEMENT @@ -7539,6 +7540,44 @@ class INetworkDefinition : public INoCopy return mImpl->getBuilder(); } + //! + //! \brief Mark weights as refittable when the builder flag kREFIT_INDIVIDUAL is set. + //! + //! \param name The name of the weights. + //! + //! \return True if the weights were successfully marked as refittable, false if the weights do not exist or cannot + //! be refitted. + //! + bool markWeightsRefittable(char const* name) noexcept + { + return mImpl->markWeightsRefittable(name); + } + + //! + //! \brief Unmark weights as refittable when the builder flag kREFIT_INDIVIDUAL is set. + //! + //! \param name The name of the weights. + //! + //! \return True if the weights were successfully marked as unrefittable, false if the weights do not exist. + //! + bool unmarkWeightsRefittable(char const* name) noexcept + { + return mImpl->unmarkWeightsRefittable(name); + } + + //! + //! \brief Whether the weight has been marked as refittable. + //! + //! \param name The name of the weights to check. + //! + //! \return True if the weights are marked as refittable, false if the weights do not exist or are marked as + //! non-refittable. + //! + bool areWeightsMarkedRefittable(char const* name) const noexcept + { + return mImpl->areWeightsMarkedRefittable(name); + } + protected: apiv::VNetworkDefinition* mImpl; }; @@ -8145,6 +8184,48 @@ constexpr inline int32_t EnumMax() noexcept return 1; } +//! +//! \enum RuntimePlatform +//! +//! \brief Describes the intended runtime platform (operating system and CPU architecture) for the execution of the +//! TensorRT engine. TensorRT provides support for cross-platform engine compatibility when the target runtime +//! platform is different from the build platform. +//! +//! \note The cross-platform engine will not be able to run on the host platform it was built on. +//! +//! \note When building a cross-platform engine that also requires version forward compatibility, +//! kEXCLUDE_LEAN_RUNTIME must be set to exclude the target platform lean runtime. +//! +//! 
\note The cross-platform engine might have performance differences compared to the natively built engine on the +//! target platform. +//! +//! \see IBuilderConfig::setRuntimePlatform(), IBuilderConfig::getRuntimePlatform() +//! +enum class RuntimePlatform : int32_t +{ + //! No requirement for cross-platform compatibility. The engine constructed by TensorRT can only run on the + //! identical platform it was built on. + kSAME_AS_BUILD = 0, + + //! Designates the target platform for engine execution as Windows AMD64 system. Currently this flag can only be + //! enabled when building engines on Linux AMD64 platforms. + kWINDOWS_AMD64 = 1, +}; + +namespace impl +{ +//! +//! Maximum number of elements in RuntimePlatform enum. +//! +//! \see RuntimePlatform +//! +template <> +struct EnumMaxImpl +{ + static constexpr int32_t kVALUE = 2; +}; +} // namespace impl + //! //! \brief Represents one or more BuilderFlag values using binary OR //! operations, e.g., 1U << BuilderFlag::kFP16 | 1U << BuilderFlag::kDEBUG. @@ -8289,6 +8370,12 @@ enum class BuilderFlag : int32_t //! Enable plugins with INT4 input/output. kINT4 = 22, + //! Enable building a refittable engine and provide fine-grained control. This allows + //! control over which weights are refittable or not using INetworkDefinition::markWeightsRefittable and + //! INetworkDefinition::unmarkWeightsRefittable. By default, all weights are non-refittable when this flag is + //! enabled. This flag cannot be used together with kREFIT or kREFIT_IDENTICAL. + kREFIT_INDIVIDUAL = 23, + }; //! @@ -8299,7 +8386,7 @@ enum class BuilderFlag : int32_t template <> constexpr inline int32_t EnumMax() noexcept { - return 23; + return 24; } //! @@ -9411,6 +9498,34 @@ class IBuilderConfig : public INoCopy return mImpl->getProgressMonitor(); } + //! + //! \brief Set the target platform for runtime execution. + //! + //! Cross-platform compatibility allows an engine to be built and executed on different platforms. + //! + //! The default cross-platform target is RuntimePlatform::kSAME_AS_BUILD. + //! + //! \param runtimePlatform The target platform for runtime execution. + //! + //! \see IBuilderConfig::getRuntimePlatform() + //! + void setRuntimePlatform(RuntimePlatform runtimePlatform) noexcept + { + mImpl->setRuntimePlatform(runtimePlatform); + } + + //! + //! \brief Get the target platform for runtime execution. + //! + //! \return The target platform for runtime execution. + //! + //! \see IBuilderConfig::setRuntimePlatform() + //! 
+ RuntimePlatform getRuntimePlatform() const noexcept + { + return mImpl->getRuntimePlatform(); + } + protected: apiv::VBuilderConfig* mImpl; }; diff --git a/include/NvInferImpl.h b/include/NvInferImpl.h index bb66ecd6d..b77cb1823 100644 --- a/include/NvInferImpl.h +++ b/include/NvInferImpl.h @@ -178,6 +178,7 @@ enum class WeightsRole : int32_t; enum class PreviewFeature : int32_t; enum class HardwareCompatibilityLevel : int32_t; enum class ExecutionContextAllocationStrategy : int32_t; +enum class RuntimePlatform : int32_t; using TacticSources = uint32_t; using TensorFormats = uint32_t; @@ -1058,6 +1059,9 @@ class VNetworkDefinition : public VRoot virtual bool markDebug(ITensor& tensor) noexcept = 0; virtual bool unmarkDebug(ITensor& tensor) noexcept = 0; virtual bool isDebugTensor(nvinfer1::ITensor const& tensor) const noexcept = 0; + virtual bool markWeightsRefittable(char const* name) noexcept = 0; + virtual bool unmarkWeightsRefittable(char const* name) noexcept = 0; + virtual bool areWeightsMarkedRefittable(char const* name) const noexcept = 0; }; class VAlgorithmIOInfo : public VRoot @@ -1161,6 +1165,8 @@ class VBuilderConfig : public VRoot virtual int32_t getMaxAuxStreams() const noexcept = 0; virtual void setProgressMonitor(IProgressMonitor* monitor) noexcept = 0; virtual IProgressMonitor* getProgressMonitor() const noexcept = 0; + virtual void setRuntimePlatform(RuntimePlatform runtimePlatform) noexcept = 0; + virtual RuntimePlatform getRuntimePlatform() const noexcept = 0; }; class VSerializationConfig : public VRoot diff --git a/include/NvInferRuntime.h b/include/NvInferRuntime.h index 81cb7ba13..93959c372 100644 --- a/include/NvInferRuntime.h +++ b/include/NvInferRuntime.h @@ -2160,8 +2160,7 @@ class IOptimizationProfile : public INoCopy //! i = 0, ..., nbValues - 1. Execution of the network must be valid for the optVals. //! //! Shape tensors are tensors that contribute to shape calculations in some way. While input shape tensors can be - //! type kBOOL, kINT32, or kINT64, the values used to set the minimum, optimium, and maximum values must fit in int32_t. - //! Boolean values are represented as 0 for false and 1 for true. + //! type kINT32 or kINT64, the values used to set the minimum, optimium, and maximum values must fit in int32_t. //! //! Examples: //! @@ -3123,8 +3122,8 @@ class ICudaEngine : public INoCopy //! \return An IHostMemory object that contains the serialized engine. //! //! The network may be deserialized with IRuntime::deserializeCudaEngine(). - //! Serializing plan file with SerializationFlag::kEXCLUDE_WEIGHTS requires building the engine with kREFIT or - //! kREFIT_IDENTICAL. + //! Serializing plan file with SerializationFlag::kEXCLUDE_WEIGHTS requires building the engine with kREFIT, + //! kREFIT_IDENTICAL or kREFIT_INDIVIDUAL. //! //! \see IRuntime::deserializeCudaEngine() //! @@ -3298,7 +3297,12 @@ class ICudaEngine : public INoCopy } //! - //! \brief TensorRT automatically determines an ideal budget for the model to run. + //! \brief TensorRT automatically determines a device memory budget for the model to run. The budget is close to the + //! current free memory size, leaving some space for other memory needs in the user's application. If the budget + //! exceeds the size obtained from getStreamableWeightsSize(), it is capped to that size, effectively disabling + //! weight streaming. Since TensorRT lacks information about the user's allocations, the remaining memory size might + //! 
be larger than required, leading to wasted memory, or smaller than required, causing an out-of-memory error. For + //! optimal memory allocation, it is recommended to manually calculate and set the budget. //! //! \warning BuilderFlag::kWEIGHT_STREAMING must be set during engine building. //! diff --git a/include/NvInferVersion.h b/include/NvInferVersion.h index 3a33d4930..378acb4c2 100644 --- a/include/NvInferVersion.h +++ b/include/NvInferVersion.h @@ -24,9 +24,9 @@ #define NV_INFER_VERSION_H #define NV_TENSORRT_MAJOR 10 //!< TensorRT major version. -#define NV_TENSORRT_MINOR 1 //!< TensorRT minor version. +#define NV_TENSORRT_MINOR 2 //!< TensorRT minor version. #define NV_TENSORRT_PATCH 0 //!< TensorRT patch version. -#define NV_TENSORRT_BUILD 27 //!< TensorRT build number. +#define NV_TENSORRT_BUILD 19 //!< TensorRT build number. #define NV_TENSORRT_LWS_MAJOR 0 //!< TensorRT LWS major version. #define NV_TENSORRT_LWS_MINOR 0 //!< TensorRT LWS minor version. diff --git a/parsers/onnx b/parsers/onnx index 96e781103..706f02e74 160000 --- a/parsers/onnx +++ b/parsers/onnx @@ -1 +1 @@ -Subproject commit 96e781103cfc4adf4a6bb557e94bac8e693f6f4c +Subproject commit 706f02e74366b2cbaacf87be61de95df051a2788 diff --git a/plugin/api/inferPlugin.cpp b/plugin/api/inferPlugin.cpp index 1aea8cee8..ad1ad62a2 100644 --- a/plugin/api/inferPlugin.cpp +++ b/plugin/api/inferPlugin.cpp @@ -32,6 +32,7 @@ #include "generateDetectionPlugin/generateDetectionPlugin.h" #include "gridAnchorPlugin/gridAnchorPlugin.h" #include "instanceNormalizationPlugin/instanceNormalizationPlugin.h" +#include "instanceNormalizationPlugin/instanceNormalizationPluginLegacy.h" #include "leakyReluPlugin/lReluPlugin.h" #include "modulatedDeformConvPlugin/modulatedDeformConvPlugin.h" #include "multilevelCropAndResizePlugin/multilevelCropAndResizePlugin.h" @@ -198,6 +199,7 @@ extern "C" initializePlugin(logger, libNamespace); initializePlugin(logger, libNamespace); initializePlugin(logger, libNamespace); + initializePlugin(logger, libNamespace); initializePlugin(logger, libNamespace); initializePlugin(logger, libNamespace); initializePlugin(logger, libNamespace); diff --git a/plugin/bertQKVToContextPlugin/CustomQKVToContextPluginDynamic_PluginConfig.yaml b/plugin/bertQKVToContextPlugin/CustomQKVToContextPluginDynamic_PluginConfig.yaml index 6b56ebea1..781d8b106 100644 --- a/plugin/bertQKVToContextPlugin/CustomQKVToContextPluginDynamic_PluginConfig.yaml +++ b/plugin/bertQKVToContextPlugin/CustomQKVToContextPluginDynamic_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: CustomQKVToContextPluginDynamic interface: "IPluginV2DynamicExt" diff --git a/plugin/clipPlugin/ClipPlugin_PluginConfig.yaml b/plugin/clipPlugin/ClipPlugin_PluginConfig.yaml index c83100af6..fcfa8a381 100644 --- a/plugin/clipPlugin/ClipPlugin_PluginConfig.yaml +++ b/plugin/clipPlugin/ClipPlugin_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: Clip_TRT interface: "IPluginV2" diff --git a/plugin/common/common.cuh b/plugin/common/common.cuh index 58d6e31c0..4fff6d490 100644 --- a/plugin/common/common.cuh +++ b/plugin/common/common.cuh @@ -18,6 +18,24 @@ #ifndef COMMON_CUH #define COMMON_CUH +// TODO: Remove WAR once issue resolved in CUB (CUDA 12.6+?) + +#ifndef CUDA_VERSION +#include +#endif // CUDA_VERSION + +#if CUDA_VERSION >= 12050 +#include +#undef _CCCL_FORCEINLINE + +#if defined(_CCCL_CUDA_COMPILER) +# define _CCCL_FORCEINLINE __forceinline__ +#else // ^^^ _CCCL_CUDA_COMPILER ^^^ / vvv !_CCCL_CUDA_COMPILER vvv +# define _CCCL_FORCEINLINE inline +#endif // !_CCCL_CUDA_COMPILER + +#endif // CUDA_VERSION >= 12050 + #include "common/cublasWrapper.h" #include #include diff --git a/plugin/common/cub_helper.h b/plugin/common/cub_helper.h index 7cc358486..7c947b2fa 100644 --- a/plugin/common/cub_helper.h +++ b/plugin/common/cub_helper.h @@ -14,6 +14,25 @@ * See the License for the specific language governing permissions and * limitations under the License. */ + +// TODO: Remove WAR once issue resolved in CUB (CUDA 12.6+?) + +#ifndef CUDA_VERSION +#include +#endif // CUDA_VERSION + +#if CUDA_VERSION >= 12050 +#include +#undef _CCCL_FORCEINLINE + +#if defined(_CCCL_CUDA_COMPILER) +# define _CCCL_FORCEINLINE __forceinline__ +#else // ^^^ _CCCL_CUDA_COMPILER ^^^ / vvv !_CCCL_CUDA_COMPILER vvv +# define _CCCL_FORCEINLINE inline +#endif // !_CCCL_CUDA_COMPILER + +#endif // CUDA_VERSION >= 12050 + #include "common/kernels/kernel.h" #include template diff --git a/plugin/cropAndResizePlugin/CropAndResizeDynamic_PluginConfig.yaml b/plugin/cropAndResizePlugin/CropAndResizeDynamic_PluginConfig.yaml index bf0747530..46f7b209b 100644 --- a/plugin/cropAndResizePlugin/CropAndResizeDynamic_PluginConfig.yaml +++ b/plugin/cropAndResizePlugin/CropAndResizeDynamic_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: CropAndResizeDynamic interface: "IPluginV2DynamicExt" diff --git a/plugin/cropAndResizePlugin/CropAndResizePlugin_PluginConfig.yaml b/plugin/cropAndResizePlugin/CropAndResizePlugin_PluginConfig.yaml index dee29885a..bc2f30502 100644 --- a/plugin/cropAndResizePlugin/CropAndResizePlugin_PluginConfig.yaml +++ b/plugin/cropAndResizePlugin/CropAndResizePlugin_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: CropAndResize interface: "IPluginV2Ext" diff --git a/plugin/detectionLayerPlugin/DetectionLayer_PluginConfig.yaml b/plugin/detectionLayerPlugin/DetectionLayer_PluginConfig.yaml index 075fdd3a4..119835096 100644 --- a/plugin/detectionLayerPlugin/DetectionLayer_PluginConfig.yaml +++ b/plugin/detectionLayerPlugin/DetectionLayer_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: DetectionLayer_TRT interface: "IPluginV2Ext" diff --git a/plugin/efficientNMSPlugin/EfficientNMSPlugin_PluginConfig.yaml b/plugin/efficientNMSPlugin/EfficientNMSPlugin_PluginConfig.yaml index 0a7ce4146..1e4a1bbc1 100644 --- a/plugin/efficientNMSPlugin/EfficientNMSPlugin_PluginConfig.yaml +++ b/plugin/efficientNMSPlugin/EfficientNMSPlugin_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: EfficientNMS_TRT interface: "IPluginV2DynamicExt" diff --git a/plugin/embLayerNormPlugin/CustomEmbLayerNormPluginDynamic_PluginConfig.yaml b/plugin/embLayerNormPlugin/CustomEmbLayerNormPluginDynamic_PluginConfig.yaml index d5f1594ad..a942508c8 100644 --- a/plugin/embLayerNormPlugin/CustomEmbLayerNormPluginDynamic_PluginConfig.yaml +++ b/plugin/embLayerNormPlugin/CustomEmbLayerNormPluginDynamic_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: CustomEmbLayerNormPluginDynamic interface: "IPluginV2DynamicExt" diff --git a/plugin/fcPlugin/CustomFCPluginDynamic_PluginConfig.yaml b/plugin/fcPlugin/CustomFCPluginDynamic_PluginConfig.yaml index 0939c4344..c29c1f848 100644 --- a/plugin/fcPlugin/CustomFCPluginDynamic_PluginConfig.yaml +++ b/plugin/fcPlugin/CustomFCPluginDynamic_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: CustomFCPluginDynamic interface: IPluginV2DynamicExt @@ -73,4 +89,4 @@ versions: shape: "1" output_types: output: float16 -... \ No newline at end of file +... diff --git a/plugin/flattenConcat/FlattenConcat_PluginConfig.yaml b/plugin/flattenConcat/FlattenConcat_PluginConfig.yaml index bb2f318fc..b52bccf79 100644 --- a/plugin/flattenConcat/FlattenConcat_PluginConfig.yaml +++ b/plugin/flattenConcat/FlattenConcat_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: FlattenConcat_TRT interface: "IPluginV2Ext" diff --git a/plugin/geluPlugin/CustomGeluPluginDynamic_PluginConfig.yaml b/plugin/geluPlugin/CustomGeluPluginDynamic_PluginConfig.yaml index 3a41b9eda..55060f96e 100644 --- a/plugin/geluPlugin/CustomGeluPluginDynamic_PluginConfig.yaml +++ b/plugin/geluPlugin/CustomGeluPluginDynamic_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: CustomGeluPluginDynamic interface: "IPluginV2DynamicExt" diff --git a/plugin/generateDetectionPlugin/GenerateDetection_PluginConfig.yaml b/plugin/generateDetectionPlugin/GenerateDetection_PluginConfig.yaml index be62cd373..98474d4ca 100644 --- a/plugin/generateDetectionPlugin/GenerateDetection_PluginConfig.yaml +++ b/plugin/generateDetectionPlugin/GenerateDetection_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: GenerateDetection_TRT interface: "IPluginV2Ext" diff --git a/plugin/gridAnchorPlugin/GridAnchor_TRT_PluginConfig.yaml b/plugin/gridAnchorPlugin/GridAnchor_TRT_PluginConfig.yaml index d95ddf5c8..331916b56 100644 --- a/plugin/gridAnchorPlugin/GridAnchor_TRT_PluginConfig.yaml +++ b/plugin/gridAnchorPlugin/GridAnchor_TRT_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: GridAnchor_TRT interface: "IPluginV2Ext" diff --git a/plugin/groupNormalizationPlugin/GroupNormalizationPlugin_PluginConfig.yaml b/plugin/groupNormalizationPlugin/GroupNormalizationPlugin_PluginConfig.yaml index bec760a58..d86b3e00a 100644 --- a/plugin/groupNormalizationPlugin/GroupNormalizationPlugin_PluginConfig.yaml +++ b/plugin/groupNormalizationPlugin/GroupNormalizationPlugin_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: GroupNormalizationPlugin interface: "IPluginV2DynamicExt" diff --git a/plugin/instanceNormalizationPlugin/instanceNormCommon.h b/plugin/instanceNormalizationPlugin/instanceNormCommon.h index 938ed2cfd..0374eba78 100644 --- a/plugin/instanceNormalizationPlugin/instanceNormCommon.h +++ b/plugin/instanceNormalizationPlugin/instanceNormCommon.h @@ -18,7 +18,9 @@ #ifndef INSTANCE_NORM_COMMON_H #define INSTANCE_NORM_COMMON_H +#include "common/plugin.h" #include +using namespace nvinfer1::pluginInternal; #define DEVICE_FUNCTION static inline __device__ @@ -761,4 +763,41 @@ struct ParallelSums<8, 4> } }; +namespace +{ +int32_t divUp(int32_t m, int32_t n) +{ + PLUGIN_ASSERT(m >= 0); + PLUGIN_ASSERT(n > 0); + // Use unsigned arithmetic to preclude overflow. + auto const mu = static_cast(m); + auto const nu = static_cast(n); + return (mu + nu - 1U) / nu; +} + +cudnnStatus_t convertTrt2cudnnDtype(nvinfer1::DataType trt_dtype, cudnnDataType_t* cudnn_dtype) +{ + switch (trt_dtype) + { + case nvinfer1::DataType::kFLOAT: *cudnn_dtype = CUDNN_DATA_FLOAT; break; + case nvinfer1::DataType::kHALF: *cudnn_dtype = CUDNN_DATA_HALF; break; + default: return CUDNN_STATUS_BAD_PARAM; + } + return CUDNN_STATUS_SUCCESS; +} + +} // namespace +template +__global__ __launch_bounds__(THREADS_PER_CTA) void in3dReluActivation(T* dst, T const* src, T alpha, int32_t count) +{ + int32_t idx = blockIdx.x * THREADS_PER_CTA + threadIdx.x; + if (idx >= count) + { + return; + } + + T val = src[idx]; + dst[idx] = (val < static_cast(0.F)) ? 
val * alpha : val; +} + #endif // INSTANCE_NORM_COMMON_H diff --git a/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cu b/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cu index 007715ad6..abbeb5337 100644 --- a/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cu +++ b/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cu @@ -16,6 +16,7 @@ */ #include "common/checkMacrosPlugin.h" #include "instanceNormalizationPlugin.h" +#include "instanceNormCommon.h" #include #include #include @@ -24,58 +25,19 @@ using namespace nvinfer1; using namespace nvinfer1::plugin; using namespace nvinfer1::pluginInternal; using namespace instance_norm_impl; -using nvinfer1::plugin::InstanceNormalizationPlugin; -using nvinfer1::plugin::InstanceNormalizationPluginV2; -using nvinfer1::plugin::InstanceNormalizationPluginCreator; -using nvinfer1::plugin::InstanceNormalizationPluginCreatorV2; -namespace -{ -int32_t divUp(int32_t m, int32_t n) -{ - PLUGIN_ASSERT(m >= 0); - PLUGIN_ASSERT(n > 0); - // Use unsigned arithmetic to preclude overflow. - auto const mu = static_cast(m); - auto const nu = static_cast(n); - return (mu + nu - 1U) / nu; -} -} // namespace -template -__global__ __launch_bounds__(THREADS_PER_CTA) void in3dReluActivation( - T* dst, T const* src, float alpha, int32_t count) -{ - int32_t idx = blockIdx.x * THREADS_PER_CTA + threadIdx.x; - if (idx >= count) - { - return; - } - - float val = src[idx]; - dst[idx] = (val < 0.F) ? val * alpha : val; -} - -cudnnStatus_t convertTrt2cudnnDtype(nvinfer1::DataType trt_dtype, cudnnDataType_t* cudnn_dtype) -{ - switch (trt_dtype) - { - case nvinfer1::DataType::kFLOAT: *cudnn_dtype = CUDNN_DATA_FLOAT; break; - case nvinfer1::DataType::kHALF: *cudnn_dtype = CUDNN_DATA_HALF; break; - default: return CUDNN_STATUS_BAD_PARAM; - } - return CUDNN_STATUS_SUCCESS; -} +using nvinfer1::plugin::InstanceNormalizationV3Plugin; +using nvinfer1::plugin::InstanceNormalizationV3PluginCreator; namespace { -constexpr char const* INSTANCE_PLUGIN_VERSION{"1"}; -constexpr char const* INSTANCE_PLUGIN_VERSION_V2{"2"}; -constexpr char const* INSTANCE_PLUGIN_NAME{"InstanceNormalization_TRT"}; +constexpr char const* gInstancePluginVersion{"3"}; +constexpr char const* gInstancePluginName{"InstanceNormalization_TRT"}; } // namespace -PluginFieldCollection InstanceNormalizationPluginCreator::mFC{}; -std::vector InstanceNormalizationPluginCreator::mPluginAttributes; +PluginFieldCollection InstanceNormalizationV3PluginCreator::mFC{}; +std::vector InstanceNormalizationV3PluginCreator::mPluginAttributes; -InstanceNormalizationPlugin::InstanceNormalizationPlugin( +InstanceNormalizationV3Plugin::InstanceNormalizationV3Plugin( float epsilon, std::vector const& scale, std::vector const& bias, int32_t relu, float alpha) : mEpsilon(epsilon) , mAlpha(alpha) @@ -87,7 +49,7 @@ InstanceNormalizationPlugin::InstanceNormalizationPlugin( PLUGIN_VALIDATE(scale.size() == bias.size()); } -InstanceNormalizationPlugin::InstanceNormalizationPlugin( +InstanceNormalizationV3Plugin::InstanceNormalizationV3Plugin( float epsilon, nvinfer1::Weights const& scale, nvinfer1::Weights const& bias, int32_t relu, float alpha) : mEpsilon(epsilon) , mAlpha(alpha) @@ -113,7 +75,7 @@ InstanceNormalizationPlugin::InstanceNormalizationPlugin( } else { - throw std::runtime_error("Unsupported scale/bias dtype"); + PLUGIN_ERROR("Unsupported scale/bias dtype"); } }; @@ -121,35 +83,40 @@ InstanceNormalizationPlugin::InstanceNormalizationPlugin( copyWeights(bias, mHostBias); } 
-InstanceNormalizationPlugin::InstanceNormalizationPlugin(void const* serialData, size_t serialLength) +InstanceNormalizationV3Plugin::~InstanceNormalizationV3Plugin() { - deserialize_value(&serialData, &serialLength, &mEpsilon); - deserialize_value(&serialData, &serialLength, &mNchan); - deserialize_value(&serialData, &serialLength, &mHostScale); - deserialize_value(&serialData, &serialLength, &mHostBias); - deserialize_value(&serialData, &serialLength, &mRelu); - deserialize_value(&serialData, &serialLength, &mAlpha); + exitContext(); } -InstanceNormalizationPlugin::~InstanceNormalizationPlugin() -{ - terminate(); -} - -// InstanceNormalizationPlugin returns one output. -int32_t InstanceNormalizationPlugin::getNbOutputs() const noexcept +// InstanceNormalizationV3Plugin returns one output. +int32_t InstanceNormalizationV3Plugin::getNbOutputs() const noexcept { return 1; } -DimsExprs InstanceNormalizationPlugin::getOutputDimensions(int32_t outputIndex, nvinfer1::DimsExprs const* inputs, - int32_t nbInputs, nvinfer1::IExprBuilder& exprBuilder) noexcept +IPluginCapability* InstanceNormalizationV3Plugin::getCapabilityInterface(PluginCapabilityType type) noexcept { - nvinfer1::DimsExprs output(inputs[0]); - return output; + try + { + if (type == PluginCapabilityType::kBUILD) + { + return static_cast(this); + } + if (type == PluginCapabilityType::kRUNTIME) + { + return static_cast(this); + } + PLUGIN_ASSERT(type == PluginCapabilityType::kCORE); + return static_cast(this); + } + catch (std::exception const& e) + { + caughtError(e); + } + return nullptr; } -int32_t InstanceNormalizationPlugin::initialize() noexcept +int32_t InstanceNormalizationV3Plugin::initializeContext() { if (!mInitialized) { @@ -162,27 +129,29 @@ int32_t InstanceNormalizationPlugin::initialize() noexcept // NDHWC path // Device info. 
int32_t device; - PLUGIN_CHECK_CUDA(cudaGetDevice(&device)); + PLUGIN_CUASSERT(cudaGetDevice(&device)); cudaDeviceProp props; - PLUGIN_CHECK_CUDA(cudaGetDeviceProperties(&props, device)); + PLUGIN_CUASSERT(cudaGetDeviceProperties(&props, device)); mContext.sm_count = props.multiProcessorCount; mContext.sm_shared_size = props.sharedMemPerMultiprocessor; mContext.sm_version = props.major * 100 + props.minor * 10; - PLUGIN_CHECK_CUDA(cudaMalloc(&mDeviceScale, mNchan * sizeof(float))); - PLUGIN_CHECK_CUDA(cudaMalloc(&mDeviceBias, mNchan * sizeof(float))); - PLUGIN_CHECK_CUDA(cudaMemcpy(mDeviceScale, &mHostScale[0], mNchan * sizeof(float), cudaMemcpyHostToDevice)); - PLUGIN_CHECK_CUDA(cudaMemcpy(mDeviceBias, &mHostBias[0], mNchan * sizeof(float), cudaMemcpyHostToDevice)); - - PLUGIN_CHECK_CUDA(cudaDriverGetVersion(&mCudaDriverVersion)); + PLUGIN_CUASSERT(cudaMalloc(&mDeviceScale, mNchan * sizeof(float))); + PLUGIN_ASSERT(mDeviceScale != nullptr); + PLUGIN_CUASSERT(cudaMalloc(&mDeviceBias, mNchan * sizeof(float))); + PLUGIN_ASSERT(mDeviceBias != nullptr); + PLUGIN_CUASSERT(cudaMemcpy(mDeviceScale, &mHostScale[0], mNchan * sizeof(float), cudaMemcpyHostToDevice)); + PLUGIN_CUASSERT(cudaMemcpy(mDeviceBias, &mHostBias[0], mNchan * sizeof(float), cudaMemcpyHostToDevice)); + + PLUGIN_CUASSERT(cudaDriverGetVersion(&mCudaDriverVersion)); } mInitialized = true; return 0; } -void InstanceNormalizationPlugin::terminate() noexcept +void InstanceNormalizationV3Plugin::exitContext() { if (mInitialized) { @@ -198,15 +167,15 @@ void InstanceNormalizationPlugin::terminate() noexcept mInitialized = false; } -size_t InstanceNormalizationPlugin::getWorkspaceSize(nvinfer1::PluginTensorDesc const* inputs, int32_t nbInputs, - nvinfer1::PluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept +size_t InstanceNormalizationV3Plugin::getWorkspaceSize(DynamicPluginTensorDesc const* inputs, int32_t nbInputs, + DynamicPluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept { - nvinfer1::Dims input_dims = inputs[0].dims; + nvinfer1::Dims input_dims = inputs[0].desc.dims; PLUGIN_ASSERT(input_dims.nbDims == 4 || input_dims.nbDims == 5); - if (inputs[0].format == nvinfer1::PluginFormat::kLINEAR) + if (inputs[0].desc.format == nvinfer1::PluginFormat::kLINEAR) { - nvinfer1::Dims input_dims = inputs[0].dims; + nvinfer1::Dims input_dims = inputs[0].desc.dims; int32_t n = input_dims.d[0]; int32_t c = input_dims.d[1]; @@ -219,12 +188,12 @@ size_t InstanceNormalizationPlugin::getWorkspaceSize(nvinfer1::PluginTensorDesc return total_wss; } - else if (inputs[0].format == nvinfer1::PluginFormat::kDHWC8 || inputs[0].format == nvinfer1::PluginFormat::kCDHW32) + else if (inputs[0].desc.format == nvinfer1::PluginFormat::kDHWC8 || inputs[0].desc.format == nvinfer1::PluginFormat::kCDHW32) { PLUGIN_ASSERT(input_dims.nbDims == 5); - int32_t input_data_type = (inputs[0].type == nvinfer1::DataType::kHALF) ? 1 : 2; - int32_t output_data_type = (outputs[0].type == nvinfer1::DataType::kHALF) ? 1 : 2; - nvinfer1::Dims input_dims = inputs[0].dims; + int32_t input_data_type = (inputs[0].desc.type == nvinfer1::DataType::kHALF) ? 1 : 2; + int32_t output_data_type = (outputs[0].desc.type == nvinfer1::DataType::kHALF) ? 
1 : 2; + nvinfer1::Dims input_dims = inputs[0].desc.dims; int32_t n = input_dims.d[0]; int32_t c = input_dims.d[1]; @@ -232,7 +201,7 @@ size_t InstanceNormalizationPlugin::getWorkspaceSize(nvinfer1::PluginTensorDesc int32_t h = input_dims.d[3]; int32_t w = input_dims.d[4]; - InstanceNormFwdParams params; + InstanceNormFwdParams params{}; // only these parameters are required for workspace computation params.nhw = d * h * w; params.c = c; @@ -252,8 +221,8 @@ size_t InstanceNormalizationPlugin::getWorkspaceSize(nvinfer1::PluginTensorDesc return 0; } -int32_t InstanceNormalizationPlugin::enqueue(nvinfer1::PluginTensorDesc const* inputDesc, - nvinfer1::PluginTensorDesc const* outputDesc, void const* const* inputs, void* const* outputs, void* workspace, +int32_t InstanceNormalizationV3Plugin::enqueue(PluginTensorDesc const* inputDesc, + PluginTensorDesc const* outputDesc, void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept { PLUGIN_VALIDATE(inputDesc != nullptr && outputDesc != nullptr && inputs != nullptr && outputs != nullptr && workspace != nullptr); @@ -323,7 +292,7 @@ int32_t InstanceNormalizationPlugin::enqueue(nvinfer1::PluginTensorDesc const* i cudnnBatchNormMode_t cudnnBatchNormMode = CUDNN_BATCHNORM_SPATIAL_PERSISTENT; cudaStreamCaptureStatus streamStatus; - PLUGIN_CHECK_CUDA(cudaStreamIsCapturing(stream, &streamStatus)); + PLUGIN_CUASSERT(cudaStreamIsCapturing(stream, &streamStatus)); if (streamStatus != cudaStreamCaptureStatusNone && mCudaDriverVersion < 11000) { @@ -359,9 +328,9 @@ int32_t InstanceNormalizationPlugin::enqueue(nvinfer1::PluginTensorDesc const* i float* d_bias = &_d_array[n * c]; for (int32_t i = 0; i < n; ++i) { - PLUGIN_CHECK_CUDA( + PLUGIN_CUASSERT( cudaMemcpyAsync(d_scale + i * c, mDeviceScale, nchan_bytes, cudaMemcpyDeviceToDevice, stream)); - PLUGIN_CHECK_CUDA( + PLUGIN_CUASSERT( cudaMemcpyAsync(d_bias + i * c, mDeviceBias, nchan_bytes, cudaMemcpyDeviceToDevice, stream)); } @@ -390,7 +359,7 @@ int32_t InstanceNormalizationPlugin::enqueue(nvinfer1::PluginTensorDesc const* i cudnnBatchNormMode_t cudnnBatchNormMode = CUDNN_BATCHNORM_SPATIAL_PERSISTENT; cudaStreamCaptureStatus streamStatus; - PLUGIN_CHECK_CUDA(cudaStreamIsCapturing(stream, &streamStatus)); + PLUGIN_CUASSERT(cudaStreamIsCapturing(stream, &streamStatus)); if (streamStatus != cudaStreamCaptureStatusNone && mCudaDriverVersion < 11000) { @@ -420,7 +389,7 @@ int32_t InstanceNormalizationPlugin::enqueue(nvinfer1::PluginTensorDesc const* i int32_t h = input_dims.d[3]; int32_t w = input_dims.d[4]; - InstanceNormFwdParams params; + InstanceNormFwdParams params{}; params.nhw = d * h * w; params.c = c; params.n = n; @@ -467,30 +436,14 @@ int32_t InstanceNormalizationPlugin::enqueue(nvinfer1::PluginTensorDesc const* i } else { - PLUGIN_ASSERT(false && "Unexpected input format"); + PLUGIN_FAIL("Unexpected input format"); } } return 0; } -size_t InstanceNormalizationPlugin::getSerializationSize() const noexcept -{ - return (serialized_size(mEpsilon) + serialized_size(mNchan) + serialized_size(mHostScale) - + serialized_size(mHostBias) + serialized_size(mRelu) + serialized_size(mAlpha)); -} - -void InstanceNormalizationPlugin::serialize(void* buffer) const noexcept -{ - serialize_value(&buffer, mEpsilon); - serialize_value(&buffer, mNchan); - serialize_value(&buffer, mHostScale); - serialize_value(&buffer, mHostBias); - serialize_value(&buffer, mRelu); - serialize_value(&buffer, mAlpha); -} - -bool InstanceNormalizationPlugin::supportsFormatCombination( - int32_t pos, 
nvinfer1::PluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept +bool InstanceNormalizationV3Plugin::supportsFormatCombination( + int32_t pos, DynamicPluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept { PLUGIN_ASSERT(inOut && pos < (nbInputs + nbOutputs)); PLUGIN_ASSERT(pos == 0 || pos == 1); @@ -499,70 +452,63 @@ bool InstanceNormalizationPlugin::supportsFormatCombination( // For 5-D tensor (nbSpatialDims == 3), FP32_Linear, FP16_Linear, FP16_DHWC8, and INT8_CDHW32 are supported. // This is because we have special InstanceNorm3D kernels for vectorized formats from MLPerf-Inference. - int32_t const nbDims = inOut[pos].dims.nbDims; + int32_t const nbDims = inOut[pos].desc.dims.nbDims; PLUGIN_ASSERT(nbDims >= 3); PLUGIN_ASSERT(nbDims <= 5); bool const is3DInstanceNorm = (nbDims == 5); bool const isFP32Linear - = (inOut[pos].type == nvinfer1::DataType::kFLOAT && inOut[pos].format == nvinfer1::PluginFormat::kLINEAR - && inOut[pos].type == inOut[0].type && inOut[pos].format == inOut[0].format); + = (inOut[pos].desc.type == nvinfer1::DataType::kFLOAT && inOut[pos].desc.format == nvinfer1::PluginFormat::kLINEAR + && inOut[pos].desc.type == inOut[0].desc.type && inOut[pos].desc.format == inOut[0].desc.format); bool const isFP16Linear - = (inOut[pos].type == nvinfer1::DataType::kHALF && inOut[pos].format == nvinfer1::PluginFormat::kLINEAR - && inOut[pos].type == inOut[0].type && inOut[pos].format == inOut[0].format); + = (inOut[pos].desc.type == nvinfer1::DataType::kHALF && inOut[pos].desc.format == nvinfer1::PluginFormat::kLINEAR + && inOut[pos].desc.type == inOut[0].desc.type && inOut[pos].desc.format == inOut[0].desc.format); bool const isFP16DHWC8 - = (inOut[pos].type == nvinfer1::DataType::kHALF && inOut[pos].format == nvinfer1::PluginFormat::kDHWC8 - && inOut[pos].type == inOut[0].type && inOut[pos].format == inOut[0].format); + = (inOut[pos].desc.type == nvinfer1::DataType::kHALF && inOut[pos].desc.format == nvinfer1::PluginFormat::kDHWC8 + && inOut[pos].desc.type == inOut[0].desc.type && inOut[pos].desc.format == inOut[0].desc.format); bool const isINT8CDHW32 - = (inOut[pos].type == nvinfer1::DataType::kINT8 && inOut[pos].format == nvinfer1::PluginFormat::kCDHW32 - && inOut[pos].type == inOut[0].type && inOut[pos].format == inOut[0].format); + = (inOut[pos].desc.type == nvinfer1::DataType::kINT8 && inOut[pos].desc.format == nvinfer1::PluginFormat::kCDHW32 + && inOut[pos].desc.type == inOut[0].desc.type && inOut[pos].desc.format == inOut[0].desc.format); bool const isFormatOK = isFP32Linear || isFP16Linear || (is3DInstanceNorm && (isFP16DHWC8 || isINT8CDHW32)); // Kernels for vectorized formats only support the case of C % spv == 0. 
int32_t spv{1}; - switch (inOut[pos].format) + switch (inOut[pos].desc.format) { case nvinfer1::PluginFormat::kDHWC8: spv = 8; break; case nvinfer1::PluginFormat::kCDHW32: spv = 32; break; default: break; } - int32_t const isAlignmentOK = (inOut[pos].dims.d[1] % spv == 0); + int32_t const isAlignmentOK = (inOut[pos].desc.dims.d[1] % spv == 0); return isFormatOK && isAlignmentOK; } -char const* InstanceNormalizationPlugin::getPluginType() const noexcept +char const* InstanceNormalizationV3Plugin::getPluginName() const noexcept { - return INSTANCE_PLUGIN_NAME; + return gInstancePluginName; } -char const* InstanceNormalizationPlugin::getPluginVersion() const noexcept +char const* InstanceNormalizationV3Plugin::getPluginVersion() const noexcept { - return INSTANCE_PLUGIN_VERSION; + return gInstancePluginVersion; } -char const* InstanceNormalizationPluginV2::getPluginVersion() const noexcept +char const* InstanceNormalizationV3Plugin::getPluginNamespace() const noexcept { - return INSTANCE_PLUGIN_VERSION_V2; -} - -void InstanceNormalizationPlugin::destroy() noexcept -{ - delete this; + return mPluginNamespace.c_str(); } -template -IPluginV2DynamicExt* InstanceNormalizationPlugin::cloneBase() const noexcept +InstanceNormalizationV3Plugin* InstanceNormalizationV3Plugin::clone() noexcept { try { - auto* plugin = new PluginType{mEpsilon, mHostScale, mHostBias, mRelu, mAlpha}; + auto* plugin = new InstanceNormalizationV3Plugin{mEpsilon, mHostScale, mHostBias, mRelu, mAlpha}; plugin->setPluginNamespace(mPluginNamespace.c_str()); - plugin->initialize(); return plugin; } catch (std::exception const& e) @@ -572,53 +518,86 @@ IPluginV2DynamicExt* InstanceNormalizationPlugin::cloneBase() const noexcept return nullptr; } -IPluginV2DynamicExt* InstanceNormalizationPlugin::clone() const noexcept +// Set plugin namespace +void InstanceNormalizationV3Plugin::setPluginNamespace(char const* pluginNamespace) noexcept { - return cloneBase(); + try + { + PLUGIN_ASSERT(pluginNamespace != nullptr); + mPluginNamespace = pluginNamespace; + } + catch (std::exception const& e) + { + caughtError(e); + } } -IPluginV2DynamicExt* InstanceNormalizationPluginV2::clone() const noexcept +int32_t InstanceNormalizationV3Plugin::getOutputDataTypes( + DataType* outputTypes, int32_t nbOutputs, DataType const* inputTypes, int32_t nbInputs) const noexcept { - return cloneBase(); + PLUGIN_ASSERT(inputTypes != nullptr); + PLUGIN_ASSERT(nbInputs == 1); + PLUGIN_ASSERT(nbOutputs == 1); + outputTypes[0] = inputTypes[0]; + return 0; } -// Set plugin namespace -void InstanceNormalizationPlugin::setPluginNamespace(char const* pluginNamespace) noexcept +int32_t InstanceNormalizationV3Plugin::getOutputShapes(DimsExprs const* inputs, int32_t nbInputs, DimsExprs const* shapeInputs, + int32_t nbShapeInputs, DimsExprs* outputs, int32_t nbOutputs, IExprBuilder& exprBuilder) noexcept { - mPluginNamespace = pluginNamespace; + PLUGIN_ASSERT(inputs != nullptr); + PLUGIN_ASSERT(nbInputs == 1); + PLUGIN_ASSERT(nbOutputs == 1); + outputs[0] = inputs[0]; + + return 0; } -char const* InstanceNormalizationPlugin::getPluginNamespace() const noexcept +// Attach the plugin object to an execution context and grant the plugin the access to some context resource. 
+IPluginV3* InstanceNormalizationV3Plugin::attachToContext(IPluginResourceContext* context) noexcept { - return mPluginNamespace.c_str(); + InstanceNormalizationV3Plugin* obj = clone(); + obj->initializeContext(); + return obj; } -nvinfer1::DataType InstanceNormalizationPlugin::getOutputDataType( - int32_t index, nvinfer1::DataType const* inputTypes, int32_t nbInputs) const noexcept +int32_t InstanceNormalizationV3Plugin::configurePlugin(nvinfer1::DynamicPluginTensorDesc const* in, int32_t nbInputs, + nvinfer1::DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept { - PLUGIN_ASSERT(inputTypes && nbInputs > 0 && index == 0); - return inputTypes[0]; + return STATUS_SUCCESS; } -// Attach the plugin object to an execution context and grant the plugin the access to some context resource. -void InstanceNormalizationPlugin::attachToContext( - cudnnContext* cudnnContext, cublasContext* cublasContext, IGpuAllocator* gpuAllocator) noexcept +int32_t InstanceNormalizationV3Plugin::onShapeChange(PluginTensorDesc const* in, int32_t nbInputs, PluginTensorDesc const* out, int32_t nbOutputs) noexcept { + PLUGIN_ASSERT(in != nullptr); + PLUGIN_ASSERT(out != nullptr); + PLUGIN_ASSERT(nbOutputs == 1); + PLUGIN_ASSERT(nbInputs == 1); + // Not support dynamic shape in C dimension + PLUGIN_ASSERT(in[0].dims.d[1] != -1); + return STATUS_SUCCESS; } -// Detach the plugin object from its execution context. -void InstanceNormalizationPlugin::detachFromContext() noexcept {} - -void InstanceNormalizationPlugin::configurePlugin(nvinfer1::DynamicPluginTensorDesc const* in, int32_t nbInputs, - nvinfer1::DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept +PluginFieldCollection const* InstanceNormalizationV3Plugin::getFieldsToSerialize() noexcept { - // Not support dynamic shape in C dimension - PLUGIN_ASSERT(nbInputs == 1 && in[0].desc.dims.d[1] != -1); + mDataToSerialize.clear(); + mDataToSerialize.emplace_back("epsilon", &mEpsilon, PluginFieldType::kFLOAT32, 1); + mDataToSerialize.emplace_back("scales", mHostScale.data(), PluginFieldType::kFLOAT32, mHostScale.size()); + mDataToSerialize.emplace_back("bias", mHostBias.data(), PluginFieldType::kFLOAT32, mHostBias.size()); + mDataToSerialize.emplace_back("relu", &mRelu, PluginFieldType::kINT32, 1); + mDataToSerialize.emplace_back("alpha", &mAlpha, PluginFieldType::kFLOAT32, 1); + + mFCToSerialize.nbFields = mDataToSerialize.size(); + mFCToSerialize.fields = mDataToSerialize.data(); + return &mFCToSerialize; } -// InstanceNormalizationPluginCreator methods -InstanceNormalizationPluginCreator::InstanceNormalizationPluginCreator() + +// InstanceNormalizationV3PluginCreator methods +InstanceNormalizationV3PluginCreator::InstanceNormalizationV3PluginCreator() { + static std::mutex sMutex; + std::lock_guard guard(sMutex); mPluginAttributes.clear(); mPluginAttributes.emplace_back(PluginField("epsilon", nullptr, PluginFieldType::kFLOAT32, 1)); mPluginAttributes.emplace_back(PluginField("scales", nullptr, PluginFieldType::kFLOAT32, 1)); @@ -630,29 +609,23 @@ InstanceNormalizationPluginCreator::InstanceNormalizationPluginCreator() mFC.fields = mPluginAttributes.data(); } -char const* InstanceNormalizationPluginCreator::getPluginName() const noexcept +char const* InstanceNormalizationV3PluginCreator::getPluginName() const noexcept { - return INSTANCE_PLUGIN_NAME; + return gInstancePluginName; } -char const* InstanceNormalizationPluginCreator::getPluginVersion() const noexcept +char const* InstanceNormalizationV3PluginCreator::getPluginVersion() const noexcept { 
- return INSTANCE_PLUGIN_VERSION; + return gInstancePluginVersion; } -char const* InstanceNormalizationPluginCreatorV2::getPluginVersion() const noexcept -{ - return INSTANCE_PLUGIN_VERSION_V2; -} - -PluginFieldCollection const* InstanceNormalizationPluginCreator::getFieldNames() noexcept +PluginFieldCollection const* InstanceNormalizationV3PluginCreator::getFieldNames() noexcept { return &mFC; } -template -IPluginV2DynamicExt* InstanceNormalizationPluginCreator::createPluginBase( - char const* name, nvinfer1::PluginFieldCollection const* fc) noexcept +IPluginV3* InstanceNormalizationV3PluginCreator::createPlugin( + char const* name, nvinfer1::PluginFieldCollection const* fc, TensorRTPhase phase) noexcept { try { @@ -709,9 +682,8 @@ IPluginV2DynamicExt* InstanceNormalizationPluginCreator::createPluginBase( Weights scaleWeights{DataType::kFLOAT, scaleValues.data(), (int64_t) scaleValues.size()}; Weights biasWeights{DataType::kFLOAT, biasValues.data(), (int64_t) biasValues.size()}; - auto* obj = new PluginType(epsilon, scaleWeights, biasWeights, relu, alpha); + auto* obj = new InstanceNormalizationV3Plugin(epsilon, scaleWeights, biasWeights, relu, alpha); obj->setPluginNamespace(mNamespace.c_str()); - obj->initialize(); return obj; } catch (std::exception const& e) @@ -721,44 +693,21 @@ IPluginV2DynamicExt* InstanceNormalizationPluginCreator::createPluginBase( return nullptr; } -IPluginV2DynamicExt* InstanceNormalizationPluginCreator::createPlugin( - char const* name, nvinfer1::PluginFieldCollection const* fc) noexcept -{ - return createPluginBase(name, fc); -} - -IPluginV2DynamicExt* InstanceNormalizationPluginCreatorV2::createPlugin( - char const* name, nvinfer1::PluginFieldCollection const* fc) noexcept -{ - return createPluginBase(name, fc); -} - -template -IPluginV2DynamicExt* InstanceNormalizationPluginCreator::deserializePluginBase( - char const* name, void const* serialData, size_t serialLength) noexcept +void InstanceNormalizationV3PluginCreator::setPluginNamespace(char const* libNamespace) noexcept { try { - auto* obj = new PluginType{serialData, serialLength}; - obj->setPluginNamespace(mNamespace.c_str()); - obj->initialize(); - return obj; + PLUGIN_VALIDATE(libNamespace != nullptr); + mNamespace = libNamespace; } catch (std::exception const& e) { caughtError(e); } - return nullptr; } -IPluginV2DynamicExt* InstanceNormalizationPluginCreator::deserializePlugin( - char const* name, void const* serialData, size_t serialLength) noexcept +char const* InstanceNormalizationV3PluginCreator::getPluginNamespace() const noexcept { - return deserializePluginBase(name, serialData, serialLength); + return mNamespace.c_str(); } -IPluginV2DynamicExt* InstanceNormalizationPluginCreatorV2::deserializePlugin( - char const* name, void const* serialData, size_t serialLength) noexcept -{ - return deserializePluginBase(name, serialData, serialLength); -} diff --git a/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.h b/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.h index cd494e4cd..71c107a01 100644 --- a/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.h +++ b/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.h @@ -16,6 +16,8 @@ */ #ifndef TRT_INSTANCE_NORMALIZATION_PLUGIN_H #define TRT_INSTANCE_NORMALIZATION_PLUGIN_H +#include "NvInfer.h" +#include "NvInferPlugin.h" #include "common/plugin.h" #include "common/serialize.hpp" #include "instanceNormalizationPlugin/instanceNormFwd.h" @@ -30,80 +32,75 @@ namespace nvinfer1 { namespace plugin { 
-class InstanceNormalizationPlugin : public nvinfer1::IPluginV2DynamicExt +class InstanceNormalizationV3Plugin : public IPluginV3, + public IPluginV3OneCore, + public IPluginV3OneBuild, + public IPluginV3OneRuntime { public: - InstanceNormalizationPlugin(float epsilon, nvinfer1::Weights const& scale, nvinfer1::Weights const& bias, + InstanceNormalizationV3Plugin(float epsilon, nvinfer1::Weights const& scale, nvinfer1::Weights const& bias, int32_t relu = 0, float alpha = 0.F); - InstanceNormalizationPlugin(float epsilon, std::vector const& scale, std::vector const& bias, + InstanceNormalizationV3Plugin(float epsilon, std::vector const& scale, std::vector const& bias, int32_t relu = 0, float alpha = 0.F); - InstanceNormalizationPlugin(void const* serialData, size_t serialLength); + InstanceNormalizationV3Plugin(void const* serialData, size_t serialLength); - InstanceNormalizationPlugin() = delete; + InstanceNormalizationV3Plugin() = delete; - ~InstanceNormalizationPlugin() override; + InstanceNormalizationV3Plugin(InstanceNormalizationV3Plugin const&) = default; - int32_t getNbOutputs() const noexcept override; + ~InstanceNormalizationV3Plugin() override; - // DynamicExt plugins returns DimsExprs class instead of Dims - using nvinfer1::IPluginV2::getOutputDimensions; - DimsExprs getOutputDimensions(int32_t outputIndex, nvinfer1::DimsExprs const* inputs, int32_t nbInputs, - nvinfer1::IExprBuilder& exprBuilder) noexcept override; + int32_t getNbOutputs() const noexcept override; - int32_t initialize() noexcept override; + IPluginCapability* getCapabilityInterface(PluginCapabilityType type) noexcept override; - void terminate() noexcept override; + InstanceNormalizationV3Plugin* clone() noexcept override; - using nvinfer1::IPluginV2::getWorkspaceSize; - size_t getWorkspaceSize(nvinfer1::PluginTensorDesc const* inputs, int32_t nbInputs, - nvinfer1::PluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override; + char const* getPluginName() const noexcept override; - using nvinfer1::IPluginV2::enqueue; - int32_t enqueue(nvinfer1::PluginTensorDesc const* inputDesc, nvinfer1::PluginTensorDesc const* outputDesc, - void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept override; + char const* getPluginNamespace() const noexcept override; - size_t getSerializationSize() const noexcept override; + size_t getWorkspaceSize(DynamicPluginTensorDesc const* inputs, int32_t nbInputs, + DynamicPluginTensorDesc const* outputs, int32_t nbOutputs) const noexcept override; - void serialize(void* buffer) const noexcept override; + int32_t enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc, void const* const* inputs, + void* const* outputs, void* workspace, cudaStream_t stream) noexcept override; // DynamicExt plugin supportsFormat update. 
bool supportsFormatCombination( - int32_t pos, nvinfer1::PluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept override; - - char const* getPluginType() const noexcept override; + int32_t pos, DynamicPluginTensorDesc const* inOut, int32_t nbInputs, int32_t nbOutputs) noexcept override; char const* getPluginVersion() const noexcept override; - void destroy() noexcept override; + void setPluginNamespace(char const* pluginNamespace) noexcept; - nvinfer1::IPluginV2DynamicExt* clone() const noexcept override; + int32_t getOutputDataTypes( + DataType* outputTypes, int32_t nbOutputs, DataType const* inputTypes, int32_t nbInputs) const noexcept override; - void setPluginNamespace(char const* pluginNamespace) noexcept override; + int32_t getOutputShapes(DimsExprs const* inputs, int32_t nbInputs, DimsExprs const* shapeInputs, + int32_t nbShapeInputs, DimsExprs* outputs, int32_t nbOutputs, IExprBuilder& exprBuilder) noexcept override; - char const* getPluginNamespace() const noexcept override; + IPluginV3* attachToContext(IPluginResourceContext* context) noexcept override; - DataType getOutputDataType( - int32_t index, nvinfer1::DataType const* inputTypes, int32_t nbInputs) const noexcept override; + int32_t configurePlugin(DynamicPluginTensorDesc const* in, int32_t nbInputs, DynamicPluginTensorDesc const* out, + int32_t nbOutputs) noexcept override; - void attachToContext( - cudnnContext* cudnn, cublasContext* cublas, nvinfer1::IGpuAllocator* allocator) noexcept override; + int32_t onShapeChange( + PluginTensorDesc const* in, int32_t nbInputs, PluginTensorDesc const* out, int32_t nbOutputs) noexcept override; - void detachFromContext() noexcept override; + PluginFieldCollection const* getFieldsToSerialize() noexcept override; - using nvinfer1::IPluginV2Ext::configurePlugin; - void configurePlugin(nvinfer1::DynamicPluginTensorDesc const* in, int32_t nbInputs, - nvinfer1::DynamicPluginTensorDesc const* out, int32_t nbOutputs) noexcept override; + int32_t initializeContext(); protected: - template - nvinfer1::IPluginV2DynamicExt* cloneBase() const noexcept; + void exitContext(); private: - float mEpsilon; - float mAlpha; - int32_t mRelu; - int32_t mNchan; + float mEpsilon{}; + float mAlpha{}; + int32_t mRelu{}; + int32_t mNchan{}; std::vector mHostScale; std::vector mHostBias; float* mDeviceScale{nullptr}; @@ -115,20 +112,21 @@ class InstanceNormalizationPlugin : public nvinfer1::IPluginV2DynamicExt nvinfer1::pluginInternal::cudnnTensorDescriptor_t mYDescriptor{nullptr}; nvinfer1::pluginInternal::cudnnTensorDescriptor_t mBDescriptor{nullptr}; std::string mPluginNamespace; - std::string mNamespace; bool mInitialized{false}; int32_t mCudaDriverVersion{-1}; + std::vector mDataToSerialize; + nvinfer1::PluginFieldCollection mFCToSerialize; // NDHWC implementation instance_norm_impl::InstanceNormFwdContext mContext; }; -class InstanceNormalizationPluginCreator : public nvinfer1::pluginInternal::BaseCreator +class InstanceNormalizationV3PluginCreator : public nvinfer1::IPluginCreatorV3One { public: - InstanceNormalizationPluginCreator(); + InstanceNormalizationV3PluginCreator(); - ~InstanceNormalizationPluginCreator() override = default; + ~InstanceNormalizationV3PluginCreator() override = default; char const* getPluginName() const noexcept override; @@ -136,56 +134,16 @@ class InstanceNormalizationPluginCreator : public nvinfer1::pluginInternal::Base PluginFieldCollection const* getFieldNames() noexcept override; - IPluginV2DynamicExt* createPlugin(char const* name, 
nvinfer1::PluginFieldCollection const* fc) noexcept override; - - IPluginV2DynamicExt* deserializePlugin( - char const* name, void const* serialData, size_t serialLength) noexcept override; + IPluginV3* createPlugin(char const* name, PluginFieldCollection const* fc, TensorRTPhase phase) noexcept override; -protected: - template - IPluginV2DynamicExt* createPluginBase(char const* name, nvinfer1::PluginFieldCollection const* fc) noexcept; + void setPluginNamespace(char const* libNamespace) noexcept; - template - IPluginV2DynamicExt* deserializePluginBase(char const* name, void const* serialData, size_t serialLength) noexcept; + char const* getPluginNamespace() const noexcept override; private: static PluginFieldCollection mFC; static std::vector mPluginAttributes; -}; - -// For backward compatibility, create version "2" of the identical plugin. -// Background: in TRT 8.0, we added 3D InstanceNorm plugin as the version 2 of the "InstanceNormalization_TRT" plugin. -// However, in TRT 8.2, we have fused it into version 1, so a separate version 2 is no longer needed, but is only kept -// for backward compatibility. -class InstanceNormalizationPluginV2 final : public InstanceNormalizationPlugin -{ -public: - InstanceNormalizationPluginV2(float epsilon, nvinfer1::Weights const& scale, nvinfer1::Weights const& bias, - int32_t relu = 0, float alpha = 0.F) - : InstanceNormalizationPlugin(epsilon, scale, bias, relu, alpha) - { - } - InstanceNormalizationPluginV2(float epsilon, std::vector const& scale, std::vector const& bias, - int32_t relu = 0, float alpha = 0.F) - : InstanceNormalizationPlugin(epsilon, scale, bias, relu, alpha) - { - } - InstanceNormalizationPluginV2(void const* serialData, size_t serialLength) - : InstanceNormalizationPlugin(serialData, serialLength) - { - } - InstanceNormalizationPluginV2() = delete; - char const* getPluginVersion() const noexcept override; - nvinfer1::IPluginV2DynamicExt* clone() const noexcept override; -}; - -class InstanceNormalizationPluginCreatorV2 final : public InstanceNormalizationPluginCreator -{ -public: - char const* getPluginVersion() const noexcept override; - IPluginV2DynamicExt* createPlugin(char const* name, nvinfer1::PluginFieldCollection const* fc) noexcept override; - IPluginV2DynamicExt* deserializePlugin( - char const* name, void const* serialData, size_t serialLength) noexcept override; + std::string mNamespace; }; } // namespace plugin diff --git a/plugin/leakyReluPlugin/LReLU_PluginConfig.yaml b/plugin/leakyReluPlugin/LReLU_PluginConfig.yaml index 36267fd9e..7376a419d 100644 --- a/plugin/leakyReluPlugin/LReLU_PluginConfig.yaml +++ b/plugin/leakyReluPlugin/LReLU_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: LReLU_TRT interface: "IPluginV2" diff --git a/plugin/modulatedDeformConvPlugin/CustomModulatedDeformConv2d_PluginConfig.yaml b/plugin/modulatedDeformConvPlugin/CustomModulatedDeformConv2d_PluginConfig.yaml index a0d0745ba..ef4b867f6 100644 --- a/plugin/modulatedDeformConvPlugin/CustomModulatedDeformConv2d_PluginConfig.yaml +++ b/plugin/modulatedDeformConvPlugin/CustomModulatedDeformConv2d_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2023-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: ModulatedDeformConv2d interface: "IPluginV2DynamicExt" diff --git a/plugin/multilevelCropAndResizePlugin/MultilevelCropAndResize_PluginConfig.yaml b/plugin/multilevelCropAndResizePlugin/MultilevelCropAndResize_PluginConfig.yaml index e6713c05a..2fbe216a5 100644 --- a/plugin/multilevelCropAndResizePlugin/MultilevelCropAndResize_PluginConfig.yaml +++ b/plugin/multilevelCropAndResizePlugin/MultilevelCropAndResize_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: MultilevelCropAndResize_TRT interface: "IPluginV2Ext" diff --git a/plugin/multilevelProposeROI/MultilevelProposeROI_PluginConfig.yaml b/plugin/multilevelProposeROI/MultilevelProposeROI_PluginConfig.yaml index 69d015e42..c6c7e2a06 100644 --- a/plugin/multilevelProposeROI/MultilevelProposeROI_PluginConfig.yaml +++ b/plugin/multilevelProposeROI/MultilevelProposeROI_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: MultilevelProposeROI_TRT interface: "IPluginV2Ext" diff --git a/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableAttn.h b/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableAttn.h index 503363896..e7115d6e0 100644 --- a/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableAttn.h +++ b/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableAttn.h @@ -18,12 +18,12 @@ /* ************************************************************************** * Modified from Deformable DETR - * Copyright (c) 2020 SenseTime. All Rights Reserved. + * Copyright (c) 2020-2023 SenseTime. All Rights Reserved. * Licensed under the Apache License, Version 2.0 [see LICENSE for details] * https://github.com/fundamentalvision/Deformable-DETR/blob/main/LICENSE ************************************************************************** * Modified from DCN (https://github.com/msracver/Deformable-ConvNets) - * Copyright (c) 2018 Microsoft + * Copyright (c) 2018-2023 Microsoft ************************************************************************** * Modified from https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0 ************************************************************************************************** diff --git a/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableIm2ColCuda.cuh b/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableIm2ColCuda.cuh index 454b9f03a..ff469c226 100644 --- a/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableIm2ColCuda.cuh +++ b/plugin/multiscaleDeformableAttnPlugin/multiscaleDeformableIm2ColCuda.cuh @@ -16,12 +16,12 @@ * ************************************************************************** * Modified from Deformable DETR - * Copyright (c) 2020 SenseTime. All Rights Reserved. + * Copyright (c) 2020-2023 SenseTime. All Rights Reserved. * Licensed under the Apache License, Version 2.0 [see LICENSE for details] * https://github.com/fundamentalvision/Deformable-DETR/blob/main/LICENSE ************************************************************************** * Modified from DCN (https://github.com/msracver/Deformable-ConvNets) - * Copyright (c) 2018 Microsoft + * Copyright (c) 2018-2023 Microsoft ************************************************************************** */ diff --git a/plugin/pillarScatterPlugin/PillarScatterPlugin_PluginConfig.yaml b/plugin/pillarScatterPlugin/PillarScatterPlugin_PluginConfig.yaml index 8bd1f394f..53bd80933 100644 --- a/plugin/pillarScatterPlugin/PillarScatterPlugin_PluginConfig.yaml +++ b/plugin/pillarScatterPlugin/PillarScatterPlugin_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: PillarScatterPlugin interface: "IPluginV2DynamicExt" diff --git a/plugin/proposalLayerPlugin/ProposalLayer_PluginConfig.yaml b/plugin/proposalLayerPlugin/ProposalLayer_PluginConfig.yaml index dd61778d3..0b92709d6 100644 --- a/plugin/proposalLayerPlugin/ProposalLayer_PluginConfig.yaml +++ b/plugin/proposalLayerPlugin/ProposalLayer_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: ProposalLayer_TRT interface: "IPluginV2Ext" diff --git a/plugin/reorgPlugin/Reorg_PluginConfig.yaml b/plugin/reorgPlugin/Reorg_PluginConfig.yaml index 9d79d9231..9cfc1a2f9 100644 --- a/plugin/reorgPlugin/Reorg_PluginConfig.yaml +++ b/plugin/reorgPlugin/Reorg_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: Reorg_TRT interface: "IPluginV2DynamicExt" diff --git a/plugin/resizeNearestPlugin/ResizeNearest_PluginConfig.yaml b/plugin/resizeNearestPlugin/ResizeNearest_PluginConfig.yaml index 8388a8443..2914e2738 100644 --- a/plugin/resizeNearestPlugin/ResizeNearest_PluginConfig.yaml +++ b/plugin/resizeNearestPlugin/ResizeNearest_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: ResizeNearest_TRT interface: "IPluginV2Ext" diff --git a/plugin/roiAlignPlugin/ROIAlign_PluginConfig.yaml b/plugin/roiAlignPlugin/ROIAlign_PluginConfig.yaml index a39daae00..3bed92e49 100644 --- a/plugin/roiAlignPlugin/ROIAlign_PluginConfig.yaml +++ b/plugin/roiAlignPlugin/ROIAlign_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# --- name: ROIAlign_TRT interface: "IPluginV3" diff --git a/plugin/scatterElementsPlugin/ScatterElementsPlugin_PluginConfig.yaml b/plugin/scatterElementsPlugin/ScatterElementsPlugin_PluginConfig.yaml index e282fd696..7ad6e8dc4 100644 --- a/plugin/scatterElementsPlugin/ScatterElementsPlugin_PluginConfig.yaml +++ b/plugin/scatterElementsPlugin/ScatterElementsPlugin_PluginConfig.yaml @@ -1,3 +1,20 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + --- name: ScatterElements interface: "IPluginV2DynamicExt" diff --git a/plugin/skipLayerNormPlugin/CustomSkipLayerNormPluginDynamic_PluginConfig.yaml b/plugin/skipLayerNormPlugin/CustomSkipLayerNormPluginDynamic_PluginConfig.yaml index a2f309c3a..a39fd9bc1 100644 --- a/plugin/skipLayerNormPlugin/CustomSkipLayerNormPluginDynamic_PluginConfig.yaml +++ b/plugin/skipLayerNormPlugin/CustomSkipLayerNormPluginDynamic_PluginConfig.yaml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# --- name: CustomSkipLayerNormPluginDynamic interface: "IPluginV2DynamicExt" diff --git a/python/docstrings/infer/pyCoreDoc.h b/python/docstrings/infer/pyCoreDoc.h index eb604771d..b59803a92 100644 --- a/python/docstrings/infer/pyCoreDoc.h +++ b/python/docstrings/infer/pyCoreDoc.h @@ -1006,6 +1006,8 @@ constexpr char const* WEIGHTLESS constexpr char const* STRIP_PLAN = R"trtdoc(Strip the refittable weights from the engine plan file.)trtdoc"; constexpr char const* REFIT_IDENTICAL = R"trtdoc(Create a refittable engine using identical weights. Different weights during refits yield unpredictable behavior.)trtdoc"; +constexpr char const* REFIT_INDIVIDUAL + = R"trtdoc(Create a refittable engine and allows the users to specify which weights are refittable and which are not.)trtdoc"; constexpr char const* WEIGHT_STREAMING = R"trtdoc(Enable building with the ability to stream varying amounts of weights during Runtime. This decreases GPU memory of TRT at the expense of performance.)trtdoc"; constexpr char const* INT4 = R"trtdoc(Enable plugins with INT4 input/output)trtdoc"; @@ -1092,6 +1094,26 @@ constexpr char const* AMPERE_PLUS = R"trtdoc( )trtdoc"; } // namespace HardwareCompatibilityLevelDoc +namespace RuntimePlatformDoc +{ +constexpr char const* descr = R"trtdoc( + Describes the intended runtime platform for the execution of the TensorRT engine. + TensorRT provides support for cross-platform engine compatibility when the target runtime platform is different from the build platform. + + **NOTE:** The cross-platform engine will not be able to run on the host platform it was built on. + + **NOTE:** When building a cross-platform engine that also requires version forward compatibility, EXCLUDE_LEAN_RUNTIME must be set to exclude the target platform lean runtime. + + **NOTE:** The cross-platform engine might have performance differences compared to the natively built engine on the target platform. +)trtdoc"; +constexpr char const* SAME_AS_BUILD = R"trtdoc( + No requirement for cross-platform compatibility. The engine constructed by TensorRT can only run on the identical platform it was built on. +)trtdoc"; +constexpr char const* WINDOWS_AMD64 = R"trtdoc( + Designates the target platform for engine execution as Windows AMD64 system. Currently this flag can only be enabled when building engines on Linux AMD64 platforms. +)trtdoc"; +} // namespace RuntimePlatformDoc + namespace NetworkDefinitionCreationFlagDoc { constexpr char const* descr diff --git a/python/docstrings/infer/pyGraphDoc.h b/python/docstrings/infer/pyGraphDoc.h index 19a0df9f8..09ef6cb18 100644 --- a/python/docstrings/infer/pyGraphDoc.h +++ b/python/docstrings/infer/pyGraphDoc.h @@ -1839,6 +1839,18 @@ constexpr const char* mark_output = R"trtdoc( :arg tensor: The tensor to mark. )trtdoc"; +constexpr const char* mark_weights_refittable = R"trtdoc( + Mark a weight as refittable. + + :arg name: The weight to mark. +)trtdoc"; + +constexpr const char* are_weights_marked_refittable = R"trtdoc( + Whether the weight has been marked as refittable. + + :arg name: The name of the weights to check. +)trtdoc"; + constexpr const char* mark_debug = R"trtdoc( Mark a tensor as a debug tensor in the network. @@ -1847,6 +1859,12 @@ constexpr const char* mark_debug = R"trtdoc( :returns: True on success, False otherwise. )trtdoc"; +constexpr const char* unmark_weights_refittable = R"trtdoc( + Unmark a weight as refittable. + + :arg name: The weight to unmark. 
+)trtdoc"; + constexpr const char* unmark_debug = R"trtdoc( Unmark a tensor as a debug tensor in the network. diff --git a/python/packaging/bindings_wheel/setup.cfg b/python/packaging/bindings_wheel/setup.cfg index ef9e2d189..9e20a94e5 100644 --- a/python/packaging/bindings_wheel/setup.cfg +++ b/python/packaging/bindings_wheel/setup.cfg @@ -1,2 +1,12 @@ +# SPDX-FileCopyrightText: Copyright (c) 2019-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. + [metadata] license_files = LICENSE.txt diff --git a/python/packaging/bindings_wheel/tensorrt/__init__.py b/python/packaging/bindings_wheel/tensorrt/__init__.py index efcbcf645..22022e9ea 100644 --- a/python/packaging/bindings_wheel/tensorrt/__init__.py +++ b/python/packaging/bindings_wheel/tensorrt/__init__.py @@ -20,8 +20,6 @@ import sys import warnings -ENABLE_LONG_TERM = bool(int(os.environ.get("NV_INTERNAL_ENABLE_LONG_TERM", "0"))) - # For standalone wheels, attempt to import the wheel containing the libraries. _libs_wheel_imported = False try: @@ -197,10 +195,7 @@ def _itemsize(trt_type): fp8: 1, int4: 0.5, } - # $nv-internal-release begin - if ENABLE_LONG_TERM: - mapping[fp4] = 0.5 - # $nv-internal-release end + if trt_type in mapping: return mapping[trt_type] diff --git a/python/packaging/frontend_sdist/setup.cfg b/python/packaging/frontend_sdist/setup.cfg index 32a8c1c0e..dea8290ce 100644 --- a/python/packaging/frontend_sdist/setup.cfg +++ b/python/packaging/frontend_sdist/setup.cfg @@ -1,3 +1,13 @@ +# SPDX-FileCopyrightText: Copyright (c) 2019-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. + [metadata] license_files = LICENSE.txt diff --git a/python/packaging/libs_wheel/setup.cfg b/python/packaging/libs_wheel/setup.cfg index 32a8c1c0e..dea8290ce 100644 --- a/python/packaging/libs_wheel/setup.cfg +++ b/python/packaging/libs_wheel/setup.cfg @@ -1,3 +1,13 @@ +# SPDX-FileCopyrightText: Copyright (c) 2019-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: LicenseRef-NvidiaProprietary +# +# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual +# property and proprietary rights in and to this material, related +# documentation and any modifications thereto. Any use, reproduction, +# disclosure or distribution of this material and related documentation +# without an express license agreement from NVIDIA CORPORATION or +# its affiliates is strictly prohibited. 
+ [metadata] license_files = LICENSE.txt diff --git a/python/packaging/metapackage/setup.cfg b/python/packaging/metapackage/setup.cfg index 32a8c1c0e..8071afae3 100644 --- a/python/packaging/metapackage/setup.cfg +++ b/python/packaging/metapackage/setup.cfg @@ -1,3 +1,20 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + [metadata] license_files = LICENSE.txt diff --git a/python/packaging/metapackage/setup.py b/python/packaging/metapackage/setup.py index bd673247e..43e563e6d 100644 --- a/python/packaging/metapackage/setup.py +++ b/python/packaging/metapackage/setup.py @@ -34,7 +34,9 @@ "Programming Language :: Python :: 3", ], packages=[], - install_requires=["##TENSORRT_MODULE##-cu##CUDA_MAJOR##"], + install_requires=[ + "##TENSORRT_MODULE##-cu##CUDA_MAJOR##==##TENSORRT_PYTHON_VERSION##" + ], include_package_data=True, zip_safe=True, keywords="nvidia tensorrt deeplearning inference", diff --git a/python/src/infer/pyCore.cpp b/python/src/infer/pyCore.cpp index 52d9cb75a..e96597001 100644 --- a/python/src/infer/pyCore.cpp +++ b/python/src/infer/pyCore.cpp @@ -1364,6 +1364,7 @@ void bindCore(py::module& m) .value("REFIT_IDENTICAL", BuilderFlag::kREFIT_IDENTICAL, BuilderFlagDoc::REFIT_IDENTICAL) .value("WEIGHT_STREAMING", BuilderFlag::kWEIGHT_STREAMING, BuilderFlagDoc::WEIGHT_STREAMING) .value("INT4", BuilderFlag::kINT4, BuilderFlagDoc::INT4) + .value("REFIT_INDIVIDUAL", BuilderFlag::kREFIT_INDIVIDUAL, BuilderFlagDoc::REFIT_INDIVIDUAL) ; py::enum_(m, "MemoryPoolType", MemoryPoolTypeDoc::descr, py::module_local()) @@ -1386,6 +1387,10 @@ void bindCore(py::module& m) .value("NONE", HardwareCompatibilityLevel::kNONE, HardwareCompatibilityLevelDoc::NONE) .value("AMPERE_PLUS", HardwareCompatibilityLevel::kAMPERE_PLUS, HardwareCompatibilityLevelDoc::AMPERE_PLUS); + py::enum_(m, "RuntimePlatform", RuntimePlatformDoc::descr, py::module_local()) + .value("SAME_AS_BUILD", RuntimePlatform::kSAME_AS_BUILD, RuntimePlatformDoc::SAME_AS_BUILD) + .value("WINDOWS_AMD64", RuntimePlatform::kWINDOWS_AMD64, RuntimePlatformDoc::WINDOWS_AMD64); + py::enum_(m, "DeviceType", DeviceTypeDoc::descr, py::module_local()) .value("GPU", DeviceType::kGPU, DeviceTypeDoc::GPU) .value("DLA", DeviceType::kDLA, DeviceTypeDoc::DLA); @@ -1483,6 +1488,7 @@ void bindCore(py::module& m) &IBuilderConfig::setBuilderOptimizationLevel) .def_property("hardware_compatibility_level", &IBuilderConfig::getHardwareCompatibilityLevel, &IBuilderConfig::setHardwareCompatibilityLevel) + .def_property("runtime_platform", &IBuilderConfig::getRuntimePlatform, &IBuilderConfig::setRuntimePlatform) .def_property("plugins_to_serialize", lambdas::get_plugins_to_serialize, lambdas::set_plugins_to_serialize) .def_property("max_aux_streams", &IBuilderConfig::getMaxAuxStreams, &IBuilderConfig::setMaxAuxStreams) .def_property("progress_monitor", &IBuilderConfig::getProgressMonitor, diff --git 
a/python/src/infer/pyGraph.cpp b/python/src/infer/pyGraph.cpp index 57bfc2647..f6ebbfe0e 100644 --- a/python/src/infer/pyGraph.cpp +++ b/python/src/infer/pyGraph.cpp @@ -828,6 +828,8 @@ namespace tensorrt .def_property("error_recorder", &INetworkDefinition::getErrorRecorder, py::cpp_function(&INetworkDefinition::setErrorRecorder, py::keep_alive<1, 2>{})) .def("mark_output", &INetworkDefinition::markOutput, "tensor"_a, INetworkDefinitionDoc::mark_output) + .def("mark_weights_refittable", &INetworkDefinition::markWeightsRefittable, "name"_a, INetworkDefinitionDoc::mark_weights_refittable) + .def("are_weights_marked_refittable", &INetworkDefinition::areWeightsMarkedRefittable, "name"_a, INetworkDefinitionDoc::are_weights_marked_refittable) // Layers .def("add_input", &INetworkDefinition::addInput, "name"_a, "dtype"_a, "shape"_a, INetworkDefinitionDoc::add_input, py::return_value_policy::reference_internal) @@ -933,6 +935,7 @@ namespace tensorrt .def("unmark_output", &INetworkDefinition::unmarkOutput, "tensor"_a, INetworkDefinitionDoc::unmark_output) .def("mark_output_for_shapes", &INetworkDefinition::markOutputForShapes, "tensor"_a, INetworkDefinitionDoc::mark_output_for_shapes) .def("unmark_output_for_shapes", &INetworkDefinition::unmarkOutputForShapes, "tensor"_a, INetworkDefinitionDoc::unmark_output_for_shapes) + .def("unmark_weights_refittable", &INetworkDefinition::unmarkWeightsRefittable, "name"_a, INetworkDefinitionDoc::unmark_weights_refittable) .def("set_weights_name", &INetworkDefinition::setWeightsName, "weights"_a, "name"_a, INetworkDefinitionDoc::set_weights_name) // Getters .def("get_layer", &INetworkDefinition::getLayer, "index"_a, INetworkDefinitionDoc::get_layer, diff --git a/python/src/parsers/pyOnnx.cpp b/python/src/parsers/pyOnnx.cpp index 3cb454193..e9dc446f9 100644 --- a/python/src/parsers/pyOnnx.cpp +++ b/python/src/parsers/pyOnnx.cpp @@ -18,6 +18,7 @@ // Implementation of PyBind11 Binding Code for OnnxParser #include "ForwardDeclarations.h" #include "onnx/NvOnnxParser.h" +#include "onnx/errorHelpers.hpp" #include "parsers/pyOnnxDoc.h" #include "utils.h" #include @@ -32,40 +33,6 @@ namespace tensorrt // Long lambda functions should go here rather than being inlined into the bindings (1 liners are OK). 
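The `mark_weights_refittable` and `are_weights_marked_refittable` bindings added above, together with the new `REFIT_INDIVIDUAL` builder flag, expose per-weight refitting. The C++ sketch below is illustrative only and not part of the diff: the weight name "conv1_w" is hypothetical (weights must have been named, for example via `INetworkDefinition::setWeightsName` or by the ONNX importer), and error checking is omitted.

```cpp
#include <NvInfer.h>

// Sketch: build with only selected weights refittable, then refit just those weights.
void buildAndRefitIndividually(nvinfer1::IBuilder& builder, nvinfer1::INetworkDefinition& network,
    nvinfer1::IBuilderConfig& config, nvinfer1::ILogger& logger, nvinfer1::Weights newConv1Weights)
{
    using namespace nvinfer1;

    // Only weights explicitly marked below stay refittable in the built engine.
    config.setFlag(BuilderFlag::kREFIT_INDIVIDUAL);
    network.markWeightsRefittable("conv1_w"); // hypothetical weight name

    IHostMemory* plan = builder.buildSerializedNetwork(network, config);

    // Deserialize the plan and update only the marked weights.
    IRuntime* runtime = createInferRuntime(logger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(plan->data(), plan->size());
    IRefitter* refitter = createInferRefitter(*engine, logger);
    refitter->setNamedWeights("conv1_w", newConv1Weights);
    refitter->refitCudaEngine();

    delete refitter;
    delete engine;
    delete runtime;
    delete plan;
}
```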
namespace lambdas { -static const auto error_code_str = [](ErrorCode self) { - switch (self) - { - case ErrorCode::kSUCCESS: return "SUCCESS"; - case ErrorCode::kINTERNAL_ERROR: return "INTERNAL_ERROR"; - case ErrorCode::kMEM_ALLOC_FAILED: return "MEM_ALLOC_FAILED"; - case ErrorCode::kMODEL_DESERIALIZE_FAILED: return "MODEL_DESERIALIZE_FAILED"; - case ErrorCode::kINVALID_VALUE: return "INVALID_VALUE"; - case ErrorCode::kINVALID_GRAPH: return "INVALID_GRAPH"; - case ErrorCode::kINVALID_NODE: return "INVALID_NODE"; - case ErrorCode::kUNSUPPORTED_GRAPH: return "UNSUPPORTED_GRAPH"; - case ErrorCode::kUNSUPPORTED_NODE: return "UNSUPPORTED_NODE"; - case ErrorCode::kUNSUPPORTED_NODE_ATTR: return "UNSUPPORTED_NODE_ATTR"; - case ErrorCode::kUNSUPPORTED_NODE_INPUT: return "UNSUPPORTED_NODE_INPUT"; - case ErrorCode::kUNSUPPORTED_NODE_DATATYPE: return "UNSUPPORTED_NODE_DATATYPE"; - case ErrorCode::kUNSUPPORTED_NODE_DYNAMIC: return "UNSUPPORTED_NODE_DYNAMIC"; - case ErrorCode::kUNSUPPORTED_NODE_SHAPE: return "UNSUPPORTED_NODE_SHAPE"; - case ErrorCode::kREFIT_FAILED: return "REFIT_FAILED"; - } - return "UNKNOWN"; -}; - -static const auto parser_error_str = [](IParserError& self) { - std::string const node_info = "In node " + std::to_string(self.node()) + " with name: " + self.nodeName() - + " and operator: " + self.nodeOperator() + " "; - std::string const error_info - = std::string("(") + self.func() + "): " + error_code_str(self.code()) + ": " + self.desc(); - if (self.code() == ErrorCode::kMODEL_DESERIALIZE_FAILED || self.code() == ErrorCode::kREFIT_FAILED) - { - return error_info; - } - return node_info + error_info; -}; - static const auto parse = [](IParser& self, py::buffer const& model, char const* path = nullptr) { py::buffer_info info = model.request(); return self.parse(info.ptr, info.size * info.itemsize, path); @@ -199,8 +166,8 @@ void bindOnnx(py::module& m) .value("UNSUPPORTED_NODE_DYNAMIC", ErrorCode::kUNSUPPORTED_NODE_DYNAMIC) .value("UNSUPPORTED_NODE_SHAPE", ErrorCode::kUNSUPPORTED_NODE_SHAPE) .value("REFIT_FAILED", ErrorCode::kREFIT_FAILED) - .def("__str__", lambdas::error_code_str) - .def("__repr__", lambdas::error_code_str); + .def("__str__", &onnx2trt::errorCodeStr) + .def("__repr__", &onnx2trt::errorCodeStr); py::class_>(m, "ParserError", py::module_local()) .def("code", &IParserError::code, ParserErrorDoc::code) @@ -214,8 +181,8 @@ void bindOnnx(py::module& m) .def("local_function_stack", lambdas::get_local_function_stack, ParserErrorDoc::local_function_stack) .def("local_function_stack_size", &IParserError::localFunctionStackSize, ParserErrorDoc::local_function_stack_size) - .def("__str__", lambdas::parser_error_str) - .def("__repr__", lambdas::parser_error_str); + .def("__str__", &onnx2trt::parserErrorStr) + .def("__repr__", &onnx2trt::parserErrorStr); py::class_(m, "OnnxParserRefitter", OnnxParserRefitterDoc::descr, py::module_local()) .def(py::init(&nvonnxparser::createParserRefitter), "refitter"_a, "logger"_a, OnnxParserRefitterDoc::init, diff --git a/samples/common/sampleEngines.cpp b/samples/common/sampleEngines.cpp index cb7521a8a..2fc29eaaa 100644 --- a/samples/common/sampleEngines.cpp +++ b/samples/common/sampleEngines.cpp @@ -1162,6 +1162,7 @@ bool setupNetworkAndConfig(BuildOptions const& build, SystemOptions const& sys, } config.setHardwareCompatibilityLevel(build.hardwareCompatibilityLevel); + config.setRuntimePlatform(build.runtimePlatform); if (build.maxAuxStreams != defaultMaxAuxStreams) { diff --git a/samples/common/sampleOptions.cpp 
b/samples/common/sampleOptions.cpp index 1532b66ea..802c02ad4 100644 --- a/samples/common/sampleOptions.cpp +++ b/samples/common/sampleOptions.cpp @@ -792,6 +792,11 @@ std::ostream& printTacticSources( std::ostream& printPrecision(std::ostream& os, BuildOptions const& options) { + if (options.stronglyTyped) + { + os << "Strongly Typed"; + return os; + } os << "FP32"; if (options.fp16) { @@ -813,10 +818,6 @@ std::ostream& printPrecision(std::ostream& os, BuildOptions const& options) { os << "+INT4"; } - if (options.stronglyTyped) - { - os << " (Strongly Typed)"; - } if (options.precisionConstraints == PrecisionConstraints::kOBEY) { os << " (obey precision constraints)"; @@ -1445,6 +1446,22 @@ void BuildOptions::parse(Arguments& arguments) getAndDelOption(arguments, "--errorOnTimingCacheMiss", errorOnTimingCacheMiss); getAndDelOption(arguments, "--builderOptimizationLevel", builderOptimizationLevel); + std::string runtimePlatformArgs; + getAndDelOption(arguments, "--runtimePlatform", runtimePlatformArgs); + if (runtimePlatformArgs == "SameAsBuild" || runtimePlatformArgs.empty()) + { + runtimePlatform = RuntimePlatform::kSAME_AS_BUILD; + } + else if (runtimePlatformArgs == "WindowsAMD64") + { + runtimePlatform = RuntimePlatform::kWINDOWS_AMD64; + } + else + { + throw std::invalid_argument(std::string("Unknown runtime platform: ") + runtimePlatformArgs + + ". Valid options: SameAsBuild, WindowsAMD64."); + } + std::string hardwareCompatibleArgs; getAndDelOption(arguments, "--hardwareCompatibilityLevel", hardwareCompatibleArgs); if (hardwareCompatibleArgs == "none" || hardwareCompatibleArgs.empty()) @@ -1747,6 +1764,10 @@ void AllOptions::parse(Arguments& arguments) { build.buildDLAStandalone = true; } + if (build.runtimePlatform != nvinfer1::RuntimePlatform::kSAME_AS_BUILD) + { + build.skipInference = true; + } if (build.buildDLAStandalone) { build.skipInference = true; @@ -2045,6 +2066,24 @@ std::ostream& operator<<(std::ostream& os, nvinfer1::DeviceType devType) return os; } +std::ostream& operator<<(std::ostream& os, nvinfer1::RuntimePlatform platform) +{ + switch (platform) + { + case nvinfer1::RuntimePlatform::kSAME_AS_BUILD: + { + os << "Same As Build"; + break; + } + case nvinfer1::RuntimePlatform::kWINDOWS_AMD64: + { + os << "Windows AMD64"; + break; + } + } + return os; +} + std::ostream& operator<<(std::ostream& os, const ShapeRange& dims) { int32_t i = 0; @@ -2138,6 +2177,7 @@ std::ostream& operator<<(std::ostream& os, const BuildOptions& options) "BuilderOptimizationLevel: " << options.builderOptimizationLevel << std::endl << "Calibration Profile Index: " << options.calibProfile << std::endl << "Weight Streaming: " << boolToEnabled(options.allowWeightStreaming) << std::endl << + "Runtime Platform: " << options.runtimePlatform << std::endl << "Debug Tensors: " << options.debugTensors << std::endl; // clang-format on @@ -2514,6 +2554,11 @@ void BuildOptions::help(std::ostream& os) R"( Hardware Compatibility Level: mode ::= "none" | "ampere+")" "\n" " none = no compatibility" "\n" " ampere+ = compatible with Ampere and newer GPUs" "\n" + " --runtimePlatform=platform Set the target platform for runtime execution. (default = SameAsBuild)" "\n" + " When this option is set to a value other than SameAsBuild, --skipInference is enabled by default." "\n" + R"( RuntimePlatform: platform ::= "SameAsBuild" | "WindowsAMD64")" "\n" + " SameAsBuild = no requirement for cross-platform compatibility."
"\n" + " WindowsAMD64 = set the target platform for engine execution as Windows AMD64 system" "\n" " --tempdir= Overrides the default temporary directory TensorRT will use when creating temporary files." "\n" " See IRuntime::setTemporaryDirectory API documentation for more information." "\n" " --tempfileControls=controls Controls what TensorRT is allowed to use when creating temporary executable files." "\n" diff --git a/samples/common/sampleOptions.h b/samples/common/sampleOptions.h index 3c00ffa3a..4e59da114 100644 --- a/samples/common/sampleOptions.h +++ b/samples/common/sampleOptions.h @@ -257,6 +257,7 @@ class BuildOptions : public Options // Use int32_t to support C++11 compilers. std::unordered_map previewFeatures; nvinfer1::HardwareCompatibilityLevel hardwareCompatibilityLevel{nvinfer1::HardwareCompatibilityLevel::kNONE}; + nvinfer1::RuntimePlatform runtimePlatform{nvinfer1::RuntimePlatform::kSAME_AS_BUILD}; std::string tempdir{}; nvinfer1::TempfileControlFlags tempfileControls{getTempfileControlDefaults()}; RuntimeMode useRuntime{RuntimeMode::kFULL}; diff --git a/samples/python/detectron2/requirements.txt b/samples/python/detectron2/requirements.txt index 0cf8bfd49..0dd8a25b6 100644 --- a/samples/python/detectron2/requirements.txt +++ b/samples/python/detectron2/requirements.txt @@ -3,9 +3,11 @@ onnxruntime==1.15.1 Pillow>=10.0.0 git+https://github.com/facebookresearch/detectron2.git git+https://github.com/NVIDIA/TensorRT#subdirectory=tools/onnx-graphsurgeon -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/downloader.py b/samples/python/downloader.py index 5e8be2020..70bc6bd5f 100755 --- a/samples/python/downloader.py +++ b/samples/python/downloader.py @@ -90,9 +90,9 @@ def _downloadFile(path, url): from requests.adapters import HTTPAdapter, Retry session = requests.Session() - retries = Retry(total=5, backoff_factor=0.5) + retries = Retry(total=10, backoff_factor=0.5) session.mount("http://", HTTPAdapter(max_retries=retries)) - r = session.get(url, stream=True, timeout=10) + r = session.get(url, stream=True, timeout=30) size = int(r.headers.get("content-length", 0)) from tqdm import tqdm diff --git a/samples/python/efficientdet/requirements.txt b/samples/python/efficientdet/requirements.txt index e69a02788..4eef5ebd8 100644 --- a/samples/python/efficientdet/requirements.txt +++ b/samples/python/efficientdet/requirements.txt @@ -1,10 +1,12 @@ Pillow>=10.0.0 -onnx==1.16.0 +onnx==1.14.0 onnxruntime==1.15.1 tf2onnx==1.8.1 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/efficientnet/requirements.txt b/samples/python/efficientnet/requirements.txt index 74c92ba5b..4dd8fc5f4 100644 --- a/samples/python/efficientnet/requirements.txt +++ b/samples/python/efficientnet/requirements.txt @@ -1,10 +1,12 @@ Pillow>=10.0.0 -onnx==1.16.0 +onnx==1.14.0 tensorrt>=7.1.0.0 tf2onnx==1.8.1 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == 
"Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/engine_refit_onnx_bidaf/download.yml b/samples/python/engine_refit_onnx_bidaf/download.yml index 0b8fd8965..f44bdaef2 100644 --- a/samples/python/engine_refit_onnx_bidaf/download.yml +++ b/samples/python/engine_refit_onnx_bidaf/download.yml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# sample: engine_refit_onnx_bidaf files: - path: samples/python/engine_refit_onnx_bidaf/bidaf-original.onnx diff --git a/samples/python/engine_refit_onnx_bidaf/requirements.txt b/samples/python/engine_refit_onnx_bidaf/requirements.txt index cdb0f837c..b5009f1c3 100644 --- a/samples/python/engine_refit_onnx_bidaf/requirements.txt +++ b/samples/python/engine_refit_onnx_bidaf/requirements.txt @@ -1,8 +1,10 @@ nltk>=3.5 wget==3.2 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/introductory_parser_samples/requirements.txt b/samples/python/introductory_parser_samples/requirements.txt index eaf6990e6..01b57c060 100644 --- a/samples/python/introductory_parser_samples/requirements.txt +++ b/samples/python/introductory_parser_samples/requirements.txt @@ -1,7 +1,9 @@ Pillow>=10.0.0 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/network_api_pytorch_mnist/requirements.txt b/samples/python/network_api_pytorch_mnist/requirements.txt index a9da4c910..f96acc708 100644 --- a/samples/python/network_api_pytorch_mnist/requirements.txt +++ b/samples/python/network_api_pytorch_mnist/requirements.txt @@ -1,17 +1,15 @@ Pillow>=10.0.0 -f https://download.pytorch.org/whl/torch_stable.html -torch==1.11.0; python_version>="3.8" and python_version<"3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torch==1.11.0+cpu; python_version>="3.8" and python_version<"3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") -torch==2.0.0; python_version>="3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torch==2.0.0+cpu; python_version>="3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") +torch==2.0.0; (platform_machine=="aarch64" and sys.platform=="linux") +torch==2.0.0+cpu; ((platform_machine=="x86_64" and 
sys.platform=="linux") or sys.platform=="win32") -f https://download.pytorch.org/whl/torch_stable.html -torchvision==0.12.0; python_version>="3.8" and python_version<"3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torchvision==0.12.0+cpu; python_version>="3.8" and python_version<"3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") -torchvision==0.15.1; python_version>="3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torchvision==0.15.1+cpu; python_version>="3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") -cuda-python==12.2.0 +torchvision==0.15.1; (platform_machine=="aarch64" and sys.platform=="linux") +torchvision==0.15.1+cpu; ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/non_zero_plugin/requirements.txt b/samples/python/non_zero_plugin/requirements.txt index 3f8c71f0c..ae724c401 100644 --- a/samples/python/non_zero_plugin/requirements.txt +++ b/samples/python/non_zero_plugin/requirements.txt @@ -9,5 +9,5 @@ numpy==1.23.5; platform_system != "Windows" onnx-graphsurgeon pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 diff --git a/samples/python/onnx_custom_plugin/requirements.txt b/samples/python/onnx_custom_plugin/requirements.txt index e713a19ba..9fe2b9a57 100644 --- a/samples/python/onnx_custom_plugin/requirements.txt +++ b/samples/python/onnx_custom_plugin/requirements.txt @@ -3,9 +3,11 @@ onnx==1.16.0 --extra-index-url https://pypi.ngc.nvidia.com onnx-graphsurgeon>=0.3.20 wget>=3.2 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/onnx_packnet/download.yml b/samples/python/onnx_packnet/download.yml index 9763aee11..cc0f3725f 100644 --- a/samples/python/onnx_packnet/download.yml +++ b/samples/python/onnx_packnet/download.yml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# sample: onnx_packnet files: - path: samples/python/onnx_packnet/packnet-sfm-0.1.2.zip diff --git a/samples/python/onnx_packnet/requirements.txt b/samples/python/onnx_packnet/requirements.txt index 84b01b481..e88f8646c 100644 --- a/samples/python/onnx_packnet/requirements.txt +++ b/samples/python/onnx_packnet/requirements.txt @@ -2,16 +2,13 @@ onnx==1.16.0 --extra-index-url https://pypi.ngc.nvidia.com onnx-graphsurgeon>=0.3.20 -f https://download.pytorch.org/whl/torch_stable.html -torch==1.11.0; python_version>="3.8" and python_version<"3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torch==1.11.0+cpu; python_version>="3.8" and python_version<"3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") -torch==2.0.0; python_version>="3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torch==2.0.0+cpu; python_version>="3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") +torch==2.0.0; (platform_machine=="aarch64" and sys.platform=="linux") +torch==2.0.0+cpu; ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") -f https://download.pytorch.org/whl/torch_stable.html -torchvision==0.12.0; python_version>="3.8" and python_version<"3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torchvision==0.12.0+cpu; python_version>="3.8" and python_version<"3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") -torchvision==0.15.1; python_version>="3.11" and (platform_machine=="aarch64" and sys.platform=="linux") -torchvision==0.15.1+cpu; python_version>="3.11" and ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") +torchvision==0.15.1; (platform_machine=="aarch64" and sys.platform=="linux") +torchvision==0.15.1+cpu; ((platform_machine=="x86_64" and sys.platform=="linux") or sys.platform=="win32") pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/python_plugin/requirements.txt b/samples/python/python_plugin/requirements.txt index 707917393..7cd81a175 100644 --- a/samples/python/python_plugin/requirements.txt +++ b/samples/python/python_plugin/requirements.txt @@ -11,5 +11,5 @@ numpy==1.23.5; platform_system != "Windows" onnx-graphsurgeon pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 diff --git a/samples/python/requirements.txt b/samples/python/requirements.txt index 09413658a..f8d227a8a 100644 --- a/samples/python/requirements.txt +++ b/samples/python/requirements.txt @@ -1,4 +1,5 @@ pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/sample_weight_stripping/notebooks/weight_stripping.ipynb b/samples/python/sample_weight_stripping/notebooks/weight_stripping.ipynb index da5d15bd5..5c6fcb365 100644 --- a/samples/python/sample_weight_stripping/notebooks/weight_stripping.ipynb +++ b/samples/python/sample_weight_stripping/notebooks/weight_stripping.ipynb @@ -1,3 +1,20 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + { "cells": [ { diff --git a/samples/python/sample_weight_stripping/refit_engine_and_infer.py b/samples/python/sample_weight_stripping/refit_engine_and_infer.py index c9021c13d..19d4df617 100644 --- a/samples/python/sample_weight_stripping/refit_engine_and_infer.py +++ b/samples/python/sample_weight_stripping/refit_engine_and_infer.py @@ -37,7 +37,7 @@ class ModelData(object): MODEL_PATH = "ResNet50.onnx" INPUT_SHAPE = (3, 224, 224) # We can convert TensorRT data types to numpy types with trt.nptype() - DTYPE = trt.float16 + DTYPE = trt.float32 def load_stripped_engine_and_refit(input_file, onnx_model_path): runtime = trt.Runtime(TRT_LOGGER) diff --git a/samples/python/sample_weight_stripping/requirements.txt b/samples/python/sample_weight_stripping/requirements.txt index eaf6990e6..01b57c060 100644 --- a/samples/python/sample_weight_stripping/requirements.txt +++ b/samples/python/sample_weight_stripping/requirements.txt @@ -1,7 +1,9 @@ Pillow>=10.0.0 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/simple_progress_monitor/requirements.txt b/samples/python/simple_progress_monitor/requirements.txt index eaf6990e6..01b57c060 100644 --- a/samples/python/simple_progress_monitor/requirements.txt +++ b/samples/python/simple_progress_monitor/requirements.txt @@ -1,7 +1,9 @@ Pillow>=10.0.0 -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/tensorflow_object_detection_api/requirements.txt b/samples/python/tensorflow_object_detection_api/requirements.txt index e7ee8eaf5..b1efdbfb8 100644 --- a/samples/python/tensorflow_object_detection_api/requirements.txt +++ b/samples/python/tensorflow_object_detection_api/requirements.txt @@ -1,13 +1,15 @@ -onnx==1.16.0 +onnx==1.14.0 onnxruntime==1.15.1 Pillow>=10.0.0 tf2onnx==1.15.0 pycocotools; platform_system != "Windows" pycocotools-windows; platform_system == "Windows" -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" Cython<3.0 pyyaml==5.3.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" diff --git a/samples/python/yolov3_onnx/download.yml b/samples/python/yolov3_onnx/download.yml index bd5b12f99..3c404908a 100644 --- a/samples/python/yolov3_onnx/download.yml +++ b/samples/python/yolov3_onnx/download.yml @@ -1,3 +1,19 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# sample: yolov3_onnx files: - path: samples/python/yolov3_onnx/yolov3.cfg diff --git a/samples/python/yolov3_onnx/requirements.txt b/samples/python/yolov3_onnx/requirements.txt index 31bdfbf0a..9a9e9a278 100644 --- a/samples/python/yolov3_onnx/requirements.txt +++ b/samples/python/yolov3_onnx/requirements.txt @@ -1,9 +1,11 @@ -cuda-python==12.2.0 +cuda-python==12.2.0; python_version <= "3.10" +cuda-python==12.5.0; python_version >= "3.11" pywin32; platform_system == "Windows" -numpy==1.24.4 +numpy==1.24.4; python_version <= "3.10" +numpy==1.26.4; python_version >= "3.11" onnx==1.16.0 Pillow>=10.0.0 protobuf==3.20.3 pyyaml==6.0.1 -requests==2.31.0 +requests==2.32.2 tqdm==4.66.4 diff --git a/samples/sampleNonZeroPlugin/sampleNonZeroPlugin.cpp b/samples/sampleNonZeroPlugin/sampleNonZeroPlugin.cpp index 40313f40c..da890bf36 100644 --- a/samples/sampleNonZeroPlugin/sampleNonZeroPlugin.cpp +++ b/samples/sampleNonZeroPlugin/sampleNonZeroPlugin.cpp @@ -226,6 +226,12 @@ class NonZeroPlugin : public IPluginV3, public IPluginV3OneCore, public IPluginV cudaMemsetAsync(outputs[1], 0, sizeof(int32_t), stream); + if (workspace == nullptr) + { + sample::gLogError << "Unsupported: workspace is null" << std::endl; + return -1; + } + if (!mRowOrder) { // When constructing a column major output, the kernel needs to be aware of the total number of non-zero diff --git a/samples/trtexec/trtexec.cpp b/samples/trtexec/trtexec.cpp index 3f1281730..f021ce69a 100644 --- a/samples/trtexec/trtexec.cpp +++ b/samples/trtexec/trtexec.cpp @@ -342,7 +342,13 @@ int main(int argc, char** argv) // dynamicPlugins may have been updated by getEngineBuildEnv above bEnv->engine.setDynamicPlugins(options.system.dynamicPlugins); #endif - if (!options.build.safe && !options.build.buildDLAStandalone && options.build.refittable) + + // When certain options are enabled, engine deserialization is not supported on the platform where the engine was + // built.
+ bool const supportDeserialization = !options.build.safe && !options.build.buildDLAStandalone + && options.build.runtimePlatform == nvinfer1::RuntimePlatform::kSAME_AS_BUILD; + + if (supportDeserialization && options.build.refittable) { auto* engine = bEnv->engine.get(); if (options.reporting.refit) @@ -369,7 +375,7 @@ int main(int argc, char** argv) if (options.build.skipInference) { - if (!options.build.safe && !options.build.buildDLAStandalone) + if (supportDeserialization) { printLayerInfo(options.reporting, bEnv->engine.get(), nullptr); printOptimizationProfileInfo(options.reporting, bEnv->engine.get()); diff --git a/tools/pytorch-quantization/CONTRIBUTING.md b/tools/pytorch-quantization/CONTRIBUTING.md index 3ac008a85..eb67f19e3 100644 --- a/tools/pytorch-quantization/CONTRIBUTING.md +++ b/tools/pytorch-quantization/CONTRIBUTING.md @@ -109,9 +109,9 @@ QUANT_DESC_8BIT_CONV1D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0) QUANT_DESC_8BIT_CONV2D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) QUANT_DESC_8BIT_CONV3D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) QUANT_DESC_8BIT_LINEAR_WEIGHT_PER_ROW = QuantDescriptor(num_bits=8, axis=(0)) -QUANT_DESC_8BIT_CONVTRANSPOSE1D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) -QUANT_DESC_8BIT_CONVTRANSPOSE2D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) -QUANT_DESC_8BIT_CONVTRANSPOSE3D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) +QUANT_DESC_8BIT_CONVTRANSPOSE1D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(1)) +QUANT_DESC_8BIT_CONVTRANSPOSE2D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(1)) +QUANT_DESC_8BIT_CONVTRANSPOSE3D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(1)) ``` diff --git a/tools/pytorch-quantization/README.md b/tools/pytorch-quantization/README.md index 6ba198fc4..1078c4018 100644 --- a/tools/pytorch-quantization/README.md +++ b/tools/pytorch-quantization/README.md @@ -1,3 +1,5 @@ +**Note: Pytorch Quantization development has transitioned to the [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer). All developers are encouraged to use the TensorRT Model Optimizer to benefit from the latest advancements on quantization and compression. While the Pytorch Quantization code will remain available, it will no longer receive further development.** + # Pytorch Quantization PyTorch-Quantization is a toolkit for training and evaluating PyTorch models with simulated quantization. Quantization can be added to the model automatically, or manually, allowing the model to be tuned for accuracy and performance. Quantization is compatible with NVIDIAs high performance integer kernels which leverage integer Tensor Cores. The quantized model can be exported to ONNX and imported by TensorRT 8.0 and later. 
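For context on the ConvTranspose `QuantDescriptor` change above (per-channel axis moved from 0 to 1): PyTorch stores `ConvTransposeNd` weights as `(in_channels, out_channels, *kernel_size)`, so the output-channel dimension is dim 1 rather than dim 0 as for regular convolutions. A minimal sketch of the equivalent amax reduction, assuming only PyTorch is installed (not part of the patch):

```python
# Per-output-channel amax for a ConvTranspose2d weight, whose layout is
# (in_channels, out_channels, kH, kW): reduce every dim except dim 1,
# mirroring axis=(1) in the descriptors and axis=(0, 2, 3) in the updated tests.
import torch

w = torch.nn.ConvTranspose2d(16, 8, kernel_size=3).weight   # shape (16, 8, 3, 3)
amax = w.detach().abs().amax(dim=(0, 2, 3), keepdim=True)   # one scale per output channel
print(amax.shape)                                            # torch.Size([1, 8, 1, 1])
```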
@@ -38,6 +40,6 @@ python setup.py install ## Resources -* Pytorch Quantization Toolkit [userguide](https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/userguide.html) +* Pytorch Quantization Toolkit [userguide](https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html) * Quantization Basics [whitepaper](https://arxiv.org/abs/2004.09602) diff --git a/tools/pytorch-quantization/VERSION b/tools/pytorch-quantization/VERSION index ccbccc3dc..c043eea77 100644 --- a/tools/pytorch-quantization/VERSION +++ b/tools/pytorch-quantization/VERSION @@ -1 +1 @@ -2.2.0 +2.2.1 diff --git a/tools/pytorch-quantization/docs/source/utils.rst b/tools/pytorch-quantization/docs/source/utils.rst index 8023f2525..3d8e030e5 100644 --- a/tools/pytorch-quantization/docs/source/utils.rst +++ b/tools/pytorch-quantization/docs/source/utils.rst @@ -4,15 +4,6 @@ pytorch_quantization.utils =================================== - -pytorch_quantization.utils.amp\_wrapper ------------------------------------------- - -.. automodule:: pytorch_quantization.utils.amp_wrapper - :members: - :undoc-members: - :show-inheritance: - pytorch_quantization.utils.quant\_logging -------------------------------------------- @@ -27,4 +18,4 @@ pytorch_quantization.utils.reduce\_amax .. automodule:: pytorch_quantization.utils.reduce_amax :members: :undoc-members: - :show-inheritance: \ No newline at end of file + :show-inheritance: diff --git a/tools/pytorch-quantization/examples/torchvision/classification_flow.py b/tools/pytorch-quantization/examples/torchvision/classification_flow.py index 18c10c665..621e99d3e 100644 --- a/tools/pytorch-quantization/examples/torchvision/classification_flow.py +++ b/tools/pytorch-quantization/examples/torchvision/classification_flow.py @@ -356,7 +356,7 @@ def evaluate_onnx(onnx_filename, data_loader, criterion, print_freq): The method returns the average top-1 accuracy on the given dataset. 
""" print("Loading ONNX file: ", onnx_filename) - ort_session = onnxruntime.InferenceSession(onnx_filename) + ort_session = onnxruntime.InferenceSession(onnx_filename, providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) with torch.no_grad(): metric_logger = utils.MetricLogger(delimiter=" ") header = 'Test:' diff --git a/tools/pytorch-quantization/pytorch_quantization/__init__.py b/tools/pytorch-quantization/pytorch_quantization/__init__.py index 223bd239a..869b2ad87 100644 --- a/tools/pytorch-quantization/pytorch_quantization/__init__.py +++ b/tools/pytorch-quantization/pytorch_quantization/__init__.py @@ -19,4 +19,4 @@ from .version import __version__ from .quant_modules import * -logging.use_absl_handler() +logging.use_absl_handler() \ No newline at end of file diff --git a/tools/pytorch-quantization/pytorch_quantization/calib/histogram.py b/tools/pytorch-quantization/pytorch_quantization/calib/histogram.py index f5ba6d1fd..76732d108 100644 --- a/tools/pytorch-quantization/pytorch_quantization/calib/histogram.py +++ b/tools/pytorch-quantization/pytorch_quantization/calib/histogram.py @@ -136,7 +136,7 @@ def compute_amax(self, method: str, *, stride: int = 1, start_bin: int = 128, pe amax: a tensor """ if isinstance(self._calib_hist, torch.Tensor): - calib_hist = self._calib_hist.int().cpu().numpy() + calib_hist = self._calib_hist.to(torch.int64).cpu().numpy() calib_bin_edges = self._calib_bin_edges.cpu().numpy() else: calib_hist = self._calib_hist @@ -383,3 +383,4 @@ def calibrate_weights(model, method="percentile", perchannel=True, percentile=99 calib_amax_shape[axis] = module.weight.shape[axis] calib_amax = torch.stack(calib_amax).reshape(calib_amax_shape) module.weight_quantizer.amax = calib_amax.detach().cpu().numpy() + diff --git a/tools/pytorch-quantization/pytorch_quantization/nn/modules/tensor_quantizer.py b/tools/pytorch-quantization/pytorch_quantization/nn/modules/tensor_quantizer.py index c985319b8..0179bc209 100644 --- a/tools/pytorch-quantization/pytorch_quantization/nn/modules/tensor_quantizer.py +++ b/tools/pytorch-quantization/pytorch_quantization/nn/modules/tensor_quantizer.py @@ -453,4 +453,4 @@ def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs): elif src_has_pre_quant_scale and dst_has_pre_quant_scale: logging.warning("{}: Overwriting pre_quant_scale.".format(prefix[:-1])) - super(TensorQuantizer, self)._load_from_state_dict(state_dict, prefix, *args, **kwargs) + super(TensorQuantizer, self)._load_from_state_dict(state_dict, prefix, *args, **kwargs) \ No newline at end of file diff --git a/tools/pytorch-quantization/pytorch_quantization/quant_modules.py b/tools/pytorch-quantization/pytorch_quantization/quant_modules.py index b4369b011..0029f87a6 100644 --- a/tools/pytorch-quantization/pytorch_quantization/quant_modules.py +++ b/tools/pytorch-quantization/pytorch_quantization/quant_modules.py @@ -179,4 +179,4 @@ def enable_onnx_export(): quant_nn.TensorQuantizer._enable_onnx_export = True yield - quant_nn.TensorQuantizer._enable_onnx_export = False + quant_nn.TensorQuantizer._enable_onnx_export = False \ No newline at end of file diff --git a/tools/pytorch-quantization/pytorch_quantization/tensor_quant.py b/tools/pytorch-quantization/pytorch_quantization/tensor_quant.py index 0996d4cef..a15b37dc2 100644 --- a/tools/pytorch-quantization/pytorch_quantization/tensor_quant.py +++ b/tools/pytorch-quantization/pytorch_quantization/tensor_quant.py @@ -243,9 +243,9 @@ def from_yaml(cls, yaml_str): QUANT_DESC_8BIT_CONV2D_WEIGHT_PER_CHANNEL = 
QuantDescriptor(num_bits=8, axis=(0)) QUANT_DESC_8BIT_CONV3D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) QUANT_DESC_8BIT_LINEAR_WEIGHT_PER_ROW = QuantDescriptor(num_bits=8, axis=(0)) -QUANT_DESC_8BIT_CONVTRANSPOSE1D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) -QUANT_DESC_8BIT_CONVTRANSPOSE2D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) -QUANT_DESC_8BIT_CONVTRANSPOSE3D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(0)) +QUANT_DESC_8BIT_CONVTRANSPOSE1D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(1)) +QUANT_DESC_8BIT_CONVTRANSPOSE2D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(1)) +QUANT_DESC_8BIT_CONVTRANSPOSE3D_WEIGHT_PER_CHANNEL = QuantDescriptor(num_bits=8, axis=(1)) @torch.jit.script diff --git a/tools/pytorch-quantization/tests/quant_conv_transposed_test.py b/tools/pytorch-quantization/tests/quant_conv_transposed_test.py index 8aec73a90..4267bf4c1 100644 --- a/tools/pytorch-quantization/tests/quant_conv_transposed_test.py +++ b/tools/pytorch-quantization/tests/quant_conv_transposed_test.py @@ -100,7 +100,7 @@ def test_weight_fake_quant_per_channel(self): weight_copy = quant_conv_object.weight.clone() - amax = quant_utils.reduce_amax(weight_copy, axis=(1, 2, 3)) + amax = quant_utils.reduce_amax(weight_copy, axis=(0, 2, 3)) quant_weight = tensor_quant.fake_tensor_quant(weight_copy, amax) out1 = F.conv_transpose2d(test_input, quant_weight) @@ -298,7 +298,7 @@ def test_fake_quant_per_channel_bias(self): quant_input = tensor_quant.fake_tensor_quant(test_input, torch.max(torch.abs(test_input))) weight_copy = quant_conv_object.weight.clone() - amax = quant_utils.reduce_amax(weight_copy, axis=(1, 2, 3, 4)) + amax = quant_utils.reduce_amax(weight_copy, axis=(0, 2, 3, 4)) quant_weight = tensor_quant.fake_tensor_quant(weight_copy, amax) out1 = F.conv_transpose3d(quant_input, quant_weight, bias=quant_conv_object.bias) @@ -386,7 +386,7 @@ def test_weight_fake_quant_per_channel(self): weight_copy = quant_conv_object.weight.clone() - amax = quant_utils.reduce_amax(weight_copy, axis=(1, 2)) + amax = quant_utils.reduce_amax(weight_copy, axis=(0, 2)) quant_weight = tensor_quant.fake_tensor_quant(weight_copy, amax) out1 = F.conv_transpose1d(test_input, quant_weight) diff --git a/tools/pytorch-quantization/tests/tensor_quantizer_test.py b/tools/pytorch-quantization/tests/tensor_quantizer_test.py index 84eee277e..a148ab20d 100644 --- a/tools/pytorch-quantization/tests/tensor_quantizer_test.py +++ b/tools/pytorch-quantization/tests/tensor_quantizer_test.py @@ -275,4 +275,4 @@ def test_e4m3(self, E, M, axis): e4m3_x = e4m3_quantizer(x) ref = tensor_quant.scaled_e4m3(x, e4m3_quantizer._get_amax(x), E, M) - test_utils.compare(e4m3_x, ref, atol=0, rtol=0) + test_utils.compare(e4m3_x, ref, atol=0, rtol=0) \ No newline at end of file diff --git a/tools/pytorch-quantization/tests/test_onnx_export.py b/tools/pytorch-quantization/tests/test_onnx_export.py index 1b7983cf1..60af75385 100644 --- a/tools/pytorch-quantization/tests/test_onnx_export.py +++ b/tools/pytorch-quantization/tests/test_onnx_export.py @@ -99,7 +99,7 @@ def test_onnx_export(num_bits, per_channel_quantization, constant_folding, dtype if num_bits == 8 and dtype != torch.bfloat16: if f is not None: f.seek(0) - ort_session = onnxruntime.InferenceSession(f.read() if onnx_file_path is None else onnx_file_path) + ort_session = onnxruntime.InferenceSession(f.read() if onnx_file_path is None else onnx_file_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider']) ort_result = 
ort_session.run([], {"input": dummy_input.cpu().numpy()}) ort_result = torch.tensor(ort_result[0]).cuda() torch_result = model(dummy_input) @@ -108,4 +108,4 @@ def test_onnx_export(num_bits, per_channel_quantization, constant_folding, dtype if __name__ == "__main__": test_onnx_export(8, False, False, torch.float16, "/tmp/test_fp16.onnx") - test_onnx_export(8, False, False, torch.bfloat16, "/tmp/test_bf16.onnx") \ No newline at end of file + test_onnx_export(8, False, False, torch.bfloat16, "/tmp/test_bf16.onnx")
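For reference on the refittable-weights bindings added to `INetworkDefinition` earlier in this patch (`mark_weights_refittable`, `are_weights_marked_refittable`, `unmark_weights_refittable`): a minimal Python sketch, not part of the patch and not a complete build flow. The layer, weight values, and the name `fc1_w` are hypothetical, and the relevant refit flag would still need to be set on the builder config before building (not shown).

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)

x = network.add_input("x", trt.float32, (1, 4))
w_arr = np.ones((4, 4), dtype=np.float32)            # must outlive the network/build
weights = trt.Weights(w_arr)
const = network.add_constant((4, 4), weights)
fc = network.add_matrix_multiply(x, trt.MatrixOperation.NONE,
                                 const.get_output(0), trt.MatrixOperation.NONE)
network.mark_output(fc.get_output(0))

network.set_weights_name(weights, "fc1_w")            # name the weights so they can be referenced
network.mark_weights_refittable("fc1_w")              # opt these weights into refitting
print(network.are_weights_marked_refittable("fc1_w")) # expected: True
network.unmark_weights_refittable("fc1_w")            # and opt them back out
```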