Merge pull request #4062 from NVIDIA/dev-brb-update-for-10.3-GA
Release 10.3-GA
brb-nv authored Aug 8, 2024
2 parents 4575799 + 84dd6ed commit c5b9de3
Showing 80 changed files with 2,701 additions and 533 deletions.
2 changes: 1 addition & 1 deletion .clang-format
@@ -74,7 +74,7 @@ SpacesInContainerLiterals: true
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp11
StatementMacros: [API_ENTRY_TRY]
StatementMacros: [API_ENTRY_TRY,TRT_TRY]
TabWidth: 4
UseTab: Never
...
18 changes: 17 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,22 @@
# TensorRT OSS Release Changelog

## 10.2.0 GA - 2024-07-10
## 10.3.0 GA - 2024-08-07

Key Features and Updates:

- Demo changes
  - Added [Stable Video Diffusion](demo/Diffusion)(`SVD`) pipeline.
- Plugin changes
  - Deprecated Version 1 of [ScatterElements plugin](plugin/scatterElementsPlugin). It is superseded by Version 2, which implements the `IPluginV3` interface.
- Quickstart guide
  - Updated the [SemanticSegmentation](quickstart/SemanticSegmentation) guide with latest APIs.
- Parser changes
  - Added support for tensor `axes` inputs for `Slice` node.
  - Updated `ScatterElements` importer to use Version 2 of [ScatterElements plugin](plugin/scatterElementsPlugin), which implements the `IPluginV3` interface.
- Updated tooling
  - Polygraphy v0.49.13

## 10.2.0 GA - 2024-07-09

Key Features and Updates:

21 changes: 20 additions & 1 deletion LICENSE
@@ -337,10 +337,11 @@
limitations under the License.

> demo/Diffusion/utilities.py
> demo/Diffusion/stable_video_diffusion_pipeline.py

HuggingFace diffusers library.

Copyright 2022 The HuggingFace Team.
Copyright 2024 The HuggingFace Team.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -380,3 +381,21 @@
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

> demo/Diffusion/utilities.py

ModelScope library.

Copyright (c) Alibaba, Inc. and its affiliates.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
18 changes: 9 additions & 9 deletions README.md
@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
To build the TensorRT-OSS components, you will first need the following software packages.

**TensorRT GA build**
* TensorRT v10.2.0.19
* TensorRT v10.3.0.26
* Available from direct download links listed below

**System Packages**
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
- [TensorRT 10.2.0.19 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz)
- [TensorRT 10.2.0.19 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz)
- [TensorRT 10.2.0.19 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-11.8.zip)
- [TensorRT 10.2.0.19 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip)
- [TensorRT 10.3.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
- [TensorRT 10.3.0.26 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz)
- [TensorRT 10.3.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-11.8.zip)
- [TensorRT 10.3.0.26 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip)


**Example: Ubuntu 20.04 on x86-64 with cuda-12.5**

```bash
cd ~/Downloads
tar -xvzf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz
export TRT_LIBPATH=`pwd`/TensorRT-10.2.0.19
tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
```

**Example: Windows on x86-64 with cuda-12.5**

```powershell
Expand-Archive -Path TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip
$env:TRT_LIBPATH="$pwd\TensorRT-10.2.0.19\lib"
Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
$env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
```

## Setting Up The Build Environment
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
10.2.0.19
10.3.0.26
2 changes: 1 addition & 1 deletion demo/BERT/README.md
@@ -75,7 +75,7 @@ The following software version configuration has been tested:
|Software|Version|
|--------|-------|
|Python|>=3.8|
|TensorRT|10.2.0.19|
|TensorRT|10.3.0.26|
|CUDA|12.5|

## Setup
26 changes: 24 additions & 2 deletions demo/Diffusion/README.md
100644 → 100755
@@ -48,14 +48,14 @@ onnx 1.15.0
onnx-graphsurgeon 0.5.2
onnxruntime 1.16.3
polygraphy 0.49.9
tensorrt 10.2.0.19
tensorrt 10.3.0.26
tokenizers 0.13.3
torch 2.2.0
transformers 4.33.1
controlnet-aux 0.0.6
nvidia-modelopt 0.11.2
```
> NOTE: optionally install HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intense model loading.
> NOTE: optionally install HuggingFace [accelerate](https://pypi.org/project/accelerate/) package for faster and less memory-intense model loading. Note that installing accelerate is known to cause failures while running certain pipelines in Torch Compile mode ([known issue](https://github.com/huggingface/diffusers/issues/9091))
# Running demoDiffusion

@@ -178,6 +178,28 @@ python3 demo_txt2img_sd3.py "dog wearing a sweater and a blue collar" --version

Note that a denoising percentage is applied to the number of denoising steps when an input image conditioning is provided. Its default value is 0.6. This parameter can be updated using `--denoising-percentage`.
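
As a rough illustration of what "applied to the number of denoising steps" can mean, here is a minimal sketch. The helper name `effective_denoising_steps`, the multiplication, and the rounding rule are all assumptions made for illustration; the demo's actual computation is not shown in this change and may differ.

```python
def effective_denoising_steps(denoising_steps: int, denoising_percentage: float = 0.6) -> int:
    """Hypothetical helper: scale the step count by the denoising percentage.

    Assumes the percentage is a fraction in (0, 1] and that at least one
    step always runs. Neither rule is confirmed by the demo source.
    """
    if not 0.0 < denoising_percentage <= 1.0:
        raise ValueError("denoising_percentage must be in (0, 1]")
    # round() is an illustrative choice; the real pipeline may truncate instead.
    return max(1, round(denoising_steps * denoising_percentage))


# With the default percentage of 0.6, 50 requested steps become 30.
print(effective_denoising_steps(50))
```

Under this reading, a lower percentage keeps more of the input image because fewer denoising steps are run on it.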

### Image-to-video using SVD (Stable Video Diffusion)

Download the pre-exported ONNX model

```bash
git lfs install
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1-tensorrt onnx-svd-xt-1-1
cd onnx-svd-xt-1-1 && git lfs pull && cd ..
```

SVD-XT-1.1 (25 frames at resolution 576x1024)
```bash
python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN
```

You may also specify a custom conditioning image using `--input-image`:
```bash
python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --input-image https://www.hdcarwallpapers.com/walls/2018_chevrolet_camaro_zl1_nascar_race_car_2-HD.jpg --hf-token=$HF_TOKEN
```

NOTE: The minimum and maximum guidance scales are configured using `--min-guidance-scale` and `--max-guidance-scale`, respectively.
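
The help text for these flags says the minimum scale is used for classifier-free guidance on the first frame and the maximum on the last frame. The sketch below shows one way such a per-frame schedule could be built, assuming a linear ramp between the two values. The function names `per_frame_guidance` and `apply_guidance` are hypothetical, and the demo's actual schedule may differ.

```python
from typing import List


def per_frame_guidance(min_scale: float, max_scale: float, num_frames: int) -> List[float]:
    """Linearly interpolate the guidance scale from the first frame to the last."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [min_scale]
    step = (max_scale - min_scale) / (num_frames - 1)
    return [min_scale + i * step for i in range(num_frames)]


def apply_guidance(uncond: float, cond: float, scale: float) -> float:
    """Standard classifier-free guidance combination, shown on scalars for brevity."""
    return uncond + scale * (cond - uncond)


# Defaults from the demo: 1.0 to 3.0 across the 25 frames of SVD-XT-1.1.
scales = per_frame_guidance(1.0, 3.0, 25)
print(len(scales), scales[0], round(scales[-1], 6))
```

With a ramp like this, early frames stay close to the conditioning image while later frames receive stronger guidance.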

## Configuration options
- Noise scheduler can be set using `--scheduler <scheduler>`. Note: not all schedulers are available for every version.
- To speed up engine building, use `--timing-cache <path to cache file>`. The cache file is created if it does not already exist. Performance may degrade if a cache file is shared across different GPU targets, so use timing caches only during development. For the best performance in deployment, build engines without a timing cache.
117 changes: 117 additions & 0 deletions demo/Diffusion/demo_img2vid.py
@@ -0,0 +1,117 @@
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import argparse

from PIL import Image

from stable_video_diffusion_pipeline import StableVideoDiffusionPipeline
from utilities import (
    PIPELINE_TYPE,
    add_arguments,
    download_image,
)

def parseArgs():
    parser = argparse.ArgumentParser(description="Options for Stable Diffusion Img2Vid Demo", conflict_handler='resolve')
    parser = add_arguments(parser)
    parser.add_argument('--version', type=str, default="svd-xt-1.1", choices=["svd-xt-1.1"], help="Version of Stable Video Diffusion")
    parser.add_argument('--input-image', type=str, default="", help="Path to the input image")
    parser.add_argument('--height', type=int, default=576, help="Height of image to generate (must be multiple of 8)")
    parser.add_argument('--width', type=int, default=1024, help="Width of image to generate (must be multiple of 8)")
    parser.add_argument('--min-guidance-scale', type=float, default=1.0, help="The minimum guidance scale. Used for the classifier free guidance with first frame")
    parser.add_argument('--max-guidance-scale', type=float, default=3.0, help="The maximum guidance scale. Used for the classifier free guidance with last frame")
    parser.add_argument('--denoising-steps', type=int, default=25, help="Number of denoising steps")
    parser.add_argument('--num-warmup-runs', type=int, default=1, help="Number of warmup runs before benchmarking performance")
    return parser.parse_args()

def process_pipeline_args(args):

    if not args.input_image:
        args.input_image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true"
    if isinstance(args.input_image, str):
        input_image = download_image(args.input_image).resize((args.width, args.height))
    elif isinstance(args.input_image, Image.Image):
        # Already a PIL image, so use it directly instead of calling Image.open() on it
        input_image = args.input_image
    else:
        raise ValueError(f"Input image(s) must be of type `PIL.Image.Image` or `str` (URL) but is {type(args.input_image)}")

    if args.height % 8 != 0 or args.width % 8 != 0:
        # The parsed attribute is `height`; `args.image_height` does not exist
        raise ValueError(f"Image height and width have to be divisible by 8 but are: {args.height} and {args.width}.")

    # TODO enable BS>1
    max_batch_size = 1
    args.build_static_batch = True

    if args.batch_size > max_batch_size:
        raise ValueError(f"Batch size {args.batch_size} is larger than allowed {max_batch_size}.")

    if not args.build_static_batch or args.build_dynamic_shape:
        raise ValueError(f"Dynamic shapes not supported. Do not specify `--build-dynamic-shape`")

    kwargs_init_pipeline = {
        'version': args.version,
        'max_batch_size': max_batch_size,
        'denoising_steps': args.denoising_steps,
        'scheduler': args.scheduler,
        'min_guidance_scale': args.min_guidance_scale,
        'max_guidance_scale': args.max_guidance_scale,
        'output_dir': args.output_dir,
        'hf_token': args.hf_token,
        'verbose': args.verbose,
        'nvtx_profile': args.nvtx_profile,
        'use_cuda_graph': args.use_cuda_graph,
        'framework_model_dir': args.framework_model_dir,
        'torch_inference': args.torch_inference,
    }

    kwargs_load_engine = {
        'onnx_opset': args.onnx_opset,
        'opt_batch_size': args.batch_size,
        'opt_image_height': args.height,
        'opt_image_width': args.width,
        'static_batch': args.build_static_batch,
        'static_shape': not args.build_dynamic_shape,
        'enable_all_tactics': args.build_all_tactics,
        'enable_refit': args.build_enable_refit,
        'timing_cache': args.timing_cache,
    }

    args_run_demo = (input_image, args.height, args.width, args.batch_size, args.batch_count, args.num_warmup_runs, args.use_cuda_graph)

    return kwargs_init_pipeline, kwargs_load_engine, args_run_demo

if __name__ == "__main__":
    print("[I] Initializing StableDiffusion img2vid demo using TensorRT")
    args = parseArgs()
    kwargs_init_pipeline, kwargs_load_engine, args_run_demo = process_pipeline_args(args)

    # Initialize demo
    demo = StableVideoDiffusionPipeline(
        pipeline_type=PIPELINE_TYPE.IMG2VID,
        **kwargs_init_pipeline)
    demo.loadEngines(
        args.engine_dir,
        args.framework_model_dir,
        args.onnx_dir,
        **kwargs_load_engine)
    demo.loadResources(args.height, args.width, args.batch_size, args.seed)

    # Run inference
    demo.run(*args_run_demo)

    demo.teardown()
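
The parser in `parseArgs()` is created with `conflict_handler='resolve'` and then re-registers options such as `--version`, `--height`, and `--width` after calling `add_arguments`. The sketch below illustrates what that handler does in the standard `argparse` module. `add_shared_arguments` is a hypothetical stand-in for `utilities.add_arguments`, and the shared defaults it registers are assumptions made for illustration, not the demo's real values.

```python
import argparse


def add_shared_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in for utilities.add_arguments (defaults are made up)."""
    parser.add_argument("--version", type=str, default="1.5", help="Shared default version")
    parser.add_argument("--height", type=int, default=512, help="Shared default height")
    parser.add_argument("--batch-size", type=int, default=1, help="Batch size")
    return parser


def build_parser() -> argparse.ArgumentParser:
    # Without conflict_handler="resolve", re-adding "--version" raises ArgumentError.
    parser = argparse.ArgumentParser(description="Sketch", conflict_handler="resolve")
    parser = add_shared_arguments(parser)
    # Re-registering an option string replaces the earlier definition.
    parser.add_argument("--version", type=str, default="svd-xt-1.1", choices=["svd-xt-1.1"])
    parser.add_argument("--height", type=int, default=576)
    return parser


args = build_parser().parse_args([])
print(args.version, args.height, args.batch_size)  # prints "svd-xt-1.1 576 1"
```

Options that are not re-registered, such as `--batch-size` here, keep their shared definition, which is presumably why the demo script only redeclares the options whose defaults or choices differ for the video pipeline.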