
Releases: CVCUDA/CV-CUDA

CV-CUDA Release v0.13.0

05 Dec 22:41
2a73733

v0.13.0 Beta

CV-CUDA v0.13.0 includes ManyLinux 2014-compliant wheels alongside the following changes:

Full Changelog: v0.12.0-beta...v0.13.0-beta

New Features

  • Added Python wheel generation compliant with ManyLinux 2014 and PyPI standards.
    • Wheels for multiple Python versions are now unified into a single wheel file per CUDA version.
    • Included scripts to build two ManyLinux 2014 Docker images (CUDA 11, CUDA 12) for building, and four Ubuntu images (20.04 and 22.04, each with CUDA 11 and CUDA 12) for testing.
    • Python wheels must be built within the ManyLinux 2014 Docker images to guarantee ManyLinux 2014 compliance.

Bug Fixes

  • Upgraded pybind11 to version 2.13.6.
    • This resolves Python ABI compatibility issues present in previous versions.

Compatibility and Known Limitations

For the full list, see the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

CV-CUDA Release v0.12.0

27 Sep 00:23
07d5e44

v0.12.0-beta

Release Highlights

CV-CUDA v0.12.0 includes the following changes:

New Features

  • Increased functional test coverage of color conversions.
  • Reintroduced the color conversion performance improvements from 24.07 that were reverted in v0.10.1 (e.g., 2x faster RGB2YUV).

Bug Fixes

  • Fixed a bug in YUV(420) conversions: the CvtColor operator incorrectly computed the data location of the second chromaticity channel.
  • Fixed a bug in YUV(422) conversions: the CvtColor operator incorrectly interpreted the interleaved YUV(422) data layout as a three-channel tensor.
  • Prevented CV_16F alpha addition: some color conversions in the CvtColor operator allowed an alpha channel to be added to the destination tensor, which is undefined for the CV_16F data type.
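To illustrate why an interleaved YUV(422) buffer is not a three-channel tensor, here is a minimal sketch in plain Python. It assumes the common YUYV (YUY2) byte order, in which each 4-byte group [Y0, U, Y1, V] covers two horizontally adjacent pixels that share one U and one V sample, so a row holds 2 bytes per pixel rather than 3. The function name unpack_yuyv_row is hypothetical and not part of the CV-CUDA API; other 4:2:2 orders such as UYVY arrange the same samples differently.

```python
def unpack_yuyv_row(row):
    """Split one packed YUYV (YUY2) 4:2:2 row into per-pixel (Y, U, V) tuples.

    Assumes YUYV byte order: every 4 bytes [Y0, U, Y1, V] describe two
    adjacent pixels that share the same U and V samples.
    """
    if len(row) % 4:
        raise ValueError("a YUYV row must contain whole 4-byte groups")
    pixels = []
    for i in range(0, len(row), 4):
        y0, u, y1, v = row[i:i + 4]
        # Both pixels in the pair reuse the shared chroma samples
        pixels.append((y0, u, v))
        pixels.append((y1, u, v))
    return pixels


# Usage: 4 bytes of input expand to 2 pixels, i.e. 2 bytes per pixel
print(unpack_yuyv_row(bytes([10, 128, 20, 130])))
```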

Compatibility and Known Limitations

For the full list, see the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

CV-CUDA Release v0.11.0

05 Sep 00:13
84e3dcd

v0.11.0-beta

Release Highlights

CV-CUDA v0.11.0 includes critical bug fixes alongside the following changes:

New Features

  • Enabled NVCV to be built as a static library
  • Improved Python documentation generation and structure

Bug Fixes

  • Updated pybind11 from 2.10.0 to 2.13.1, which fixes rare race conditions with the Python garbage collector and adds compatibility with NumPy 2

Full Changelog: v0.10.1-beta...v0.11.0-beta

Compatibility and Known Limitations

Pre-existing limitations

  • The CvtColor operator incorrectly computes the data location of the second chromaticity channel for conversions that involve YUV(420) planar formats. This issue persists through the current release, and we intend to address it in CV-CUDA v0.12. We do not recommend using these formats.

    • Known affected formats:
      • NVCV_COLOR_YUV2RGB_I420
      • NVCV_COLOR_RGB2YUV_I420
      • NVCV_COLOR_YUV2BGR_I420
      • NVCV_COLOR_BGR2YUV_I420
      • NVCV_COLOR_YUV2RGBA_I420
      • NVCV_COLOR_RGBA2YUV_I420
      • NVCV_COLOR_YUV2BGRA_I420
      • NVCV_COLOR_BGRA2YUV_I420
      • NVCV_COLOR_YUV2RGB_YV12
      • NVCV_COLOR_RGB2YUV_YV12
      • NVCV_COLOR_YUV2BGR_YV12
      • NVCV_COLOR_BGR2YUV_YV12
      • NVCV_COLOR_YUV2RGBA_YV12
      • NVCV_COLOR_RGBA2YUV_YV12
      • NVCV_COLOR_YUV2BGRA_YV12
      • NVCV_COLOR_BGRA2YUV_YV12
      • NVCV_COLOR_YUV2GRAY_420
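The affected planar formats differ mainly in where the second chroma plane starts. As a minimal sketch, assuming a tightly packed 8-bit buffer with no row padding and even image dimensions, the plane offsets of the two 4:2:0 planar layouts can be computed as follows: I420 stores the U plane before V, while YV12 stores V before U. The helper name is hypothetical and does not describe CV-CUDA's internal code.

```python
def yuv420_plane_offsets(width, height, layout="I420"):
    """Byte offsets of the Y, U and V planes in a tightly packed 8-bit
    YUV 4:2:0 planar buffer (assumes no row padding)."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling assumes even width and height")
    luma_size = width * height                  # full-resolution Y plane
    chroma_size = (width // 2) * (height // 2)  # each chroma plane is quarter size
    first_chroma = luma_size
    second_chroma = luma_size + chroma_size
    if layout == "I420":   # Y, then U, then V
        return {"Y": 0, "U": first_chroma, "V": second_chroma}
    if layout == "YV12":   # Y, then V, then U
        return {"Y": 0, "V": first_chroma, "U": second_chroma}
    raise ValueError(f"unsupported layout: {layout}")


print(yuv420_plane_offsets(1920, 1080))
```

Under these assumptions, the U plane of a 1920x1080 I420 buffer starts at byte 2,073,600 and the V plane at byte 2,592,000; reading the second chroma plane from any other offset would mix chroma samples from the wrong plane or rows.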

For the full list, see the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

Release v0.10.1-beta

09 Aug 00:07
f769fe4

v0.10.1-beta

Release Highlights

CV-CUDA v0.10.1 reverts the OpCvtColor performance improvements introduced in v0.10.0 because of bugs discovered in them.
These optimizations will be reintroduced, with consolidated testing, in a future release.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA (https://developer.nvidia.com/blog/increasing-throughput-and-reducing-costs-for-computer-vision-with-cv-cuda/)
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI (https://blogs.nvidia.com/blog/2023/03/21/cv-cuda-ai-computer-vision/)
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

Release v0.10.0-beta

01 Aug 22:18
669197a

v0.10.0-beta

UPDATE 08/08:

With the CV-CUDA v0.10.0-beta release, we introduced a bug in the color conversion optimizations: the R and B channels are swapped in YUV-to-BGR and BGR-to-YUV conversions (see, for instance, cvt_color.cu).

We recommend using CV-CUDA v0.10.1-beta that reverts these optimizations. These optimizations will be reintroduced, with consolidated testing, in a future release.
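To see why swapping R and B corrupts the result while leaving some inputs unaffected, consider the luma term of an RGB/BGR-to-YUV conversion. This sketch assumes the BT.601 luma weights (0.299, 0.587, 0.114); the exact coefficients and value ranges used by CvtColor are not stated in these notes, and the function names are hypothetical.

```python
BT601_LUMA = (0.299, 0.587, 0.114)  # assumed weights for R, G, B


def luma_from_rgb(r, g, b):
    kr, kg, kb = BT601_LUMA
    return kr * r + kg * g + kb * b


def luma_from_bgr(pixel):
    # Correct: BGR storage order puts blue first
    b, g, r = pixel
    return luma_from_rgb(r, g, b)


def luma_from_bgr_swapped(pixel):
    # Buggy: treats BGR storage as RGB, i.e. R and B are swapped
    r, g, b = pixel
    return luma_from_rgb(r, g, b)


blue = (255, 0, 0)  # pure blue in BGR order
print(luma_from_bgr(blue), luma_from_bgr_swapped(blue))
```

A pure-blue BGR pixel yields a luma of about 29 when decoded correctly but about 76 with R and B swapped, whereas any pixel with R equal to B converts identically either way, so tests dominated by gray or R/B-symmetric inputs can miss this kind of bug.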

Release Highlights

CV-CUDA v0.10.0 includes a critical bug fix (cache growth management) alongside the following changes:

  • New Features:
    • Added mechanism to limit and manage cache memory consumption (includes new "Best Practices" documentation).
    • Performance improvements of color conversion operators (e.g., 2x faster RGB2YUV). Known bug, see issue
    • Refactored codebase to allow independent build of NVCV library (data structures).
  • Bug Fixes:
    • Fixed unbounded cache memory consumption issue.
    • Improved management of Python-created object lifetimes, decoupled from cache management.
    • Fixed potential crash in Resize operator's linear and nearest neighbor interpolation from non-aligned vectorized writes.
    • Fixed Python CvtColor operator to correctly handle NV12 and NV21 outputs.
    • Fixed Resize and RandomResizedCrop linear interpolation weight for border rows and columns.
    • Fixed missing parameter in C API for fused ResizeCropConvertReformat.
    • Fixed several minor documentation and error output issues.
    • Fixed minor compiler warning while building Resize operator.
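The general idea behind bounding cache memory can be sketched with a byte-budgeted LRU cache: every insertion is charged its size, and the least recently used entries are evicted once the budget is exceeded, so total consumption can no longer grow without limit. This is a generic illustration with hypothetical names; it is not CV-CUDA's cache implementation or API, which is covered in the project's "Best Practices" documentation.

```python
from collections import OrderedDict


class ByteBoundedCache:
    """Hypothetical LRU cache that evicts least recently used entries once
    the total size of cached items exceeds a byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._items = OrderedDict()  # key -> (value, size), oldest first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, size):
        if key in self._items:  # replacing an entry releases its old size
            self.total_bytes -= self._items.pop(key)[1]
        self._items[key] = (value, size)
        self.total_bytes += size
        # Evict from the least recently used end until back within budget
        # (an item larger than the whole budget ends up evicted as well)
        while self.total_bytes > self.max_bytes and self._items:
            _, (_, evicted_size) = self._items.popitem(last=False)
            self.total_bytes -= evicted_size
```

Usage: with a 100-byte budget, inserting entries of 60 and 30 bytes, touching the first, then inserting 40 more bytes evicts only the 30-byte entry, leaving exactly 100 bytes cached.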

Compatibility and Known Limitations

  • New limitations:
    • The cache/resource management introduced in v0.10 adds microsecond-level overhead to Python operator calls. Based on a performance analysis of our Python samples, we expect the production- and pipeline-level impact to be negligible. CUDA kernel and C++ call performance is not affected. We aim to investigate and further reduce this overhead in a future release.
    • Sporadic pybind11 deallocation crashes have been reported in long-running multi-threaded Python pipelines with externally allocated memory (e.g., wrapped PyTorch buffers). We are evaluating an upgrade of pybind11 (currently 2.10) as a potential fix in an upcoming release.
    • Erroneous BGR -> YUV and YUV -> BGR color conversions, see issue

For the full list, see the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA (https://developer.nvidia.com/blog/increasing-throughput-and-reducing-costs-for-computer-vision-with-cv-cuda/)
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI (https://blogs.nvidia.com/blog/2023/03/21/cv-cuda-ai-computer-vision/)
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

Release v0.9.0-beta

28 Jun 22:16
4d89620

v0.9.0-beta

Release Highlights

CV-CUDA v0.9.0 includes the following changes:

  • New Features:

    • Improved Resize performance (up to 4x for u8 inputs, up to 3x for RGB8)

    • Improved performance of cubic interpolation, e.g., in Rotate, WarpAffine and WarpPerspective (up to 2x faster)

    • Added optional scaling to ResizeCropConvertReformat fused operator

    • Improved the structure of the Python documentation and sped up its generation (from >5 min to <30 s) by removing the Exhale index

    • Added 64-bit stride support to various operators

      • Limited to 32-bit strides to avoid performance regressions: AdaptiveThreshold, AdvCvtColor, AverageBlur, BilateralFilter, BrightnessContrast, ColorTwist, BoxBlur, CenterCrop, ConvertTo, CopyMakeBorder, CustomCrop, GaussianNoise, Gaussian, Flip, HistogramEq, JointBilateralFilter, Laplacian, Morphology, Normalize, RandomResizedCrop, Reformat, Remap, Resize, Rotate, SIFT, WarpAffine, WarpPerspective
  • Bug Fixes:

    • Added exception handling to the C API in Python: C/C++ exceptions are now forwarded to Python
    • Fixed coordinate rounding bug in Resize operator with nearest neighbor interpolation
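One way to see why 64-bit strides matter: a stride is the byte distance between consecutive elements along one dimension, and the largest byte offset computed from the strides grows with the size of the whole tensor, so it can exceed the signed 32-bit range even when each individual stride fits. This minimal sketch assumes a densely packed row-major (e.g., NHWC) tensor; the helper names are hypothetical and do not reflect how CV-CUDA chooses between 32-bit and 64-bit indexing.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value


def dense_strides(shape, itemsize):
    """Byte strides of a densely packed row-major tensor."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def max_byte_offset(shape, strides):
    """Largest byte offset reached: sum of (extent - 1) * stride per dimension."""
    return sum((extent - 1) * stride for extent, stride in zip(shape, strides))


def needs_64bit_offsets(shape, itemsize):
    return max_byte_offset(shape, dense_strides(shape, itemsize)) > INT32_MAX


print(needs_64bit_offsets((128, 2160, 3840, 3), 1))  # uint8 NHWC batch
```

Under these assumptions, a batch of 64 packed 3840x2160 RGB8 frames (about 1.59 GB) still fits in signed 32-bit offsets, while 128 such frames (about 3.19 GB) do not, even though the largest single stride (one frame, about 24.9 MB) fits comfortably.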

Compatibility and Known Limitations

  • Building the documentation on Ubuntu 20.04 requires an up-to-date version of Sphinx (pip install --upgrade sphinx) and explicitly passing the system's default Python version: ./ci/build_docs path/to/build -DPYTHON_VERSIONS="<py_ver>".
  • Python bindings installed via Debian packages, as well as the Python tests, fail with NumPy 2.0. We recommend using an older version of NumPy (e.g., 1.26) until a fix is implemented.
  • The Resize and RandomResizedCrop operators incorrectly interpolate pixel values near the boundary of an image or tensor when using linear and cubic interpolation. This will be fixed in an upcoming release.
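For context on the boundary issue, here is a minimal 1D sketch of linear interpolation that maps each destination pixel center back into source coordinates (the half-pixel-center convention) and clamps neighbor indices at the borders, so edge outputs draw their weights only from valid pixels. This is one common convention, assumed here for illustration; it is not necessarily the exact mapping used by the CV-CUDA Resize or RandomResizedCrop kernels, and linear_resize_1d is a hypothetical name.

```python
import math


def linear_resize_1d(src, dst_len):
    """Resize a 1D signal with linear interpolation, using half-pixel
    centers and clamping neighbor indices at the borders."""
    scale = len(src) / dst_len
    out = []
    for x in range(dst_len):
        # Center of destination pixel x, expressed in source coordinates
        fx = (x + 0.5) * scale - 0.5
        x0 = math.floor(fx)
        w = fx - x0  # weight of the right-hand neighbor
        # Clamp both neighbors so border pixels never read out of range
        i0 = min(max(x0, 0), len(src) - 1)
        i1 = min(max(x0 + 1, 0), len(src) - 1)
        out.append((1 - w) * src[i0] + w * src[i1])
    return out


print(linear_resize_1d([0, 10], 4))
```

When upsampling [0, 10] to four samples, the outermost outputs fall outside the source pixel centers and, with clamping, reproduce the edge values exactly (0 and 10), while the inner outputs interpolate between the two source pixels.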

See the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

CV-CUDA Release v0.8.0

31 May 23:18
b3bd172

Release Highlights

CV-CUDA v0.8.0 includes the following changes:

  • New Operator:

    • Introduced fused 'ResizeCropConvertReformat' operator
  • New Features:

    • Improved initialization of Image and ImageBatchVarShape: enabled efficient cache reuse
    • Improved benchmarking utilities: added throughput computation and power/clock monitoring
    • Added tests to Resize, BilateralFilter, CvtColor, Erase, JointBilateralFilter, PillowResize
  • Bug Fixes:

    • Fixed potential crash when using custom streams
    • Switched PairwiseMatcher to use L2-norm as default
    • Fixed documentation of CropFlipNormalizeReformat
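The L2-norm default mentioned above can be illustrated with a brute-force matcher: every descriptor in the first set is compared against every descriptor in the second, and the closest one by Euclidean (L2) distance is kept. This is a minimal CPU sketch with hypothetical names and plain tuples as descriptors; it does not reproduce the PairwiseMatcher API or any of its additional options.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_match(set1, set2):
    """For each descriptor in set1, return (index_in_set1, best_index_in_set2,
    distance) by exhaustively comparing it against every descriptor in set2."""
    matches = []
    for i, d1 in enumerate(set1):
        best_j, best_dist = None, math.inf
        for j, d2 in enumerate(set2):
            dist = l2_distance(d1, d2)
            if dist < best_dist:
                best_j, best_dist = j, dist
        matches.append((i, best_j, best_dist))
    return matches


print(brute_force_match([(0, 0), (10, 10)], [(9, 9), (1, 0)]))
```

The exhaustive search costs one distance computation per descriptor pair, which is why this method is typically run on the GPU for large descriptor sets.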

Compatibility and Known Limitations

See the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

Full Changelog: v0.7.0-beta...v0.8.0-beta

CV-CUDA Release v0.7.0

26 Apr 22:53
11d40a4

CV-CUDA v0.7.0 Release Notes

CV-CUDA 0.7.0 introduces performance and support enhancements, along with bug fixes and new features.

Full Changelog: v0.6.0-beta...v0.7.0-beta

Release Highlights

CV-CUDA v0.7.0 includes the following improvements:

New Features:

  • Optimized Python bindings: near-zero overhead compared to C++ calls
  • Added masking option to Label operator: conditional island removal
  • Added IGX Orin support (with dGPU, Ampere or Ada RTX6000)
  • Added support for a signed 32-bit output data type in the Label operator

Removed Operator:

  • Removed the Find Contours operator pending troubleshooting of major limitations

Bug Fixes:

  • Fixed a constraint on the installation directory for Python tests: tar test packages can now be used from any directory

Compatibility and Known Limitations

See the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

CV-CUDA Release v0.6.0

15 Mar 18:39
0c6dde3

CV-CUDA 0.6.0 Release Notes

CV-CUDA 0.6.0 is a comprehensive update introducing new packaging and documentation enhancements, along with bug fixes and new features.

Release Highlights

CV-CUDA v0.6.0 includes significant improvements:

New Operator:

  • HQResize: Advanced resize operator supporting 2D and 3D data, tensors, tensor batches, and varshape image batches (2D only). Supports nearest neighbor, linear, cubic, Gaussian and Lanczos interpolation, with optional antialiasing when down-sampling.

New Features:

  • Standalone Python Wheels, including tooling and documentation to generate them. Prebuilt binaries for selected configurations.

  • Homogenized package naming

  • Improved documentation of hardware/software compatibility, build and test tutorials

  • Added Python Operator benchmarking application

  • Samples updated to new codec libraries, PyNvVideoCodec and NvImageCodec

  • Support for rank-2 tensors in MedianBlur

  • Additional tests for various operators

Bug Fixes:

  • Fix name clashes with NVTX

  • Fix workspace memory allocation of complex filters

  • Fix memory fault in MinAreaRect

Compatibility and Known Limitations

See the main README on CV-CUDA GitHub.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.

CV-CUDA Release v0.5.0

15 Dec 19:29
6b35015

CV-CUDA 0.5.0 Release Notes

CV-CUDA 0.5.0 is a major release of the library providing multiple new operators, features, and fixes to multiple customer-reported issues.

Release Highlights

CV-CUDA v0.5.0 includes the following key changes:

  • New Operators:

    • FindHomography: Calculates a perspective transform from four pairs of corresponding points
    • Label: Labels connected regions in an image using 4-way connectivity for foreground and 8-way for background pixels
    • PairwiseMatcher: Matches features computed separately (e.g. via the SIFT operator) in two images using the brute force method
    • Stack: Concatenates two input tensors into a single output tensor
  • New Features:

    • Added TensorBatch in C++ and Python, a container type that can hold a list of non-uniformly shaped tensors
    • Added Workspace in C++ and Python, an abstraction of memory and asynchronous resources for CV-CUDA operators
    • Added better color format support in nvcv_types
    • New sample application for the Label operator
    • JetPack 5.1.2 support for L4T (Jetson Orin, L4T 35.4.1, CUDA 11.4)
    • Enhanced documentation
  • Bug Fixes:

    • Resolved memory leak in NvBlurBoxes
    • Fixed segmentation fault issue in Python with certain imports
    • Corrected typestr format issue in __cuda_array_interface__
    • Addressed occasional hanging in OpBoxBlur on RGBA images
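To illustrate the 4-way connectivity that the Label operator uses for foreground pixels, here is a minimal breadth-first labeling sketch on a 2D grid of 0/1 values. It is a hypothetical CPU illustration, not the operator's algorithm: it labels only foreground regions, numbers them consecutively from 1, and does not implement the 8-way connectivity used for background pixels.

```python
from collections import deque


def label_4conn(image):
    """Label connected foreground (non-zero) regions of a 2D grid using
    4-way connectivity. Background stays 0; regions get 1, 2, ..."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] == 0 or labels[sy][sx]:
                continue  # background, or already part of a region
            next_label += 1
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                # Only up/down/left/right neighbors: diagonals do not connect
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


print(label_4conn([[1, 0], [0, 1]]))
```

With 4-way connectivity, two diagonally touching foreground pixels form two separate regions; under 8-way connectivity they would form one.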

Compatibility

  • GPU Compute Capability: 7.x or higher
  • Ubuntu x86_64: 20.04, 22.04
  • CUDA Toolkit: 11.7+ (11.2+ for library build and run)
  • L4T: 35.4.1, JetPack 5.1.2 aarch64
  • GCC: 11.0+ (9.x and 10.x for APIs with pre-built binary)
  • Python: 3.8, 3.10

Known Issues/Limitations

  • For GCC versions lower than 11.0, C++17 support needs to be enabled when compiling CV-CUDA.

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.