Releases: CVCUDA/CV-CUDA
CV-CUDA Release v0.13.0
v0.13.0 Beta
CV-CUDA v0.13.0 includes ManyLinux 2014 compliant wheels alongside the following changes:
Full Changelog: v0.12.0-beta...v0.13.0-beta
New Features
- Added Python wheel generation compliant with ManyLinux 2014 and PyPI standards.
- The multiple Python version wheels are now unified into a single wheel file per CUDA version.
- Included scripts to build two ManyLinux 2014 Docker images (CUDA 11, CUDA 12) for build, and four Ubuntu images (20.04 and 22.04 x CUDA 11, CUDA 12) for testing.
- Python wheels must be built within the ManyLinux 2014 Docker images to guarantee ManyLinux 2014 compliance.
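One way to check that a built wheel carries a ManyLinux 2014 platform tag is to parse its filename. The sketch below is illustrative only: the wheel filename and helper names are hypothetical, and it relies solely on the standard wheel naming convention (name, version, Python tag, ABI tag, platform tag), in which manylinux2014 is also spelled manylinux_2_17.

```python
# Sketch: inspect the platform tag of a wheel filename (the filename is hypothetical).
MANYLINUX2014_PREFIXES = ("manylinux2014_", "manylinux_2_17_")


def platform_tags(wheel_filename):
    """Return the platform tags encoded in a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename}")
    stem = wheel_filename[: -len(".whl")]
    # The last dash-separated field is the platform tag; multiple tags are joined with dots.
    return stem.split("-")[-1].split(".")


def is_manylinux2014(wheel_filename):
    """True if any platform tag declares ManyLinux 2014 (glibc 2.17) compatibility."""
    return any(tag.startswith(MANYLINUX2014_PREFIXES) for tag in platform_tags(wheel_filename))


example = "example_pkg-1.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(platform_tags(example))
print(is_manylinux2014(example))
```

A filename check only confirms the declared tag; building inside the ManyLinux 2014 images, as stated above, is what makes the tag truthful.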
Bug Fixes
- Upgraded pybind11 to version 2.13.6 for improved compatibility and functionality.
- Resolved Python ABI compatibility issues present in previous versions by upgrading pybind11.
Compatibility and Known Limitations
For the full list, see the main README on CV-CUDA GitHub.
License
CV-CUDA is licensed under the Apache 2.0 license.
Resources
- CV-CUDA GitHub
- CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
- NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI
- CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI
Acknowledgements
CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.
CV-CUDA Release v0.12.0
v0.12.0-beta
Release Highlights
CV-CUDA v0.12.0 includes the following changes:
New Features
- Increased functional test coverage of color conversions.
- Reintroduced from 24.07: Improved performance of color conversion operators (e.g., 2x faster RGB2YUV).
Bug Fixes
- Fixed bug in YUV(420) conversions: The CvtColor operator incorrectly computed the data location of the second chromaticity channel for conversions.
- Fixed bug in YUV(422) conversions: The CvtColor operator incorrectly interpreted the interleaved YUV(422) data layout as a three-channel tensor.
- Prevent CV_16F alpha addition: some color conversions in the CvtColor operator allowed for the addition of an alpha channel to the destination tensor, which is undefined for the CV_16F data type.
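For context, the YUV(420) fix above concerns where the second chroma plane starts in memory. The sketch below computes plane offsets for a tightly packed 8-bit planar 4:2:0 buffer. It assumes even width and height and no row padding, and the function name is hypothetical rather than part of CV-CUDA.

```python
def yuv420_plane_offsets(width, height, layout="I420"):
    """Byte offsets of the Y, U and V planes in a packed 8-bit planar 4:2:0 buffer."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for 4:2:0 subsampling")
    luma_size = width * height
    chroma_size = (width // 2) * (height // 2)  # each chroma plane is quarter size
    first = luma_size                 # start of the first chroma plane
    second = luma_size + chroma_size  # start of the second chroma plane
    total = second + chroma_size
    if layout == "I420":  # plane order: Y, U, V
        return {"Y": 0, "U": first, "V": second, "total": total}
    if layout == "YV12":  # plane order: Y, V, U
        return {"Y": 0, "V": first, "U": second, "total": total}
    raise ValueError(f"unsupported layout: {layout}")


print(yuv420_plane_offsets(640, 480))
print(yuv420_plane_offsets(640, 480, "YV12"))
```

The two layouts differ only in which chroma plane comes first, so an error in the second offset affects U in one layout and V in the other.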
Compatibility and Known Limitations
For the full list, see the main README on CV-CUDA GitHub.
CV-CUDA Release v0.11.0
v0.11.0-beta
Release Highlights
CV-CUDA v0.11.0 includes critical bug fixes alongside the following changes:
New Features
- Enable NVCV to be built as a static library
- Improve Python doc generation and structure
Bug Fixes
- Updated pybind11 from 2.10.0 to 2.13.1. This fixes rare race conditions with the Python garbage collector and adds compatibility with NumPy 2.
Full Changelog: v0.10.1-beta...v0.11.0-beta
Compatibility and Known Limitations
Pre-existing limitations
- The CvtColor operator incorrectly computes the data location of the second chromaticity channel for conversions that involve YUV(420) semi-planar formats. This issue persists through the current release and we intend to address this bug in CV-CUDA v0.12. We do not recommend using these formats.
- Known affected formats:
- NVCV_COLOR_YUV2RGB_I420
- NVCV_COLOR_RGB2YUV_I420
- NVCV_COLOR_YUV2BGR_I420
- NVCV_COLOR_BGR2YUV_I420
- NVCV_COLOR_YUV2RGBA_I420
- NVCV_COLOR_RGBA2YUV_I420
- NVCV_COLOR_YUV2BGRA_I420
- NVCV_COLOR_BGRA2YUV_I420
- NVCV_COLOR_YUV2RGB_YV12
- NVCV_COLOR_RGB2YUV_YV12
- NVCV_COLOR_YUV2BGR_YV12
- NVCV_COLOR_BGR2YUV_YV12
- NVCV_COLOR_YUV2RGBA_YV12
- NVCV_COLOR_RGBA2YUV_YV12
- NVCV_COLOR_YUV2BGRA_YV12
- NVCV_COLOR_BGRA2YUV_YV12
- NVCV_COLOR_YUV2GRAY_420
For the full list, see the main README on CV-CUDA GitHub.
Release v0.10.1-beta
v0.10.1-beta
Release Highlights
CV-CUDA v0.10.1 reverts the OpCvtColor performance improvements introduced in v0.10.0 due to discovered bugs.
These optimizations will be reintroduced, with consolidated testing, in a future release.
License
CV-CUDA is licensed under the Apache 2.0 license.
Resources
- CV-CUDA GitHub
- CV-CUDA Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA (https://developer.nvidia.com/blog/increasing-throughput-and-reducing-costs-for-computer-vision-with-cv-cuda/)
- NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI (https://blogs.nvidia.com/blog/2023/03/21/cv-cuda-ai-computer-vision/)
- CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI
Acknowledgements
CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.
Release v0.10.0-beta
v0.10.0-beta
UPDATE 08/08:
The CV-CUDA v0.10.0-beta release introduced a bug in the color conversion optimizations: the R and B channels are swapped in 'YUV to BGR' and 'BGR to YUV' conversions (see, for instance, cvt_color.cu).
We recommend using CV-CUDA v0.10.1-beta, which reverts these optimizations. They will be reintroduced, with consolidated testing, in a future release.
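To illustrate why swapped R and B channels corrupt the result, the sketch below converts one pixel to YUV and repeats the conversion with R and B exchanged. The analog BT.601 coefficients and the function name are assumptions for illustration; the constants used in CV-CUDA's kernels may differ.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV with BT.601 luma weights (illustrative only)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


red = (255.0, 0.0, 0.0)
correct = rgb_to_yuv(*red)
swapped = rgb_to_yuv(*reversed(red))  # R and B exchanged, mimicking the reported bug
print("correct:", correct)
print("swapped:", swapped)
```

Because R and B carry very different luma weights, the swap changes both brightness and chroma, which is why a round-trip test on a saturated red or blue pixel exposes it quickly.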
Release Highlights
CV-CUDA v0.10.0 includes a critical bug fix (cache growth management) alongside the following changes:
- New Features:
- Added mechanism to limit and manage cache memory consumption (includes new "Best Practices" documentation).
- Performance improvements of color conversion operators (e.g., 2x faster RGB2YUV). Known bug, see issue
- Refactored codebase to allow independent build of NVCV library (data structures).
- Bug Fixes:
- Fixed unbounded cache memory consumption issue.
- Improved management of Python-created object lifetimes, decoupled from cache management.
- Fixed potential crash in Resize operator's linear and nearest neighbor interpolation from non-aligned vectorized writes.
- Fixed Python CvtColor operator to correctly handle NV12 and NV21 outputs.
- Fixed Resize and RandomResizedCrop linear interpolation weight for border rows and columns.
- Fixed missing parameter in C API for fused ResizeCropConvertReformat.
- Fixed several minor documentation and error output issues.
- Fixed minor compiler warning while building Resize operator.
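As a generic illustration of bounding cache memory, the sketch below evicts least-recently-used entries once a byte budget is exceeded. It is not CV-CUDA's implementation: the class, its eviction policy, and all names are hypothetical, and the library's actual behavior is described in its "Best Practices" documentation.

```python
from collections import OrderedDict


class ByteLimitedCache:
    """LRU cache that keeps the total size of stored buffers under a byte limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.current_bytes = 0
        self._items = OrderedDict()

    def put(self, key, buffer):
        # Replace any previous entry for this key.
        if key in self._items:
            self.current_bytes -= len(self._items.pop(key))
        if len(buffer) > self.limit_bytes:
            return  # too large to cache at all
        self._items[key] = buffer
        self.current_bytes += len(buffer)
        # Evict the least recently used entries until the budget is respected.
        while self.current_bytes > self.limit_bytes:
            _, evicted = self._items.popitem(last=False)
            self.current_bytes -= len(evicted)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]


cache = ByteLimitedCache(limit_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"123456")  # exceeds the budget, so "a" is evicted
print(cache.get("a"), cache.get("b"), cache.current_bytes)
```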
Compatibility and Known Limitations
- New limitations:
- Cache/resource management introduced in v0.10 adds microsecond-level overhead to Python operator calls. Based on the performance analysis of our Python samples, we expect the production- and pipeline-level impact to be negligible. CUDA kernel and C++ call performance is not affected. We aim to investigate and reduce this overhead further in a future release.
- Sporadic pybind11 deallocation crashes have been reported in long-running multi-threaded Python pipelines with externally allocated memory (e.g., wrapped PyTorch buffers). We are evaluating an upgrade of pybind11 (currently 2.10) as a potential fix in an upcoming release.
- Erroneous BGR -> YUV and YUV -> BGR color conversions, see issue
For the full list, see main README on CV-CUDA GitHub.
Release v0.9.0-beta
v0.9.0-beta
Release Highlights
CV-CUDA v0.9.0 includes the following changes:
- New Features:
- Improved Resize performance (up to 4x for u8 inputs, up to 3x for RGB8)
- Improved performance of cubic interpolation, e.g., in Rotate, WarpAffine and WarpPerspective (up to 2x faster)
- Added optional scaling to ResizeCropConvertReformat fused operator
- Improved structure of Python documentation and optimized its generation (>5min to <30s) by removing the Exhale index
- Added 64-bit stride support to various operators; limited to 32-bit strides to avoid performance regressions: AdaptiveThreshold, AdvCvtColor, AverageBlur, BilateralFilter, BrightnessContrast, ColorTwist, BoxBlur, CenterCrop, ConvertTo, CopyMakeBorder, CustomCrop, GaussianNoise, Gaussian, Flip, HistogramEq, JointBilateralFilter, Laplacian, Morphology, Normalize, RandomResizedCrop, Reformat, Remap, Resize, Rotate, SIFT, WarpAffine, WarpPerspective
- Bug Fixes:
- Added exception handling on the C API in Python: C/C++ exceptions are now forwarded to Python
- Fixed coordinate rounding bug in Resize operator with nearest neighbor interpolation
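The distinction between 32-bit and 64-bit strides matters once byte offsets into a tensor exceed the signed 32-bit range. The sketch below estimates whether a densely packed tensor can be addressed with 32-bit offsets. The helper names and the packed row-major layout are assumptions for illustration and do not reflect CV-CUDA's internal checks.

```python
INT32_MAX = 2**31 - 1


def packed_strides(shape, item_size):
    """Byte strides of a densely packed, row-major tensor."""
    strides = []
    stride = item_size
    for extent in reversed(shape):
        strides.append(stride)
        stride *= extent
    return tuple(reversed(strides))


def fits_int32(shape, item_size):
    """True if the largest byte offset in the tensor fits in a signed 32-bit integer."""
    strides = packed_strides(shape, item_size)
    max_offset = sum((extent - 1) * s for extent, s in zip(shape, strides))
    return max_offset <= INT32_MAX


# NHWC batches of 4K RGB images with 8-bit channels.
print(fits_int32((64, 2160, 3840, 3), 1))   # about 1.6 GB
print(fits_int32((128, 2160, 3840, 3), 1))  # about 3.2 GB
```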
Compatibility and Known Limitations
- Documentation built on Ubuntu 20.04 needs an up-to-date version of Sphinx (pip install --upgrade sphinx) and requires explicitly passing the system's default Python version: ./ci/build_docs path/to/build -DPYTHON_VERSIONS="<py_ver>".
- Python bindings installed via Debian packages and Python tests fail with NumPy 2.0. We recommend using an older version of NumPy (e.g., 1.26) until we have implemented a fix.
- The Resize and RandomResizedCrop operators incorrectly interpolate pixel values near the boundary of an image or tensor when using linear and cubic interpolation. This will be fixed in an upcoming release.
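A quick way to guard against the NumPy 2.0 incompatibility noted above is to check the installed version before importing the bindings. The sketch below uses only the standard library; the helper name is hypothetical.

```python
from importlib import metadata


def numpy_major_version():
    """Return the installed NumPy major version, or None if NumPy is not installed."""
    try:
        version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


major = numpy_major_version()
if major is None:
    print("NumPy is not installed")
elif major >= 2:
    print("NumPy 2.x detected; this release recommends an older version such as 1.26")
else:
    print("NumPy 1.x detected")
```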
See main README on CV-CUDA GitHub.
CV-CUDA Release v0.8.0
Release Highlights
CV-CUDA v0.8.0 includes the following changes:
- New Operator:
- Introduced fused 'ResizeCropConvertReformat' operator
- New Features:
- Improved initialization of Image and ImageBatchVarShape: enabled efficient cache reuse
- Improved benchmarking utilities: added throughput computation and power/clock monitoring
- Added tests to Resize, BilateralFilter, CvtColor, Erase, JointBilateralFilter, PillowResize
- Bug Fixes:
- Fixed potential crash when using custom streams
- Switched PairwiseMatcher to use L2-norm as default
- Fixed documentation of CropFlipNormalizeReformat
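For readers unfamiliar with brute-force matching, the sketch below pairs each descriptor in one set with its nearest neighbor in another using the L2 norm, the default noted above. It is a plain-Python illustration with hypothetical names and data, not the PairwiseMatcher implementation.

```python
import math


def brute_force_match(set1, set2):
    """For each descriptor in set1, return (index1, index2, distance) of its L2-nearest neighbor in set2."""
    matches = []
    for i, d1 in enumerate(set1):
        best_j, best_dist = -1, math.inf
        for j, d2 in enumerate(set2):
            dist = math.dist(d1, d2)  # Euclidean (L2) distance
            if dist < best_dist:
                best_j, best_dist = j, dist
        matches.append((i, best_j, best_dist))
    return matches


descriptors_a = [(0.0, 0.0), (1.0, 1.0)]
descriptors_b = [(0.9, 1.2), (0.1, -0.1), (5.0, 5.0)]
for match in brute_force_match(descriptors_a, descriptors_b):
    print(match)
```

The cost grows with the product of the two set sizes, which is why this exhaustive comparison maps well onto a GPU.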
Compatibility and Known Limitations
See main README on CV-CUDA GitHub.
Full Changelog: v0.7.0-beta...v0.8.0-beta
CV-CUDA Release v0.7.0
CV-CUDA v0.7.0 Release Notes
CV-CUDA 0.7.0 introduces performance and support enhancements, along with bug fixes and new features.
Full Changelog: v0.6.0-beta...v0.7.0-beta
Release Highlights
CV-CUDA v0.7.0 includes the following improvements:
New Features:
- Optimized Python bindings: near-zero overhead compared to C++ calls
- Added masking option to Label operator: conditional island removal
- Added IGX Orin support (with dGPU, Ampere or Ada RTX6000)
- Added support for a signed 32-bit output data type in the Label operator
Removed Operator:
- Removed the Find Contours operator pending troubleshooting of major limitations
Bug Fixes:
- Fixed constraint on installation directory for Python tests: tar test packages can now be used from any directory
Compatibility and Known Limitations
See main README on CV-CUDA GitHub.
CV-CUDA Release v0.6.0
CV-CUDA 0.6.0 Release Notes
CV-CUDA 0.6.0 is a comprehensive update introducing new packaging and documentation enhancements, along with bug fixes and new features.
Release Highlights
CV-CUDA v0.6.0 includes significant improvements:
New Operator:
- HQResize: Advanced resize operator supporting 2D and 3D data, tensors, tensor batches, and varshape image batches (2D only). Supports nearest neighbor, linear, cubic, Gaussian and Lanczos interpolation, with optional antialiasing when down-sampling.
New Features:
- Standalone Python Wheels, including tooling and documentation to generate them. Prebuilt binaries for selected configurations.
- Homogenized package naming
- Improved documentation of hardware/software compatibility, build and test tutorials
- Added Python Operator benchmarking application
- Samples updated to new codec libraries, PyNvVideoCodec and NvImageCodec
- Support of rank 2 tensors in MedianBlur
- Additional tests for various operators
Bug Fixes:
- Fix name clashes with NVTX
- Fix workspace memory allocation of complex filters
- Fix memory fault in MinAreaRect
Compatibility and Known Limitations
See main README on CV-CUDA GitHub.
CV-CUDA Release v0.5.0
CV-CUDA 0.5.0 Release Notes
CV-CUDA 0.5.0 is a major release of the library providing multiple new operators, features, and fixes to multiple customer-reported issues.
Release Highlights
CV-CUDA v0.5.0 includes the following key changes:
- New Operators:
- FindHomography: Calculates a perspective transform from four pairs of the corresponding points
- Label: Labels connected regions in an image using 4-way connectivity for foreground and 8-way for background pixels
- PairwiseMatcher: Matches features computed separately (e.g. via the SIFT operator) in two images using the brute force method
- Stack: Concatenates two input tensors into a single output tensor
- New Features:
- Added TensorBatch in C++ and Python, a container type that can hold a list of non-uniformly shaped tensors
- Added Workspace in C++ and Python, an abstraction of memory and asynchronous resources for CV-CUDA operators
- Added better color format support in nvcv_types
- New sample application for the Label operator
- JetPack 5.1.2 support for L4T (Jetson Orin, L4T 35.4.1, CUDA 11.4)
- Enhanced documentation
- Bug Fixes:
- Resolved memory leak in NvBlurBoxes
- Fixed segmentation fault issue in Python with certain imports
- Corrected typestr format issue in __cuda_array_interface__
- Addressed occasional hanging in OpBoxBlur on RGBA images
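As a rough illustration of the connected-component labeling performed by the Label operator introduced in this release, the sketch below labels 4-connected foreground regions of a small binary image with a breadth-first flood fill. It is a CPU reference with hypothetical names and does not reproduce the operator's GPU algorithm, its label numbering, or its 8-way handling of background pixels.

```python
from collections import deque


def label_4way(image):
    """Label 4-connected foreground (non-zero) regions; background pixels stay 0."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 or labels[r][c] != 0:
                continue  # background, or already assigned to a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the four edge-adjacent neighbors only (no diagonals).
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] != 0 and labels[ny][nx] == 0:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


binary = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
for row in label_4way(binary):
    print(row)
```

The pixels at (1, 1) and (2, 2) touch only diagonally, so under 4-way connectivity they fall into separate regions.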
Compatibility
- GPU Compute Capability: 7+.x
- Ubuntu x86_64: 20.04, 22.04
- CUDA Toolkit: 11.7+ (11.2+ for library build and run)
- L4T: 35.4.1, JetPack 5.1.2 aarch64
- GCC: 11.0+ (9.x and 10.x for APIs with pre-built binary)
- Python: 3.8, 3.10
Known Issues/Limitations
- For GCC versions lower than 11.0, C++17 support needs to be enabled when compiling CV-CUDA.