Releases: uxlfoundation/oneDNN
v2.3.1
This is a patch release containing the following changes to v2.3:
- Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support (f5c071b)
- Fixed integer overflow for inner product implementation on CPUs (66971b5)
- Fixed out of bounds access in GEMM implementation for Intel SSE 4.1 (4e81df0)
- Fixed correctness issue for depthwise convolution post-op with non-default scales on CPUs (783e1d6, 066c832)
- Fixed crash for s8 binary primitive on Windows (d9fd397)
- Fixed performance regression in fp32 to u8 reorder for Intel AMX specific memory formats (97f40cf, 532648a)
- Fixed correctness issue for bfloat16 convolution weight gradient on processors with Intel AMX support (053406d, 6649b75)
- Fixed correctness issue for bfloat16 inner product backpropagation on processors with Intel AMX support (a2e6c55)
- Fixed correctness issue for bfloat16 convolution with padded memory formats on GEN9 GPUs (c0aea07)
- Fixed correctness issue for int8 matmul primitive with zero points on processors with Intel AMX support (55cb716)
- Fixed segfault in depthwise convolution post-op on CPUs (ad46635)
v2.3
Performance Optimizations
- Extended primitive cache to improve primitive descriptor creation performance.
- Improved primitive cache performance in multithreaded configurations.
- Intel Architecture Processors
- Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
- Improved performance of reduction primitive.
- Improved performance of depthwise convolution primitive with NHWC activations for training cases.
- Intel Graphics Products
- Improved fp32 and fp16 Winograd convolution performance.
- Introduced support for automatic selection between direct and Winograd convolution algorithms.
- Improved int8 depthwise convolution performance.
- Improved performance of reorder, shuffle, concat, binary, and batch normalization primitives.
- Improved layer normalization performance for blocked formats.
- AArch64-based Processors
- Improved reorder primitive performance for systems with SVE 128 and SVE 256 support.
- Improved eltwise primitive performance for systems with SVE 512 support.
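The Sapphire Rapids bfloat16 functionality above is disabled by default and is enabled through CPU dispatcher control. One way to do that is the `DNNL_MAX_CPU_ISA` environment variable, set before the library is first used; a minimal sketch (the ISA value shown is one example, and the exact name accepted depends on the oneDNN release):

```python
import os

# DNNL_MAX_CPU_ISA is oneDNN's CPU dispatcher control knob. It is read
# once, on first use of the library, so it must be set before any
# oneDNN-backed code runs in this process.
# The value below is illustrative; bf16/AMX ISA names vary by release.
os.environ["DNNL_MAX_CPU_ISA"] = "AVX512_CORE_AMX"

# ... import and use a oneDNN-backed framework after this point ...
print(os.environ["DNNL_MAX_CPU_ISA"])
```

The same control is also exposed programmatically (for example through `dnnl::set_max_cpu_isa` in the C++ API), which is preferable when the process cannot rely on its launch environment.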
Functionality
- Extended batch normalization and layer normalization primitives API to take separate scale and shift arguments.
- Extended resampling primitive with post-ops support and mixed source and destination data types.
Usability
- Introduced binary distribution in conda-forge. Supported configurations cover Linux, Windows, and macOS operating systems and Intel64/AMD64, AArch64, and PPC64 architectures.
- Introduced support for GPU-only build. This configuration helps to reduce binary footprint for applications targeting GPU.
- Introduced an option to use GNU OpenMP as CPU runtime for DPC++ configuration.
- Introduced verbose log converter. This tool processes oneDNN verbose logs and generates test cases for benchdnn.
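Verbose logs are produced by running an application with `DNNL_VERBOSE=1`; each line is a comma-separated record of a primitive creation or execution. As a rough illustration of the data the converter works from (the sample line and the `parse_line` helper below are simplified sketches, not the tool's actual code):

```python
def parse_line(line: str) -> dict:
    """Split a oneDNN verbose line into a few named fields.

    Simplified sketch: the real format carries more fields (memory
    descriptors, post-ops, the problem descriptor) that the converter
    needs to reconstruct an equivalent benchdnn test case.
    """
    fields = line.strip().split(",")
    return {
        "marker": fields[0],           # "dnnl_verbose"
        "event": fields[1],            # e.g. "exec" or "create"
        "engine": fields[2],           # "cpu" or "gpu"
        "primitive": fields[3],        # e.g. "convolution"
        "time_ms": float(fields[-1]),  # execution time in milliseconds
    }

# Illustrative sample line with intermediate fields elided.
sample = "dnnl_verbose,exec,cpu,convolution,jit:avx2,forward_inference,...,mb1_ic3oc64,0.123"
info = parse_line(sample)
print(info["primitive"], info["time_ms"])
```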
Breaking Changes
- Updated minimal supported CMake version to 2.8.12 (was 2.8.11).
- Updated minimal supported ACL version to 21.05 (was 21.02).
Thanks to the Contributors
This release contains contributions from the project core team as well as Alexandre Truong @aletru01, Arthur Mitrano @aaraujom, fitchbe @fitchbe, Isuru Fernando @isuruf, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, leizheng1 @leizheng1, Nomoto Kazuhiro @NomotoKazuhiro, Peter Caday @petercad, Pablo Romero @pablocum, Takumi-H @Takumi-Honda, Uwe L. Korn @xhochy, Vasily Rubtsov @vasilyru. We would also like to thank everyone who asked questions and reported issues.
v2.3-rc2
This is a release candidate for oneDNN v2.3. Please provide feedback and submit defect reports via GitHub issues.
v2.2.4
v2.3-rc
This is a release candidate for oneDNN v2.3. Please provide feedback and submit defect reports via GitHub issues.
v2.2.3
This is a patch release containing the following changes to v2.2.2:
- Fixed a bug in int8 depthwise convolution primitive with groups and 1d spatial size for processors with Intel AVX-512 and Intel AVX2 support (8a784c6, f0e4af9)
- Fixed correctness issue for PReLU primitive on Intel Processor Graphics (f3c3daf)
- Fixed correctness issue in reorder for blocked layouts with zero padding (68f05d0, d51616b, fd2c642)
- Improved performance of weights reorders used by BRGEMM-based convolution primitive for processors with Intel AVX-512 support (23b2ec0, 10f8187, 4c0819c)
- Added `-fp-model=precise` build flag for DPC++ code (3e40e5e)
- Fixed potential memory leak in matmul primitive (36dba73)
- Fixed performance of matmul primitive when fused with bias update and sum (f993b25)
- Fixed a bug in matmul primitive when writing to non-contiguous destination buffer (36d25d4)
v2.2.2
This is a patch release containing the following changes to v2.2.1:
- Fixed performance regression in fp32 forward inner product for shapes with number of output channels equal to 1 for processors with Intel AVX-512 support (714b1fd)
- Fixed performance regression in forward convolutions with groups for processors with Intel AVX-512 support (3555d4a)
- Removed `-std=c++11` build flag for DPC++ headers (1fcb867)
- Fixed buffer access in initializing workspace in RNN implementation on GPU (9b03091)
- Fixed a bug in convolution with 1x1 kernel and mixed strides on processors with Intel AVX-512 support (d0b3e3f)
- Used getauxval on Linux to get CPU features for AArch64 systems (25c4cea)
- Added `-fp-model=precise` build flag for DPC++ code (3e40e5e)
- Fixed out-of-bounds writes in elementwise primitive on Intel Processor Graphics (bcf823c)
v2.2.1
This is a patch release containing the following changes to v2.2:
- Fixed segfault for cases when primitive descriptor or attributes contain `NaN` (e6d05ec, dbca1e9, 0326b09)
- Fixed engine creation failure for GPU subdevices (4c3a114)
- Fixed long lines clipping in verbose output (70d70a8)
- Fixed segfault in bfloat16 convolution weight gradient implementation on processors with Intel AMX support (a3a73a3)
- Fixed performance regression in binary primitive with `per_oc` broadcast strategy (9ac85d8)
- Worked around a bug with Microsoft Visual C++ compiler version detection in CMake 3.19 (2f39155)
- Removed `-std=c++11` build flag for DPC++ code to align with SYCL standard (1b026f5)
v2.1.3
This is a patch release containing the following changes to v2.1.2:
v2.2
Performance Optimizations
- Intel Architecture processors
- Improved performance of int8 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved performance of compute functionality for future Intel Core processor with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
- Improved fp32 inner product forward propagation performance for processors with Intel AVX-512 support.
- Improved `dnnl_gemm` performance for cases with `n=1` on all supported processors.
- Intel Graphics products
- Introduced NHWC format support for activations for int8 primitives.
- AArch64-based processors
- Improved performance of fp32 and int8 convolution, and softmax primitives for processors with SVE 512 support.
- Improved performance of fp32 convolution via Arm Compute Library (ACL).
- Improved performance of convolution with a combination of `sum` and `relu` post-ops via ACL.
Functionality
- Extended eltwise primitive with support for `mish` and `hardswish` algorithms.
- Extended binary primitive with support for comparison operators.
- Introduced support for post-ops in GPU resampling implementation.
- Introduced asymmetric quantization support for int8 deconvolution.
- Introduced binary post-ops support for matmul primitive.
Usability
- Improved presentation of oneDNN primitives in VTune Amplifier.
- Introduced Linux perf support for AArch64.
- Introduced support for Fujitsu C++ compiler.
- Introduced a build time check for minimal supported ACL version. Currently oneDNN requires ACL 21.02 or later.
- Added support for cuDNN 8.x.
Thanks to the contributors
This release contains contributions from the project core team as well as Aleksandr Nikolaev @alenik01, araki.kenichi @qnet-araki, Arthur Mitrano @aaraujom, Dr-Noob @Dr-Noob, Gmc2 @GHGmc2, higuchi.motoko @higuchi-motoko, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, Louie Tsai @louie-tsai, masafumi yamazaki @m-ymzk, Nathan John Sircombe @nSircombe, Takumi-H @Takumi-Honda. We would also like to thank everyone who asked questions and reported issues.