Releases: uxlfoundation/oneDNN

v2.3.1

04 Aug 20:43

This is a patch release containing the following changes to v2.3:

  • Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support (f5c071b)
  • Fixed integer overflow for inner product implementation on CPUs (66971b5)
  • Fixed out of bounds access in GEMM implementation for Intel SSE 4.1 (4e81df0)
  • Fixed correctness issue for depthwise convolution post-op with non-default scales on CPUs (783e1d6, 066c832)
  • Fixed crash for s8 binary primitive on Windows (d9fd397)
  • Fixed performance regression in fp32 to u8 reorder for Intel AMX specific memory formats (97f40cf, 532648a)
  • Fixed correctness issue for bfloat16 convolution weight gradient on processors with Intel AMX support (053406d, 6649b75)
  • Fixed correctness issue for bfloat16 inner product backpropagation on processors with Intel AMX support (a2e6c55)
  • Fixed correctness issue for bfloat16 convolution with padded memory formats on GEN9 GPUs (c0aea07)
  • Fixed correctness issue for int8 matmul primitive with zero points on processors with Intel AMX support (55cb716)
  • Fixed segfault in depthwise convolution post-op on CPUs (ad46635)

v2.3

30 Jun 20:42

Performance Optimizations

  • Extended primitive cache to improve primitive descriptor creation performance.
  • Improved primitive cache performance in multithreaded configurations.
  • Intel Architecture Processors
    • Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
    • Improved performance of reduction primitive.
    • Improved performance of depthwise convolution primitive with NHWC activations for training cases.
  • Intel Graphics Products
    • Improved fp32 and fp16 Winograd convolution performance.
    • Introduced support for automatic selection between direct and Winograd convolution algorithms.
    • Improved int8 depthwise convolution performance.
    • Improved performance of reorder, shuffle, concat, binary, and batch normalization primitives.
    • Improved layer normalization performance for blocked formats.
  • AArch64-based Processors
    • Improved reorder primitive performance for systems with SVE 128 and SVE 256 support.
    • Improved eltwise primitive performance for systems with SVE 512 support.
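The Sapphire Rapids bfloat16 optimizations listed above ship disabled and are gated behind the CPU dispatcher control. A minimal sketch of opting in at run time via the documented DNNL_MAX_CPU_ISA environment variable (the application name is a placeholder, and the exact ISA token accepted may differ by build — check the CPU dispatcher control documentation for your version):

```shell
# Raise the dispatcher's ISA cap so the experimental bf16/AMX code paths
# become eligible for selection; by default they are not dispatched.
DNNL_MAX_CPU_ISA=AVX512_CORE_AMX ./my_app   # "my_app" is a placeholder
```

The same control is also exposed programmatically (dnnl::set_max_cpu_isa), which is useful when the environment cannot be modified.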

Functionality

Usability

  • Introduced binary distribution in conda-forge. Supported configurations cover Linux, Windows, and macOS operating systems and Intel64/AMD64, AArch64, and PPC64 architectures.
  • Introduced support for GPU-only build. This configuration helps to reduce binary footprint for applications targeting GPU.
  • Introduced an option to use GNU OpenMP as CPU runtime for DPC++ configuration.
  • Introduced verbose log converter. This tool processes oneDNN verbose logs and generates test cases for benchdnn.
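The GPU-only build mentioned above is selected at CMake configure time by disabling the CPU runtime. A hedged sketch, assuming a DPC++ toolchain is available (directory names are placeholders; DNNL_CPU_RUNTIME and DNNL_GPU_RUNTIME are the documented build options):

```shell
# Configure a GPU-only oneDNN build: dropping the CPU runtime removes the
# large set of CPU kernels, shrinking the binary for GPU-only applications.
cmake -S oneDNN -B build \
      -DDNNL_CPU_RUNTIME=NONE \
      -DDNNL_GPU_RUNTIME=DPCPP
cmake --build build
```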

Breaking Changes

  • Updated minimal supported CMake version to 2.8.12 (was 2.8.11).
  • Updated minimal supported ACL version to 21.05 (was 21.02).

Thanks to the Contributors

This release contains contributions from the project core team as well as Alexandre Truong @aletru01, Arthur Mitrano @aaraujom, fitchbe @fitchbe, Isuru Fernando @isuruf, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, leizheng1 @leizheng1, Nomoto Kazuhiro @NomotoKazuhiro, Peter Caday @petercad, Pablo Romero @pablocum, Takumi-H @Takumi-Honda, Uwe L. Korn @xhochy, Vasily Rubtsov @vasilyru. We would also like to thank everyone who asked questions and reported issues.

v2.3-rc2

17 Jun 14:22

Pre-release

This is a release candidate for oneDNN v2.3. Please provide feedback and submit defect reports via GitHub issues.

v2.2.4

14 Jun 23:41

This is a patch release containing the following changes to v2.2.3:

  • Fixed build error with GCC 11 (eda1add)
  • Fixed an issue with reorder reporting unimplemented when quantizing f32 weights to s8 (4f05b76, 5d3d1e1, cc77eef)
  • Updated name for GPU gen12 architecture to xe (3d202c2)

v2.3-rc

08 Jun 22:59

Pre-release

This is a release candidate for oneDNN v2.3. Please provide feedback and submit defect reports via GitHub issues.

v2.2.3

28 May 18:27

This is a patch release containing the following changes to v2.2.2:

  • Fixed a bug in int8 depthwise convolution primitive with groups and 1d spatial size for processors with Intel AVX-512 and Intel AVX2 support (8a784c6, f0e4af9)
  • Fixed correctness issue for PReLU primitive on Intel Processor Graphics (f3c3daf)
  • Fixed correctness issue in reorder for blocked layouts with zero padding (68f05d0, d51616b, fd2c642)
  • Improved performance of weights reorders used by BRGEMM-based convolution primitive for processors with Intel AVX-512 support (23b2ec0, 10f8187, 4c0819c)
  • Added -fp-model=precise build flag for DPC++ code (3e40e5e)
  • Fixed potential memory leak in matmul primitive (36dba73)
  • Fixed performance of matmul primitive when fused with bias update and sum (f993b25)
  • Fixed a bug in matmul primitive when writing to non-contiguous destination buffer (36d25d4)

v2.2.2

28 Apr 20:45

This is a patch release containing the following changes to v2.2.1:

  • Fixed performance regression in fp32 forward inner product for shapes with number of output channels equal to 1 for processors with Intel AVX-512 support (714b1fd)
  • Fixed performance regression in forward convolutions with groups for processors with Intel AVX-512 support (3555d4a)
  • Removed -std=c++11 build flag for DPC++ headers (1fcb867)
  • Fixed buffer access in initializing workspace in RNN implementation on GPU (9b03091)
  • Fixed a bug in convolution with 1x1 kernel and mixed strides on processors with Intel AVX-512 support (d0b3e3f)
  • Used getauxval on Linux to get CPU features for AArch64 systems (25c4cea)
  • Added -fp-model=precise build flag for DPC++ code (3e40e5e)
  • Fixed out-of-bounds writes in elementwise primitive on Intel Processor Graphics (bcf823c)

v2.2.1

10 Apr 00:17

This is a patch release containing the following changes to v2.2:

  • Fixed segfault for cases when primitive descriptor or attributes contain NaN (e6d05ec, dbca1e9, 0326b09)
  • Fixed engine creation failure for GPU subdevices (4c3a114)
  • Fixed long lines clipping in verbose output (70d70a8)
  • Fixed segfault in bfloat16 convolution weight gradient implementation on processors with Intel AMX support (a3a73a3)
  • Fixed performance regression in binary primitive with per_oc broadcast strategy (9ac85d8)
  • Worked around a bug with Microsoft Visual C++ compiler version detection in CMake 3.19 (2f39155)
  • Removed -std=c++11 build flag for DPC++ code to align with SYCL standard (1b026f5)

v2.1.3

01 Apr 04:28

This is a patch release containing the following changes to v2.1.2:

  • Updated xbyak_aarch64 to support Apple silicon (dd1a02a, 913010b, 2d155dd)
  • Fixed segfault in fp32 depthwise convolution with padded memory (2d8283f)
  • Fixed potential issues in BRGEMM-based convolution implementation (b183dff, d2b1653)
  • Fixed memory leak on NVIDIA GPUs (06803f2)

v2.2

31 Mar 20:47

Performance Optimizations

  • Intel Architecture processors
    • Improved performance of int8 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved performance of compute functionality for future Intel Core processor with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
    • Improved fp32 inner product forward propagation performance for processors with Intel AVX-512 support.
    • Improved dnnl_gemm performance for cases with n=1 on all supported processors.
  • Intel Graphics products
    • Introduced NHWC format support for activations for int8 primitives.
  • AArch64-based processors
    • Improved performance of fp32 and int8 convolution, and softmax primitives for processors with SVE 512 support.
    • Improved performance of fp32 convolution via Arm Compute Library (ACL).
    • Improved performance of convolution with a combination of sum and relu post-ops via ACL.

Functionality

  • Extended eltwise primitive with support for mish and hardswish algorithms.
  • Extended binary primitive with support for comparison operators.
  • Introduced support for post-ops in GPU resampling implementation.
  • Introduced asymmetric quantization support for int8 deconvolution.
  • Introduced binary post-ops support for matmul primitive.

Usability

  • Improved presentation of oneDNN primitives in VTune Amplifier.
  • Introduced Linux perf support for AArch64.
  • Introduced support for Fujitsu C++ compiler.
  • Introduced a build time check for minimal supported ACL version. Currently oneDNN requires ACL 21.02 or later.
  • Added support for cuDNN 8.x.
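The ACL version check noted above runs at CMake configure time, so an incompatible Arm Compute Library fails fast instead of producing build errors later. A hedged sketch of the AArch64 + ACL configuration (the ComputeLibrary path is a placeholder; ACL_ROOT_DIR and DNNL_AARCH64_USE_ACL are the documented controls):

```shell
# Point oneDNN at an ACL install (21.02 or later for this release) and
# enable the ACL-backed primitives; configuration aborts early if the
# detected ACL version is below the supported minimum.
export ACL_ROOT_DIR=/path/to/ComputeLibrary   # placeholder path
cmake -S oneDNN -B build -DDNNL_AARCH64_USE_ACL=ON
cmake --build build
```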

Thanks to the contributors

This release contains contributions from the project core team as well as Aleksandr Nikolaev @alenik01, araki.kenichi @qnet-araki, Arthur Mitrano @aaraujom, Dr-Noob @Dr-Noob, Gmc2 @GHGmc2, higuchi.motoko @higuchi-motoko, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, Louie Tsai @louie-tsai, masafumi yamazaki @m-ymzk, Nathan John Sircombe @nSircombe, Takumi-H @Takumi-Honda. We would also like to thank everyone who asked questions and reported issues.