Releases: uxlfoundation/oneDNN

v2.3.1

04 Aug 20:43

This is a patch release containing the following changes to v2.3:

  • Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support (f5c071b)
  • Fixed integer overflow for inner product implementation on CPUs (66971b5)
  • Fixed out of bounds access in GEMM implementation for Intel SSE 4.1 (4e81df0)
  • Fixed correctness issue for depthwise convolution post-op with non-default scales on CPUs (783e1d6, 066c832)
  • Fixed crash for s8 binary primitive on Windows (d9fd397)
  • Fixed performance regression in fp32 to u8 reorder for Intel AMX specific memory formats (97f40cf, 532648a)
  • Fixed correctness issue for bfloat16 convolution weight gradient on processors with Intel AMX support (053406d, 6649b75)
  • Fixed correctness issue for bfloat16 inner product backpropagation on processors with Intel AMX support (a2e6c55)
  • Fixed correctness issue for bfloat16 convolution with padded memory formats on GEN9 GPUs (c0aea07)
  • Fixed correctness issue for int8 matmul primitive with zero points on processors with Intel AMX support (55cb716)
  • Fixed segfault in depthwise convolution post-op on CPUs (ad46635)

v2.3

30 Jun 20:42

Performance Optimizations

  • Extended primitive cache to improve primitive descriptor creation performance.
  • Improved primitive cache performance in multithreaded configurations.
  • Intel Architecture Processors
    • Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
    • Improved performance of reduction primitive.
    • Improved performance of depthwise convolution primitive with NHWC activations for training cases.
  • Intel Graphics Products
    • Improved fp32 and fp16 Winograd convolution performance.
    • Introduced support for automatic selection between direct and Winograd convolution algorithms.
    • Improved int8 depthwise convolution performance.
    • Improved performance of reorder, shuffle, concat, binary, and batch normalization primitives.
    • Improved layer normalization performance for blocked formats.
  • AArch64-based Processors
    • Improved reorder primitive performance for systems with SVE 128 and SVE 256 support.
    • Improved eltwise primitive performance for systems with SVE 512 support.
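The Sapphire Rapids bfloat16 optimizations listed above ship disabled and are gated behind the CPU dispatcher control. A minimal sketch of opting in at run time via the documented DNNL_MAX_CPU_ISA environment variable (the application name is a placeholder, and the exact ISA token accepted may differ by build — check the CPU dispatcher control documentation for your version):

```shell
# Raise the dispatcher's ISA cap so the experimental bf16/AMX code paths
# become eligible for selection; by default they are not dispatched.
DNNL_MAX_CPU_ISA=AVX512_CORE_AMX ./my_app   # "my_app" is a placeholder
```

The same control is also exposed programmatically (dnnl::set_max_cpu_isa), which is useful when the environment cannot be modified.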

Functionality

Usability

  • Introduced binary distribution in conda-forge. Supported configurations cover Linux, Windows, and macOS operating systems and Intel64/AMD64, AArch64, and PPC64 architectures.
  • Introduced support for GPU-only build. This configuration helps to reduce binary footprint for applications targeting GPU.
  • Introduced an option to use GNU OpenMP as CPU runtime for DPC++ configuration.
  • Introduced verbose log converter. This tool processes oneDNN verbose logs and generates test cases for benchdnn.
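The GPU-only build mentioned above is selected at CMake configure time by disabling the CPU runtime. A hedged sketch, assuming a DPC++ toolchain is available (directory names are placeholders; DNNL_CPU_RUNTIME and DNNL_GPU_RUNTIME are the documented build options):

```shell
# Configure a GPU-only oneDNN build: dropping the CPU runtime removes the
# large set of CPU kernels, shrinking the binary for GPU-only applications.
cmake -S oneDNN -B build \
      -DDNNL_CPU_RUNTIME=NONE \
      -DDNNL_GPU_RUNTIME=DPCPP
cmake --build build
```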

Breaking Changes

  • Updated minimal supported CMake version to 2.8.12 (was 2.8.11).
  • Updated minimal supported ACL version to 21.05 (was 21.02).

Thanks to the Contributors

This release contains contributions from the project core team as well as Alexandre Truong @aletru01, Arthur Mitrano @aaraujom, fitchbe @fitchbe, Isuru Fernando @isuruf, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, leizheng1 @leizheng1, Nomoto Kazuhiro @NomotoKazuhiro, Peter Caday @petercad, Pablo Romero @pablocum, Takumi-H @Takumi-Honda, Uwe L. Korn @xhochy, Vasily Rubtsov @vasilyru. We would also like to thank everyone who asked questions and reported issues.

v2.3-rc2

17 Jun 14:22

Pre-release

This is a release candidate for oneDNN v2.3. Please provide feedback and submit defect reports via GitHub issues.

v2.2.4

14 Jun 23:41

This is a patch release containing the following changes to v2.2.3:

  • Fixed build error with GCC 11 (eda1add)
  • Fixed an issue with reorder reporting unimplemented when quantizing f32 weights to s8 (4f05b76, 5d3d1e1, cc77eef)
  • Updated name for GPU gen12 architecture to xe (3d202c2)

v2.3-rc

08 Jun 22:59

Pre-release

This is a release candidate for oneDNN v2.3. Please provide feedback and submit defect reports via GitHub issues.

v2.2.3

28 May 18:27

This is a patch release containing the following changes to v2.2.2:

  • Fixed a bug in int8 depthwise convolution primitive with groups and 1d spatial size for processors with Intel AVX-512 and Intel AVX2 support (8a784c6, f0e4af9)
  • Fixed correctness issue for PReLU primitive on Intel Processor Graphics (f3c3daf)
  • Fixed correctness issue in reorder for blocked layouts with zero padding (68f05d0, d51616b, fd2c642)
  • Improved performance of weights reorders used by BRGEMM-based convolution primitive for processors with Intel AVX-512 support (23b2ec0, 10f8187, 4c0819c)
  • Added -fp-model=precise build flag for DPC++ code (3e40e5e)
  • Fixed potential memory leak in matmul primitive (36dba73)
  • Fixed performance of matmul primitive when fused with bias update and sum (f993b25)
  • Fixed a bug in matmul primitive when writing to non-contiguous destination buffer (36d25d4)

v2.2.2

28 Apr 20:45

This is a patch release containing the following changes to v2.2.1:

  • Fixed performance regression in fp32 forward inner product for shapes with number of output channels equal to 1 for processors with Intel AVX-512 support (714b1fd)
  • Fixed performance regression in forward convolutions with groups for processors with Intel AVX-512 support (3555d4a)
  • Removed -std=c++11 build flag for DPC++ headers (1fcb867)
  • Fixed buffer access in initializing workspace in RNN implementation on GPU (9b03091)
  • Fixed a bug in convolution with 1x1 kernel and mixed strides on processors with Intel AVX-512 support (d0b3e3f)
  • Used getauxval on Linux to get CPU features for AArch64 systems (25c4cea)
  • Added -fp-model=precise build flag for DPC++ code (3e40e5e)
  • Fixed out-of-bounds writes in elementwise primitive on Intel Processor Graphics (bcf823c)

v2.2.1

10 Apr 00:17

This is a patch release containing the following changes to v2.2:

  • Fixed segfault for cases when primitive descriptor or attributes contain NaN (e6d05ec, dbca1e9, 0326b09)
  • Fixed engine creation failure for GPU subdevices (4c3a114)
  • Fixed long lines clipping in verbose output (70d70a8)
  • Fixed segfault in bfloat16 convolution weight gradient implementation on processors with Intel AMX support (a3a73a3)
  • Fixed performance regression in binary primitive with per_oc broadcast strategy (9ac85d8)
  • Worked around a bug with Microsoft Visual C++ compiler version detection in CMake 3.19 (2f39155)
  • Removed -std=c++11 build flag for DPC++ code to align with SYCL standard (1b026f5)

v2.1.3

01 Apr 04:28

This is a patch release containing the following changes to v2.1.2:

  • Updated xbyak_aarch64 to support Apple silicon (dd1a02a, 913010b, 2d155dd)
  • Fixed segfault in fp32 depthwise convolution with padded memory (2d8283f)
  • Fixed potential issues in BRGEMM-based convolution implementation (b183dff, d2b1653)
  • Fixed memory leak on NVIDIA GPUs (06803f2)

v2.2

31 Mar 20:47

Performance Optimizations

  • Intel Architecture processors
    • Improved performance of int8 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved performance of compute functionality for future Intel Core processor with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
    • Improved fp32 inner product forward propagation performance for processors with Intel AVX-512 support.
    • Improved dnnl_gemm performance for cases with n=1 on all supported processors.
  • Intel Graphics products
    • Introduced NHWC format support for activations for int8 primitives.
  • AArch64-based processors
    • Improved performance of fp32 and int8 convolution, and softmax primitives for processors with SVE 512 support.
    • Improved performance of fp32 convolution via Arm Compute Library (ACL).
    • Improved performance of convolution with a combination of sum and relu post-ops via ACL.

Functionality

  • Extended eltwise primitive with support for mish and hardswish algorithms.
  • Extended binary primitive with support for comparison operators.
  • Introduced support for post-ops in GPU resampling implementation.
  • Introduced asymmetric quantization support for int8 deconvolution.
  • Introduced binary post-ops support for matmul primitive.

Usability

  • Improved presentation of oneDNN primitives in VTune Amplifier.
  • Introduced Linux perf support for AArch64.
  • Introduced support for Fujitsu C++ compiler.
  • Introduced a build time check for minimal supported ACL version. Currently oneDNN requires ACL 21.02 or later.
  • Added support for cuDNN 8.x.
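The ACL version check noted above runs at CMake configure time, so an incompatible Arm Compute Library fails fast instead of producing build errors later. A hedged sketch of the AArch64 + ACL configuration (the ComputeLibrary path is a placeholder; ACL_ROOT_DIR and DNNL_AARCH64_USE_ACL are the documented controls):

```shell
# Point oneDNN at an ACL install (21.02 or later for this release) and
# enable the ACL-backed primitives; configuration aborts early if the
# detected ACL version is below the supported minimum.
export ACL_ROOT_DIR=/path/to/ComputeLibrary   # placeholder path
cmake -S oneDNN -B build -DDNNL_AARCH64_USE_ACL=ON
cmake --build build
```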

Thanks to the contributors

This release contains contributions from the project core team as well as Aleksandr Nikolaev @alenik01, araki.kenichi @qnet-araki, Arthur Mitrano @aaraujom, Dr-Noob @Dr-Noob, Gmc2 @GHGmc2, higuchi.motoko @higuchi-motoko, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, Louie Tsai @louie-tsai, masafumi yamazaki @m-ymzk, Nathan John Sircombe @nSircombe, Takumi-H @Takumi-Honda. We would also like to thank everyone who asked questions and reported issues.