v3.5-rc

Pre-release
@vpirogov released this 28 May 19:36
· 83 commits to rls-v3.5 since this release

This is a release candidate for oneDNN v3.5. Please provide feedback and submit defect reports via GitHub issues.

Performance Optimizations

  • Intel Architecture Processors:

    • Improved performance for 4th generation Intel Xeon Scalable processors (formerly Sapphire Rapids).
    • Improved performance for future Intel Xeon Scalable processors (code-named Sierra Forest and Granite Rapids).
    • Improved performance of group normalization primitive.
    • Improved performance of matmul primitive with sum post-op for batched cases on processors with Intel AMX instruction set support.
    • Improved performance of the following subgraphs with Graph API:
      • Multi-Query Attention (MQA).
      • Scaled Dot Product Attention (SDPA), including the variant with select operation.
      • LayerNorm + Multiply + Quantize produced by SmoothQuant algorithm.
      • Convolution + Sigmoid + Multiply with mixed precisions.
  • Intel Graphics Products:

    • Improved performance for Processor Graphics based on Xe2 architecture.
    • Improved performance for the Intel Data Center GPU Max Series (formerly Ponte Vecchio).
    • Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and the Intel Data Center GPU Flex Series (formerly Arctic Sound).
    • Improved RNN primitive performance for LSTM cell case.
    • Improved performance of f8_e4m3 data type emulation on Intel Data Center GPU Max Series (formerly Ponte Vecchio).
  • AArch64-based Processors:

    • Improved convolution forward propagation, matmul, and softmax performance for processors with SVE support.
    • Improved bf16 matmul performance with Arm Compute Library (ACL).
    • Improved eltwise primitive performance with gelu_erf algorithm with ACL.

Functionality

  • Introduced sum and binary post-ops support for layer normalization primitive. This functionality is currently implemented on CPUs only.
  • Introduced support for int4 data type and extended quantization model with support for grouped scales and zero points.
  • Introduced fp64 matmul support. This functionality is currently implemented on Intel GPUs only.
  • Extended floating-point math mode API to support weight decompression scenarios. See the matmul weights decompression example to get started. The new floating-point math mode is supported in the following configurations:
    • bfloat16 matmul with int8 weights on Intel CPUs.
    • float16 and bfloat16 matmul with int8 or int4 weights on Intel GPUs.
  • [experimental] Introduced microkernel API for Intel Architecture Processors. This API exposes internal mechanisms used in matmul and convolution implementations to expert users.
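
To make the grouped scales and zero points and the weight decompression items above more concrete, here is a minimal, library-independent sketch in plain Python. It does not call the oneDNN API. The function names, the group size of 4, and the signed 4-bit range [-8, 7] are assumptions chosen for illustration, and the affine convention `w ≈ scale * (q - zero_point)` is assumed. Each group of consecutive weights along the reduction dimension gets its own scale and zero point, and the weights are expanded back to floating point before the dot product.

```python
def quantize_grouped(column, group_size, qmin=-8, qmax=7):
    """Quantize one weight column to 4-bit integers with a (scale, zero point) per group.

    Hypothetical helper for illustration; not part of any library.
    """
    q, scales, zero_points = [], [], []
    for start in range(0, len(column), group_size):
        group = column[start:start + group_size]
        lo, hi = min(group), max(group)
        # Map the group's [lo, hi] range onto the integer range [qmin, qmax].
        # A constant group would give scale 0; fall back to 1.0 to avoid
        # dividing by zero (this fallback is lossy for non-integer constants).
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zp = round(qmin - lo / scale)
        q.extend(max(qmin, min(qmax, round(v / scale) + zp)) for v in group)
        scales.append(scale)
        zero_points.append(zp)
    return q, scales, zero_points


def dequantize_grouped(q, scales, zero_points, group_size):
    """Expand quantized values back to floats: w = scale * (q - zero_point)."""
    return [
        scales[i // group_size] * (v - zero_points[i // group_size])
        for i, v in enumerate(q)
    ]


def decompressed_dot(x, q, scales, zero_points, group_size):
    """Dot product of a float input with weights decompressed on the fly."""
    w = dequantize_grouped(q, scales, zero_points, group_size)
    return sum(a * b for a, b in zip(x, w))


# Two groups with very different value ranges: a single scale for the whole
# column would waste most of the 16 available levels on the gap between them.
weights = [-0.8, -0.4, 0.0, 0.7, 10.0, 10.5, 11.0, 11.5]
q, scales, zps = quantize_grouped(weights, group_size=4)
restored = dequantize_grouped(q, scales, zps, group_size=4)
```

Per-group parameters keep the quantization step proportional to each group's own range, which is why grouping matters for a data type with only 16 levels. In a weight decompression scenario the stored tensor is the integer one, and the floating-point values exist only transiently during the computation.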

Usability

  • Extended error messages for engine and memory object creation failures.
  • Extended verbose mode diagnostics with information on dispatching decisions for all primitives.
  • Introduced support for clang++ host compiler in SYCL builds.
  • Introduced API for tensor serialization and deserialization.
  • Extended verbose mode diagnostics for Graph API with information on pattern matcher decisions.
  • Introduced OpenCL runtime support for Graph API.
  • Added support for building oneDNN with installed Arm Compute Library (ACL).

Validation

  • Extended benchdnn with support for tensor tags in RNN primitive validation.

Thanks to these Contributors

This release contains contributions from the project core team as well as @AngryLoki, Crefeda Rodrigues @cfRod, Daniel Richard G. @iskunk, @deepeshfujitsu, Dylan Angus @dylan-angus-codeplay, Emanuele Rocca @ema, Hernan Martinez @hmartinez82, John Osorio @kala855, Jonathan Deakin @jondea, @kasturedeeksha, Kentaro Kawakami @kawakami-k, Nikita Shulga @malfet, Radu Salavat @Radu2k, Renato Barros Arantes @renato-arantes, Roman Zhukov @rozhukov, Shreyas-fuj @Shreyas-fuj, Sunita Nadampalli @snadampal, Tadej Ciglarič @t4c1, Vineel Abhinav @vineelabhinav, @vishwascm. We would also like to thank everyone who asked questions and reported issues.