v3.8
Performance Optimizations
Intel Architecture Processors
- Improved matmul and inner product primitives performance on processors with Intel AMX instruction set support.
- Improved performance of convolution and inner product primitives on processors with Intel AVX2 instruction set support.
- Improved performance of `int8` convolution with zero points.
- Improved `fp32` convolution performance with `fp16` and `bf16` compressed weights on processors with Intel AVX2 or Intel AVX-512 instruction set support.
- Improved `fp16`/`bf16` depthwise convolution performance with `fp32` bias, `sum` post-op, or dilation.
- Improved `bf16` pooling backpropagation performance.
- Improved binary post-ops performance with `per_w` broadcast.
Intel Graphics Products
- Improved performance on Intel Arc graphics for future Intel Core Ultra processors (code name Panther Lake).
- Improved convolution performance on:
- Intel Arc Graphics for Intel Core Ultra processor series 2 (formerly Lunar Lake).
- Intel Arc B-series discrete graphics (formerly Battlemage).
- Improved `int8` matmul performance with zero-points support for source and weight tensors.
- Improved `f4_e2m1` and `f4_e3m0` matmul and reorder performance.
- Improved performance of the following subgraphs with Graph API:
  - Scaled Dot Product Attention (SDPA) with `int4` and `int8` compressed key and value.
  - `fp16`/`bf16` SDPA with `fp32` intermediate data types. Using `fp32` intermediate data types is recommended.
  - SDPA with head size 512 and 576.
  - Grouped Query Attention (GQA) with 5D input tensors.
AArch64-based Processors
- Improved `fp16` reorder performance.
- Improved `int8` matmul performance.
- Improved `bf16` inner product forward propagation performance with Arm Compute Library (ACL).
- Improved `bf16` eltwise performance.
- Improved convolution performance on processors with SVE support with ACL.
Functionality
Common
- Extended Graph API `Softmax` operation to support `inf_as_zero` mode. This functionality enables an SDPA subgraph compliant with PyTorch Safe Softmax semantics.
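  A minimal sketch of building a standalone `SoftMax` op with the new mode through the Graph API C++ interface; the `op::attr::mode` attribute name and the `"inf_as_zero"` string value here are assumptions based on the note above, not verified spellings:

  ```cpp
  #include "oneapi/dnnl/dnnl_graph.hpp"

  using namespace dnnl;
  using namespace dnnl::graph;

  int main() {
      // I/O logical tensors for a 2x16 f32 softmax over the last axis.
      logical_tensor src {0, logical_tensor::data_type::f32, {2, 16},
              logical_tensor::layout_type::strided};
      logical_tensor dst {1, logical_tensor::data_type::f32, {2, 16},
              logical_tensor::layout_type::strided};

      op softmax_op(2, op::kind::SoftMax, "safe_softmax");
      softmax_op.set_attr<int64_t>(op::attr::axis, -1);
      // "inf_as_zero": a row that is all -inf yields 0 instead of NaN,
      // matching PyTorch Safe Softmax (attribute/value names assumed).
      softmax_op.set_attr<std::string>(op::attr::mode, "inf_as_zero");
      softmax_op.add_input(src);
      softmax_op.add_output(dst);

      graph g(engine::kind::cpu);
      g.add_op(softmax_op);
      g.finalize();
      return g.get_partitions().empty();
  }
  ```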
Intel Architecture Processors
- Introduced support for `f32` convolution with `fp16` compressed weights.
- Enabled `int8`/`int4` compressed weights support in matmul primitive.
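  A minimal sketch of a matmul with compressed `s8` weights, assuming the existing weight-decompression path via `primitive_attr::set_fpmath_mode(..., /*apply_to_int=*/true)`; the shapes, math mode, and scales mask are illustrative choices, not requirements:

  ```cpp
  #include "oneapi/dnnl/dnnl.hpp"

  using namespace dnnl;

  int main() {
      engine eng(engine::kind::cpu, 0);

      const memory::dim M = 4, K = 64, N = 32;
      memory::desc src_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
      memory::desc wei_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
      memory::desc dst_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

      primitive_attr attr;
      // apply_to_int = true lets the floating-point math mode apply to the
      // integer weights, i.e. the s8 weights are decompressed on the fly
      // instead of this being treated as a quantized int8 matmul.
      attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);
      // Dequantization scales for the weights; mask 1 << 1 means one scale
      // per output channel (illustrative).
      attr.set_scales_mask(DNNL_ARG_WEIGHTS, 1 << 1);

      matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
      matmul prim(pd);
      (void)prim;
      return 0;
  }
  ```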
Intel Graphics Products
- Introduced select algorithm support in binary primitive (a sketch follows this list).
- Introduced support for `f4_e2m1` and `f4_e3m0` data types in convolution primitive.
- Introduced support for the `GenIndex` operation in Graph API.
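  A sketch of the select algorithm in the binary primitive, which picks elements from `src0` or `src1` based on a condition tensor. The three-source primitive descriptor overload, the `DNNL_ARG_SRC_2` binding for the condition, and the `u8` condition data type are assumptions here:

  ```cpp
  #include "oneapi/dnnl/dnnl.hpp"

  using namespace dnnl;

  int main() {
      engine eng(engine::kind::gpu, 0);
      stream strm(eng);

      memory::desc src0_md({1, 64}, memory::data_type::f32, memory::format_tag::ab);
      memory::desc src1_md({1, 64}, memory::data_type::f32, memory::format_tag::ab);
      // Boolean-like condition tensor (data type is an assumption).
      memory::desc cond_md({1, 64}, memory::data_type::u8, memory::format_tag::ab);
      memory::desc dst_md ({1, 64}, memory::data_type::f32, memory::format_tag::ab);

      // Assumed ctor overload taking a third (condition) source descriptor.
      binary::primitive_desc pd(
              eng, algorithm::binary_select, src0_md, src1_md, cond_md, dst_md);
      binary prim(pd);

      memory src0(src0_md, eng), src1(src1_md, eng);
      memory cond(cond_md, eng), dst(dst_md, eng);
      // dst[i] = cond[i] ? src0[i] : src1[i]
      prim.execute(strm, {{DNNL_ARG_SRC_0, src0}, {DNNL_ARG_SRC_1, src1},
                          {DNNL_ARG_SRC_2, cond}, {DNNL_ARG_DST, dst}});
      strm.wait();
      return 0;
  }
  ```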
Generic GPU Vendor
- Introduced support for:
- Vanilla RNN forward propagation.
- Inner product backpropagation.
- Group normalization.
- Improved accuracy of inner product primitive with sum post-ops for large shapes.
NVIDIA GPUs
- Introduced Graph API support.
Usability
- Added support for group normalization primitive with `ONEDNN_ENABLE_PRIMITIVE` build option (a build sketch follows this list).
- Enabled support for ROCm 6 on AMD GPUs.
- Improved CMake integration for oneDNN installation with NVIDIA backend enabled.
- Reduced memory footprint for matmul primitive when using ACL.
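  For instance, a build restricted to a few primitives might look like the sketch below; the exact `GROUP_NORMALIZATION` token in the `ONEDNN_ENABLE_PRIMITIVE` list is an assumption, so check the build options documentation for the authoritative spelling:

  ```sh
  # Configure a build that compiles only the selected primitives;
  # everything not listed is stubbed out to shrink the binary.
  cmake .. -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;MATMUL;GROUP_NORMALIZATION"
  cmake --build . --parallel
  ```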
Validation
- Added benchdnn option `--execution-mode` to test oneDNN functionality with SYCL Graph record/execute mode (sample invocations follow this list).
- Extended benchdnn option `--cold-cache` with support for cold TLB mode.
- Added benchdnn option `--bia-dt` to control bias data type for matmul, inner product, convolution, and deconvolution primitives.
- Extended syntax of benchdnn `--dt` option in Graph API driver to manage data types of individual tensors in a pattern.
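  Illustrative invocations of the new options; the flag names come from the notes above, but the problem descriptors and the specific option values are assumptions:

  ```sh
  # Control the bias data type in the matmul driver.
  ./benchdnn --matmul --bia-dt=f32 --dt=f16 64x128:128x256

  # Cold-cache performance run; the "tlb" value for the new cold TLB
  # mode is an assumption.
  ./benchdnn --conv --mode=P --cold-cache=tlb ic16ih56oc16oh56kh3ph1

  # Drive the workload through SYCL Graph record/execute; the "graph"
  # value is an assumption.
  ./benchdnn --matmul --engine=gpu --execution-mode=graph 512x512:512x512
  ```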
Deprecated Functionality
- BLAS-like API including `dnnl::sgemm`, `dnnl::gemm_u8s8s32`, and `dnnl::gemm_s8s8s32` functions is deprecated and will be removed in future releases. If you are using this API, consider switching to the matmul primitive, as sketched below.
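  A minimal sketch of moving a row-major `dnnl::sgemm` call (no transposes, `alpha = 1`, `beta = 0`) onto the matmul primitive; the buffer wrapping and shapes are illustrative:

  ```cpp
  #include "oneapi/dnnl/dnnl.hpp"

  using namespace dnnl;

  // C = A * B, replacing dnnl::sgemm('N', 'N', M, N, K, 1.f, A, K, B, N, 0.f, C, N).
  void gemm_via_matmul(float *A, float *B, float *C,
                       memory::dim M, memory::dim N, memory::dim K) {
      engine eng(engine::kind::cpu, 0);
      stream strm(eng);

      // Row-major ("ab") descriptors matching the sgemm leading dimensions.
      memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
      memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
      memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

      // Wrap the user buffers without copying.
      memory a_mem(a_md, eng, A), b_mem(b_md, eng, B), c_mem(c_md, eng, C);

      matmul::primitive_desc pd(eng, a_md, b_md, c_md);
      matmul(pd).execute(strm, {{DNNL_ARG_SRC, a_mem},
                                {DNNL_ARG_WEIGHTS, b_mem},
                                {DNNL_ARG_DST, c_mem}});
      strm.wait();
  }
  ```

  Non-trivial `alpha`/`beta` values map onto primitive attributes (output scales and a `sum` post-op) rather than extra function arguments.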
Breaking Changes
- Removed the experimental Graph Compiler backend for Graph API.
Thanks to our Contributors
This release contains contributions from the project core team as well as Aditya Tewari @aditew01, Alexander Simonov @asimonov1, Denis @redradist, Dmitriy Ovchinnikov @inteldimitrius, Eliezer Weissmann @eliezerweissmann, Hubert Maciak @hmaciak, Ilya Lavrenov @ilya-lavrenov, James McGregor @Jmc18134, @jstachowintel, Marek Michalowski @michalowski-arm, Maria Zhukova @mzhukova, Orel Yehuda @yehudaorel, Ravi Pushkar @rpushkarr, Renato Barros Arantes @renato-arantes, @Shreyas-fuj, Shu Chen @shu1chen, Viktoriia Gvozdeva @vgvozdeva, Yair Obodovsky @yair-obodovsky, and @zhangfeiv0.