Releases: uxlfoundation/oneDNN

v1.6-rc

10 Jul 22:35

v1.6-rc Pre-release

This is a release candidate for oneDNN v1.6. Please provide feedback and report bugs via GitHub issues.

v1.5.1

07 Jul 19:25

This is a patch release containing the following changes to v1.5:

  • Fixed potential crash related to primitive cache (95eff24, 00205d3)
  • Fixed correctness issue for Winograd convolution implementation on Intel Xeon Phi processors (f310ded)
  • Fixed issue with tail processing in channel dimension for depthwise convolution (24eda67)

v2.0-beta07

02 Jul 19:28

v2.0-beta07 Pre-release

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.5.

A binary distribution of this software is available as the Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Performance optimizations

Intel Architecture processors

  • Improved performance of convolutional neural networks (CNN) related functionality with NHWC activations on all supported processors
  • Improved binary primitive performance for the broadcast case
  • Improved performance of eltwise primitive backpropagation and corresponding post-ops
  • Improved performance of pooling, resampling, LRN primitives
  • Improved performance of bfloat16 and fp32 weights gradient convolutions with groups
  • Improved performance of int8 convolutions with 1x1 kernel and spatial strides

Intel Processor Graphics and Xe architecture-based Graphics

  • Introduced initial optimizations for Xe architecture-based Graphics (code named DG1 and Tiger Lake).
  • Improved performance of convolutional neural networks (CNN) related functionality with NHWC activations.

New Functionality

  • Level Zero (L0) GPU runtime is used by default on the Windows* operating system. The OpenCL GPU runtime can still be used by setting the SYCL_BE environment variable to PI_OPENCL before running a DPC++ program.
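For example, the backend can be switched from the command line before launching a program (the binary name below is a placeholder, not a real example shipped with the library):

```shell
# Select the OpenCL GPU backend instead of the default Level Zero runtime
# for this shell session.
export SYCL_BE=PI_OPENCL
echo "SYCL_BE=$SYCL_BE"   # confirm the backend selection
# ./my_dpcpp_program      # placeholder: run your DPC++ program here
```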

Usability

Validation

  • Introduced validation mode to detect out of bounds access.

Known Limitations

  • RNN functionality does not work with the Level Zero GPU runtime. The workaround is to use the OpenCL GPU runtime by setting SYCL_BE=PI_OPENCL before running a DPC++ program.
  • Optimized primitives can crash or fail for huge spatial sizes on CPU.
  • f32 convolutions may fail sporadically on Intel® Processor Graphics Gen11 due to a known issue in Intel Graphics Compiler.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that a GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
  • GPU kernels that run longer than a certain time (which depends on the OS and system settings) may cause an apparent hang of the application. Configure the driver to disable this timeout and avoid hangs of DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry.
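One way to raise these timeouts is a registry fragment like the following, applied under the standard GPU timeout-detection key; the 60-second values (0x3c) are illustrative assumptions, not recommendations from the release notes, and the change typically requires a reboot to take effect:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
"TdrDdiDelay"=dword:0000003c
```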

v1.5

17 Jun 18:09

Performance optimizations

Intel Architecture processors

  • Improved performance of convolutional neural networks (CNN) related functionality with NHWC activations on all supported processors
  • Improved binary primitive performance for the broadcast case
  • Improved performance of eltwise primitive backpropagation and corresponding post-ops
  • Improved performance of pooling, resampling, LRN primitives
  • Improved performance of bfloat16 and fp32 weights gradient convolutions with groups
  • Improved performance of int8 convolutions with 1x1 kernel and spatial strides

Intel Processor Graphics and Xe architecture-based Graphics

  • Introduced initial optimizations for Xe architecture-based Graphics (code named DG1 and Tiger Lake).
  • Improved performance of convolutional neural networks (CNN) related functionality with NHWC activations.

Usability

Validation

  • Introduced validation mode to detect out of bounds access.

Thanks to the contributors

This release contains contributions from the project core team as well as Anuj Mittal @anujm1, Arthur Mitrano @aaraujom, Benjamin Fitch, Ilia Taraban @itaraban, Leona C. @indie, Nathan John Sircombe @nSircombe, Sergey Nesterov @cepera, Tsao Zhong @CaoZhongZ, yuri@FreeBSD @yurivict. We would also like to thank everyone who asked questions and reported issues.

v1.5-rc

27 May 18:00

v1.5-rc Pre-release

This is a release candidate for oneDNN v1.5. Please provide feedback and report bugs via GitHub issues.

v0.21.5

23 Apr 19:07

This is a patch release containing the following changes to v0.21.4:

  • Fixed s8 reorders that did not compute compensation correctly (d446661, 7a49772)
  • Fixed potential buffer overflow in int8 convolution scratchpad (8c5c7cf)
  • Fixed segfault for s8 reorders on blocked formats (9497acc, 6f1d0c9)
  • Fixed correctness in fp32 convolution weight gradient with dilation and padding (503bf57, d00afab)
  • Fixed correctness issue in 1D bfloat16 dilated convolution (481dd39)

v1.4

17 Apr 16:50

Performance optimizations

  • Intel Architecture processors:
    • Improved performance of int8 GEMM, RNN, inner product, matmul and GEMM-based convolution for systems with Intel SSE4.1 and Intel AVX support.
    • Improved performance of eltwise backpropagation on all supported processors.
    • Improved performance of bfloat16 inner product for processors with Intel DL Boost support.
  • Intel Processor Graphics
    • Improved performance of the following functionality with NHWC activations:
      • f32 convolution forward propagation
      • f32 and f16 pooling
      • f32 and f16 batch normalization forward propagation.
    • Improved performance of f32 and f16 batch normalization forward propagation and binary primitives

New functionality

  • Introduced support for LSTM cell with projection (LSTMP). The functionality is not implemented for Intel Processor Graphics.
  • Introduced bfloat16 data type support for Softmax and LogSoftmax primitives.

Usability improvements

  • Introduced threadpool CPU runtime. The new runtime allows running multi-threaded computations with a user-provided threadpool implementation, for instance the Eigen threadpool.
  • Extended set of examples to cover all primitives supported by the library. New examples are included into corresponding sections of the Developer Guide.
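To illustrate the shape of a user-provided threadpool, here is a minimal, self-contained sketch. It deliberately does not include oneDNN headers: the interface below mirrors what dnnl_threadpool_iface.hpp exposes as an assumption (method names and signatures may differ in your oneDNN version, and the real interface has additional members such as get_flags()); consult the shipped header for the authoritative definition.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Stand-in mirroring the assumed shape of oneDNN's threadpool interface.
struct threadpool_iface {
    virtual int get_num_threads() const = 0;
    virtual bool get_in_parallel() const = 0;
    virtual void parallel_for(int n, const std::function<void(int, int)> &fn) = 0;
    virtual ~threadpool_iface() = default;
};

// Naive implementation: one std::thread per work item. A production
// threadpool (e.g. Eigen's) would reuse worker threads instead.
struct simple_threadpool : threadpool_iface {
    int nthreads;
    explicit simple_threadpool(int n) : nthreads(n) {}
    int get_num_threads() const override { return nthreads; }
    bool get_in_parallel() const override { return false; }
    void parallel_for(int n, const std::function<void(int, int)> &fn) override {
        std::vector<std::thread> workers;
        for (int i = 0; i < n; ++i)
            workers.emplace_back([=] { fn(i, n); });  // fn(work item, total)
        for (auto &t : workers)
            t.join();
    }
};
```

An object implementing this interface would be handed to the library when constructing a CPU stream, so that all internal parallel loops are dispatched through the user's pool.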

Thanks to the contributors

This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Ilya Taraban @itaraban, Nathan Sircombe @nSircombe, and Sergey Nesterov @cepera. We would also like to thank everyone who asked questions and reported issues.

v2.0-beta06

13 May 17:57

v2.0-beta06 Pre-release

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.4.

A binary distribution of this software is available as the Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

New Functionality

  • Level Zero (L0) GPU runtime is used by default on Linux. The OpenCL GPU runtime can still be used by setting the SYCL_BE environment variable to PI_OPENCL before running a DPC++ program.

Known Limitations

  • Level Zero GPU runtime is not supported on Windows OS.
  • RNN functionality does not work with the Level Zero GPU runtime. The workaround is to use the OpenCL GPU runtime by setting SYCL_BE=PI_OPENCL before running a DPC++ program.
  • The Level Zero runtime is enabled by default. Make sure the Level Zero driver is properly installed, including the level-zero-devel package, following the installation guide. If runtime issues persist, apply the workaround of setting SYCL_BE=PI_OPENCL before running a DPC++ program.
  • Optimized primitives can crash or fail for huge spatial sizes on CPU.
  • dnnl_sgemm, dnnl_gemm_u8s8u32, and inner product functionality do not support sizes exceeding 2^32.
  • f32 convolutions may fail sporadically on Intel® Processor Graphics Gen11 due to a known issue in Intel Graphics Compiler.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that a GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
  • GPU kernels that run longer than a certain time (which depends on the OS and system settings) may cause an apparent hang of the application. Configure the driver to disable this timeout and avoid hangs of DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry.

v1.4-rc

06 Apr 22:34

v1.4-rc Pre-release

This is a release candidate for DNNL v1.4. Please provide feedback and report bugs via GitHub issues.

v1.3

02 Apr 17:39

Performance optimizations

  • Introduced broad release quality optimizations for future Intel(R) Xeon(R) Scalable processor (code name Cooper Lake).
  • Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors.
  • Improved performance of binary primitive for the case when one of the tensors has to be broadcast, on all supported processors.
  • Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.

New functionality

  • Introduced fused depthwise convolution and convolution with 1x1 filter. The implementation is available for all supported processors and data types. The functionality is not implemented for Intel Processor Graphics.
  • Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
  • Implemented matmul primitive for Intel Processor Graphics.
  • Extended binary primitive with min and max algorithms support.
  • Extended eltwise primitive:
    • Introduced erf-based implementation of gelu algorithm
    • Introduced pow algorithm
    • Introduced backpropagation flavor relying on destination tensor as input for elu, exp, logistic, relu, sqrt, and tanh algorithms
  • Extended set of operations for memory descriptors:
    • Added support for changing the number of dimensions with the existing dnnl::memory::desc::reshape() method

Thanks to the contributors

This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.