v1.5

@anita-intel anita-intel released this 17 Jun 18:09

Performance optimizations

Intel Architecture processors

  • Improved performance of convolutional neural network (CNN) functionality with NHWC activations on all supported processors
  • Improved binary primitive performance for the broadcast case
  • Improved performance of eltwise primitive backpropagation and corresponding post-ops
  • Improved performance of pooling, resampling, and LRN primitives
  • Improved performance of bfloat16 and fp32 weights gradient convolutions with groups
  • Improved performance of int8 convolutions with 1x1 kernel and spatial strides

Intel Processor Graphics and Xe architecture-based Graphics

  • Introduced initial optimizations for Xe architecture-based Graphics (code named DG1 and Tiger Lake).
  • Improved performance of convolutional neural network (CNN) functionality with NHWC activations.

Usability

Validation

  • Introduced a validation mode to detect out-of-bounds memory accesses.

Thanks to the contributors

This release contains contributions from the project core team as well as Anuj Mittal @anujm1, Arthur Mitrano @aaraujom, Benjamin Fitch, Ilia Taraban @itaraban, Leona C. @indie, Nathan John Sircombe @nSircombe, Sergey Nesterov @cepera, Tsao Zhong @CaoZhongZ, yuri@FreeBSD @yurivict. We would also like to thank everyone who asked questions and reported issues.