
v1.3


Released by @anita-intel on 02 Apr 17:39

Performance optimizations

  • Introduced broad release quality optimizations for future Intel(R) Xeon(R) Scalable processor (code name Cooper Lake).
  • Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors (see the sketch after this list).
  • Improved performance of binary primitive on all supported processors for the case when one of the tensors has to be broadcast.
  • Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.
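
The batched case above maps directly onto the matmul primitive with 3D tensors. A minimal sketch, assuming the DNNL v1.x C++ API; the shapes, the f32 data type, and the plain abc layouts are illustrative:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    // A batch of 8 independent {16x32} x {32x64} products, C = A * B.
    memory::desc a_md({8, 16, 32}, memory::data_type::f32, memory::format_tag::abc);
    memory::desc b_md({8, 32, 64}, memory::data_type::f32, memory::format_tag::abc);
    memory::desc c_md({8, 16, 64}, memory::data_type::f32, memory::format_tag::abc);

    auto matmul_pd = matmul::primitive_desc(matmul::desc(a_md, b_md, c_md), eng);
    auto matmul_prim = matmul(matmul_pd);

    memory a_mem(a_md, eng), b_mem(b_md, eng), c_mem(c_md, eng);
    matmul_prim.execute(s, {{DNNL_ARG_SRC, a_mem},
                            {DNNL_ARG_WEIGHTS, b_mem},
                            {DNNL_ARG_DST, c_mem}});
    s.wait();
    return 0;
}
```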

New functionality

  • Introduced fused depthwise convolution and convolution with 1x1 filter. The implementation is available for all supported processors and data types; the functionality is not implemented for Intel Processor Graphics.
  • Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
  • Implemented matmul primitive for Intel Processor Graphics.
  • Extended binary primitive with min and max algorithms support (see the first sketch below).
  • Extended eltwise primitive (see the second sketch below):
    • Introduced erf-based implementation of gelu algorithm
    • Introduced pow algorithm
    • Introduced backpropagation flavor relying on destination tensor as input for elu, exp, logistic, relu, sqrt, and tanh algorithms
  • Extended set of operations for memory descriptors:
    • Added support for changing the number of dimensions with the existing dnnl::memory::desc::reshape() method (see the last sketch below)
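
A minimal sketch of the binary primitive with the new max algorithm, assuming the DNNL v1.x C++ API; the shape is illustrative, and algorithm::binary_min selects the element-wise minimum in the same way:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    memory::desc md({1, 8, 14, 14}, memory::data_type::f32, memory::format_tag::nchw);

    // Element-wise maximum of two equally shaped tensors.
    auto binary_pd = binary::primitive_desc(
            binary::desc(algorithm::binary_max, md, md, md), eng);
    auto binary_prim = binary(binary_pd);

    memory src0(md, eng), src1(md, eng), dst(md, eng);
    binary_prim.execute(s, {{DNNL_ARG_SRC_0, src0},
                            {DNNL_ARG_SRC_1, src1},
                            {DNNL_ARG_DST, dst}});
    s.wait();
    return 0;
}
```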
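
A minimal sketch of the eltwise extensions, assuming the DNNL v1.x C++ API: it builds the erf-based GELU forward pass; eltwise_pow and the destination-based backpropagation flavors (e.g. eltwise_relu_use_dst_for_bwd) are selected through the same algorithm argument. The shape is illustrative:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    memory::desc md({2, 64}, memory::data_type::f32, memory::format_tag::ab);

    // alpha and beta are unused by gelu_erf; for eltwise_pow they are the
    // scale and the exponent, respectively.
    auto eltwise_d = eltwise_forward::desc(prop_kind::forward_inference,
            algorithm::eltwise_gelu_erf, md, 0.f, 0.f);
    auto eltwise_pd = eltwise_forward::primitive_desc(eltwise_d, eng);
    auto eltwise_prim = eltwise_forward(eltwise_pd);

    memory src(md, eng), dst(md, eng);
    eltwise_prim.execute(s, {{DNNL_ARG_SRC, src}, {DNNL_ARG_DST, dst}});
    s.wait();
    return 0;
}
```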
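
A minimal sketch of changing the number of dimensions with dnnl::memory::desc::reshape(), assuming the DNNL v1.x C++ API; the shapes are illustrative:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

int main() {
    // A plain 2D {6 x 4} f32 descriptor...
    memory::desc md_2d({6, 4}, memory::data_type::f32, memory::format_tag::ab);

    // ...reshaped to 3D {2 x 3 x 4}; the total number of elements must match.
    memory::desc md_3d = md_2d.reshape({2, 3, 4});
    return 0;
}
```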

Thanks to the contributors

This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.