v1.3
Performance optimizations
- Introduced broad release quality optimizations for future Intel(R) Xeon(R) Scalable processors (code name Cooper Lake).
- Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors (see the sketch after this list).
- Improved performance of binary primitive on all supported processors for the case when one of the tensors has to be broadcast (see the binary sketch below).
- Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.
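The batched case maps directly to 3D tensors in the library's C++ API. A minimal sketch of a batched matmul; the sizes, f32 data type, and plain `abc` layouts are hypothetical choices for illustration:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Hypothetical sizes: 16 independent (64 x 32) * (32 x 128) products.
    const memory::dim B = 16, M = 64, K = 32, N = 128;
    memory::desc src_md({B, M, K}, memory::data_type::f32, memory::format_tag::abc);
    memory::desc wei_md({B, K, N}, memory::data_type::f32, memory::format_tag::abc);
    memory::desc dst_md({B, M, N}, memory::data_type::f32, memory::format_tag::abc);

    // 3D descriptors make the primitive a batched matrix-matrix multiplication,
    // with the leading dimension acting as the batch.
    auto pd = matmul::primitive_desc(matmul::desc(src_md, wei_md, dst_md), eng);

    memory src_m(src_md, eng), wei_m(wei_md, eng), dst_m(dst_md, eng);
    matmul(pd).execute(strm, {{DNNL_ARG_SRC, src_m},
                              {DNNL_ARG_WEIGHTS, wei_m},
                              {DNNL_ARG_DST, dst_m}});
    strm.wait();
    return 0;
}
```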
New functionality
- Introduced fusion of convolution with 1x1 filter and a following depthwise convolution. The implementation is available for all supported processors and data types. The functionality is not implemented for Intel Processor Graphics.
- Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
- Implemented matmul primitive for Intel Processor Graphics.
- Extended binary primitive with support for min and max algorithms (see the binary sketch below).
- Extended eltwise primitive (see the eltwise sketch below):
  - Introduced erf-based implementation of gelu algorithm
  - Introduced pow algorithm
  - Introduced backpropagation flavor relying on destination tensor as input for elu, exp, logistic, relu, sqrt, and tanh algorithms
- Extended set of operations for memory descriptors (see the memory descriptor sketch below):
  - Added support for changing the number of dimensions with existing dnnl::memory::desc::reshape() method
  - Introduced dnnl::memory::desc::permute_axes() method to change logical axes order
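
A minimal sketch covering both binary items above; the f32 `nchw` shapes are hypothetical. `src1` has size-1 spatial dimensions, so it is broadcast across `src0`, and the new `algorithm::binary_max` is used (`binary_min` works the same way):

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Hypothetical shapes: src1's trailing dims are 1, so it is broadcast
    // across src0's spatial dimensions.
    memory::desc src0_md({8, 16, 28, 28}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc src1_md({8, 16, 1, 1}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc dst_md({8, 16, 28, 28}, memory::data_type::f32, memory::format_tag::nchw);

    // New in this release: algorithm::binary_min and algorithm::binary_max.
    auto pd = binary::primitive_desc(
            binary::desc(algorithm::binary_max, src0_md, src1_md, dst_md), eng);

    memory src0_m(src0_md, eng), src1_m(src1_md, eng), dst_m(dst_md, eng);
    binary(pd).execute(strm, {{DNNL_ARG_SRC_0, src0_m},
                              {DNNL_ARG_SRC_1, src1_m},
                              {DNNL_ARG_DST, dst_m}});
    strm.wait();
    return 0;
}
```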
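A minimal sketch of the eltwise extensions, with hypothetical shapes: the erf-based GELU (`algorithm::eltwise_gelu_erf`), `eltwise_pow`, and a `_use_dst_for_bwd` flavor whose backward pass consumes the destination tensor instead of the source:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);
    memory::desc md({8, 16, 28, 28}, memory::data_type::f32, memory::format_tag::nchw);

    // erf-based GELU; alpha and beta are unused for this algorithm.
    auto gelu_pd = eltwise_forward::primitive_desc(
            eltwise_forward::desc(prop_kind::forward_training,
                    algorithm::eltwise_gelu_erf, md, 0.f, 0.f), eng);

    // pow computes dst = alpha * src^beta (here: 1.0 * src^2).
    auto pow_pd = eltwise_forward::primitive_desc(
            eltwise_forward::desc(prop_kind::forward_training,
                    algorithm::eltwise_pow, md, 1.f, 2.f), eng);

    // The _use_dst_for_bwd flavors let the backward primitive take the
    // destination tensor (DNNL_ARG_DST) instead of the source.
    auto relu_fwd_pd = eltwise_forward::primitive_desc(
            eltwise_forward::desc(prop_kind::forward_training,
                    algorithm::eltwise_relu_use_dst_for_bwd, md, 0.f, 0.f), eng);
    auto relu_bwd_pd = eltwise_backward::primitive_desc(
            eltwise_backward::desc(algorithm::eltwise_relu_use_dst_for_bwd,
                    md, md, 0.f, 0.f), eng, relu_fwd_pd);

    memory src_m(md, eng), dst_m(md, eng);
    eltwise_forward(gelu_pd).execute(strm,
            {{DNNL_ARG_SRC, src_m}, {DNNL_ARG_DST, dst_m}});
    strm.wait();
    return 0;
}
```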
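A minimal sketch of the two memory descriptor operations, using a hypothetical 6x4 f32 descriptor:

```cpp
#include "dnnl.hpp"
using namespace dnnl;

int main() {
    // Hypothetical 2D descriptor: 6 rows x 4 columns, row-major f32.
    memory::desc md({6, 4}, memory::data_type::f32, memory::format_tag::ab);

    // reshape() may now change the number of dimensions (2D -> 3D here),
    // as long as the total number of elements is preserved.
    memory::desc md_3d = md.reshape({2, 3, 4});

    // permute_axes() reorders logical axes: entry i gives the position of
    // the i-th source axis in the new descriptor ({1, 0} == transpose).
    memory::desc md_t = md.permute_axes({1, 0});

    (void)md_3d;
    (void)md_t;
    return 0;
}
```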
Thanks to the contributors
This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.