v0.16
Performance optimizations
- Improved performance of int8 convolutions with a number of input and output channels not divisible by the SIMD width on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
- Optimized Winograd convolutions for fp32 real-time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
- Optimized weights update of dilated convolutions for fp32 data type on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
- Improved performance of reorder primitive for int8 data type.
New functionality
- Added dilation support for deconvolution (transposed convolution) primitive.
- Introduced deconvolution (transposed convolution) primitive for int8 data type.
API deprecations and breaking changes
- The default behavior of gemm-based convolutions was changed. They now use internally allocated thread-local scratchpad memory for im2col and col2im operations, weights reduction, and accumulation. This may cause correctness issues when multiple gemm-based convolutions are created in one thread and executed concurrently in different threads. To support concurrent execution, the MKL-DNN library must be built with the `-DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE` CMake flag.
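A minimal build sketch showing where the flag goes; the clone URL and directory layout are illustrative, not prescriptive:

```shell
# Fetch the sources and set up an out-of-source build directory.
git clone https://github.com/intel/mkl-dnn.git
cd mkl-dnn && mkdir -p build && cd build

# Configure with concurrent execution support so gemm-based convolutions
# created in one thread can safely be executed in other threads.
cmake -DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE ..
make -j
```

Without the flag, the thread-local scratchpad is tied to the creating thread, which is why cross-thread execution of the same primitive can produce incorrect results.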
Usability improvements
- Extended documentation with details on MKL-DNN memory formats.
Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Yasser Zamani @yasserzamani and Loo Rong Jie @rongjiecomputer. We would also like to thank everyone who asked questions and reported issues.
*Other names and brands may be claimed as the property of others.