This is a patch release containing the following changes relative to v1.6.4:
- Fixed an issue with memory descriptor size computations (fc836a3)
- Reduced required scratchpad size for RNNs (c7e165a)
- Improved performance of fp16 convolution with bias on GPUs (943760e)
- Fixed a segmentation fault in convolution weight gradient on systems with Intel AVX-512 support (85e92b3)