v3.8.1
This is a patch release containing the following changes to v3.8:
- Fixed correctness issue in reorder primitive with non-trivial strides on Intel CPUs (a762d32)
- Fixed runtime error in convolution weight gradient on Xe2 architecture-based Intel GPUs (a8fac73, c409ef9)
- Fixed performance regression in `bf16` convolution on Intel Datacenter GPU Max Series (98170d0, c6bae4a, c5edd53, bb1a591)
- Improved performance of `fp16` matmul with `fp8` compressed weights on Intel GPUs (58f3ec1, abff176, ffd7dd3, 3b1e855, 2e140de, 3429f79)
- Fixed runtime error in `fp16` pooling primitive on Xe2 architecture-based Intel GPUs (c0f6b6d)
- Improved performance of `fp16` matmul with `int4` weights and `32 < m <= 64` on Intel GPUs (2fa7072)
- Fixed correctness issues in `bf16` matmul with tensors of 3 or more dimensions on processors with Intel AMX support (dd20965, ea1b4a1)
- Fixed performance regression in `fp16` or `bf16` matmul with transposed source and weight tensors on Intel Datacenter GPU Max Series (e45e1aa)
- Improved performance of `bf16` matmul with `int4` weights on Intel GPUs (7a15c23)
- Fixed runtime error in `fp16` SDPA subgraph with head size `512` on Intel Core Ultra (Series 2) processor integrated GPU (bde6985)