v3.9.1
This is a patch release containing the following changes to v3.9:
- Reduced sizes in Graph API SDPA examples (257d689)
- Fixed correctness issue in `bf16` depthwise convolution with `bf16` bias on AArch64 CPUs (218b41d)
- Changed Intel GPU data alignment check from error to warning (5c5008a)
- Improved `bf16` matmul performance on processors with Intel AMX instruction set support (54b6354, 30c4d8d)
- Fixed PowerPC64 build by adding `-mcpu=power10` and `-mmma` flags (02ca915)
- Introduced support for `f16` destination in `int8` matmul and `int8` inner product on x64 CPUs (a62ed6b, 53c0a66, 0750043, 4f0f068)
- Introduced support for `per_tensor` zero-points in `int8` matmul on Intel GPUs (db8e8ff, f783164, 4d458df, 80453a0, 7f90d50, a2200e2)
- Fixed correctness issue in `int8` reorder for cases with compensation on x64 CPUs (771ca54)