v2.7.2
This is a patch release containing the following changes to v2.7.1:
- Fixed segfaults in deconvolution backpropagation with ACL on AArch64-based processors (f02e6f3)
- Fixed code generation issues in Intel AVX2 convolution implementation (2ba2523, b60633f, 844326b, 2009164)
- Fixed correctness issues and runtime errors in deconvolution with binary post-ops on Intel GPUs (dd54d39)
- Improved performance of convolutions with a small number of channels and large spatial sizes on systems with Intel AMX (26f97dc, 4cb648d)
- Fixed runtime error in int8 convolutions with groups on Xe architecture based GPUs (e5a70f4)
- Improved inner product weight gradient performance on Xe architecture based GPUs (9e9b859, 12ec4e3)
- Improved batch normalization performance with threadpool threading (4fd5ab2)
- Improved inner product performance with binary post-ops in broadcast mode on Intel CPUs (d43c70d, 49ca4e1)
- Fixed segfaults and correctness issues in sum primitive with threadpool threading (ee7a321)
- Extended persistent cache API to cover engine objects (58481d6, 5f69dad, 16c0a95, 068071b)
- Added support for newer versions of Intel GPU drivers (7144393)
- Updated ITT API version to 3.23.0 (d23cc95)
- Fixed convolution correctness issue on Intel Data Center GPU Flex Series (365ac20)
- Fixed fp64 convolution correctness issue on Intel Data Center GPU MAX Series (9d4bf94, 6705403)
- Fixed correctness issues in reduction primitive with binary post-op on Intel GPUs (ae9d075, e3b80c5)
- Improved convolution performance on Intel Data Center GPU MAX Series (90be8d5, caf4863)
- Fixed build errors with ONEDNN_ENABLE_PRIMITIVE_GPU_ISA build option (de2db04)
- Fixed correctness issues in convolution with per-tensor binary post-ops on Intel CPUs (9cf9c18); see the usage sketch after this list
- Improved convolution performance on Intel Data Center GPU Flex Series (8b08a07)
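For context on the binary post-op items above: binary post-ops are attached to a primitive through `dnnl::post_ops` and `dnnl::primitive_attr`, and the extra tensor is supplied as an additional execution argument. Below is a minimal sketch using the v2.x C++ API; the shapes and the scalar (per-tensor) addend are illustrative assumptions, not part of this release.

```cpp
// Minimal sketch: convolution with a per-tensor binary_add post-op (oneDNN v2.x C++ API).
#include <unordered_map>
#include "oneapi/dnnl/dnnl.hpp"

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Illustrative shapes: N=1, IC=8, 14x14 spatial, OC=16, 3x3 kernel, stride 1, padding 1.
    memory::dims src_dims = {1, 8, 14, 14};
    memory::dims wei_dims = {16, 8, 3, 3};
    memory::dims dst_dims = {1, 16, 14, 14};

    auto src_md = memory::desc(src_dims, memory::data_type::f32, memory::format_tag::any);
    auto wei_md = memory::desc(wei_dims, memory::data_type::f32, memory::format_tag::any);
    auto dst_md = memory::desc(dst_dims, memory::data_type::f32, memory::format_tag::any);

    // Per-tensor binary post-op: a single f32 value broadcast over the whole destination.
    auto bin_md = memory::desc({1, 1, 1, 1}, memory::data_type::f32, memory::format_tag::nchw);
    post_ops ops;
    ops.append_binary(algorithm::binary_add, bin_md);
    primitive_attr attr;
    attr.set_post_ops(ops);

    // v2.x API: operation descriptor first, then primitive descriptor with attributes.
    auto conv_d = convolution_forward::desc(prop_kind::forward_inference,
            algorithm::convolution_direct, src_md, wei_md, dst_md,
            /*strides*/ {1, 1}, /*padding_l*/ {1, 1}, /*padding_r*/ {1, 1});
    auto conv_pd = convolution_forward::primitive_desc(conv_d, attr, eng);
    auto conv = convolution_forward(conv_pd);

    // Allocate memory in whatever layouts the selected implementation expects.
    auto src_mem = memory(conv_pd.src_desc(), eng);
    auto wei_mem = memory(conv_pd.weights_desc(), eng);
    auto dst_mem = memory(conv_pd.dst_desc(), eng);
    auto bin_mem = memory(bin_md, eng);

    // The binary post-op source is passed as an extra execution argument.
    std::unordered_map<int, memory> args = {
            {DNNL_ARG_SRC, src_mem},
            {DNNL_ARG_WEIGHTS, wei_mem},
            {DNNL_ARG_DST, dst_mem},
            {DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_SRC_1, bin_mem}};
    conv.execute(strm, args);
    strm.wait();
    return 0;
}
```

The same attribute mechanism applies to the inner product and reduction cases mentioned above; only the primitive descriptor changes, while the post-op tensor is still passed via `DNNL_ARG_ATTR_MULTIPLE_POST_OP(n) | DNNL_ARG_SRC_1`.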