Releases: ermig1979/Simd
Simd v6.1.144
Algorithms
New features
- SSE4.1, AVX2 optimizations of function Yuv444pToRgbaV2.
- SSE4.1 optimizations of class ImageJpegLoader.
- isRgb parameter of function Simd::SynetSetInput.
Bug fixing
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNhwcGemm.
Python wrapper
New features
- isRgb parameter of function Simd.SynetSetInput.
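To show what a Yuv444pToRgbaV2-style conversion computes per pixel, here is a minimal scalar sketch. It assumes BT.601 limited-range coefficients, flat planes without row padding, and round-to-nearest with saturation. The name yuv444p_to_rgba is hypothetical. This is neither the Simd API nor the Python wrapper, and the library's matrix selection and rounding may differ.

```python
def yuv444p_to_rgba(y_plane, u_plane, v_plane, width, height, alpha=255):
    """Convert planar YUV 4:4:4 (one U and one V sample per pixel) to packed RGBA.

    Assumption: BT.601 limited range (Y in [16, 235], U/V in [16, 240]).
    Planes are flat sequences of width*height bytes with no row padding.
    """
    def clamp_round(x):
        # Round to nearest, then saturate to the 8-bit range.
        return max(0, min(255, int(x + 0.5)))

    rgba = bytearray(width * height * 4)
    for i in range(width * height):
        # Expand limited-range samples to full range and center chroma on zero.
        yf = (y_plane[i] - 16) * 255.0 / 219.0
        uf = (u_plane[i] - 128) * 255.0 / 224.0
        vf = (v_plane[i] - 128) * 255.0 / 224.0
        rgba[4 * i + 0] = clamp_round(yf + 1.402 * vf)                    # R
        rgba[4 * i + 1] = clamp_round(yf - 0.344136 * uf - 0.714136 * vf)  # G
        rgba[4 * i + 2] = clamp_round(yf + 1.772 * uf)                    # B
        rgba[4 * i + 3] = alpha                                           # A
    return bytes(rgba)
```

For example, yuv444p_to_rgba([16, 235], [128, 128], [128, 128], 2, 1) yields a black pixel followed by a white one, both with alpha 255. SIMD versions of such a conversion typically replace the floating-point math with fixed-point multiplies.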
Simd v6.1.143
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution16bNhwcDepthwise.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for class SynetConvolution32fNhwcDepthwise.
- AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w4 for class SynetMergedConvolution16b.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for class SynetConvolution32fNhwcDepthwise.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for class SynetConvolution32fNhwcDepthwise.
- AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w6 for class SynetMergedConvolution16b.
- AMX-BF16 kernel DepthwiseConvolution_k7p3d1s1w8 for class SynetMergedConvolution16b.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w4 for SynetMergedConvolution32f framework.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w6 for SynetMergedConvolution32f framework.
- AVX-512BW kernel Convolution32fNhwcDepthwise_k7p3d1s1w8 for SynetMergedConvolution32f framework.
- AMX-BF16 kernel DepthwiseConvolution_k5p2d1s1w8 for class SynetMergedConvolution16b.
- Base implementation of function SimdYuv444pToRgbaV2.
Improving
- AVX-512BW optimizations of function Convolution32fNhwcDepthwiseDefault.
- AMX-BF16 optimizations of function DepthwiseConvolutionLargePad.
Bug fixing
- Error in Base implementation of class SynetDeconvolution16bNhwcGemm.
Test framework
New features
- Tests for verifying functionality of function SimdYuv444pToRgbaV2.
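The depthwise kernel names above appear to encode the convolution geometry. Reading k7p3d1s1 as "kernel 7x7, padding 3, dilation 1, stride 1" is our interpretation, and this note does not define the wN suffix. The following minimal reference sketch shows what such a depthwise convolution computes in NHWC layout. The function name and the HWC weight layout are assumptions, not taken from Simd.

```python
def depthwise_conv_nhwc(src, height, width, channels, weights, bias,
                        kernel, pad, stride=1, dilation=1):
    """Reference depthwise convolution of one NHWC image with zero padding.

    src:     flat list, element (y, x, c) at (y * width + x) * channels + c
    weights: flat list, element (ky, kx, c) at (ky * kernel + kx) * channels + c (assumed layout)
    bias:    list with one value per channel
    Returns (dst, dst_height, dst_width).
    """
    span = dilation * (kernel - 1) + 1
    dst_h = (height + 2 * pad - span) // stride + 1
    dst_w = (width + 2 * pad - span) // stride + 1
    dst = [0.0] * (dst_h * dst_w * channels)
    for dy in range(dst_h):
        for dx in range(dst_w):
            for c in range(channels):
                acc = bias[c]
                for ky in range(kernel):
                    sy = dy * stride + ky * dilation - pad
                    if not 0 <= sy < height:
                        continue  # padded rows contribute zero
                    for kx in range(kernel):
                        sx = dx * stride + kx * dilation - pad
                        if 0 <= sx < width:
                            # Each channel is convolved only with its own filter.
                            acc += (src[(sy * width + sx) * channels + c]
                                    * weights[(ky * kernel + kx) * channels + c])
                dst[(dy * dst_w + dx) * channels + c] = acc
    return dst, dst_h, dst_w
```

With kernel=7, pad=3, dilation=1 and stride=1, the output keeps the input's height and width. This fits specialized kernels that process a fixed block of output columns at a time.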
Simd v6.1.142
Algorithms
New features
- Base implementation of class SynetDeconvolution16bGemm.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetDeconvolution16bNhwcGemm.
- AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveUv.
- AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveBgr.
- AMX-BF16 (AVX-512VBMI) optimizations of function DeinterleaveBgra.
Improving
- AVX-512BW optimizations of function ConvolutionDirectNhwcConvolutionBiasActivationDepthwise.
Removing
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
- Base implementation of class SynetConvolution32fBf16Gemm.
- Parameter 'compatibility' from function SynetConvolution32fInit.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
- Base implementation of class SynetMergedConvolution32fBf16.
- Parameter 'compatibility' from function SynetMergedConvolution32fInit.
Test framework
New features
- Tests for verifying functionality of SynetDeconvolution16b framework.
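DeinterleaveUv, DeinterleaveBgr and DeinterleaveBgra split packed samples into separate planes. The minimal sketch below shows the operation on a single contiguous buffer. The name deinterleave is hypothetical, and row strides are ignored. A full-byte permute such as the one provided by AVX-512VBMI is presumably what makes this shuffle efficient in the listed optimizations, but that is our assumption.

```python
def deinterleave(src, channels):
    """Split packed samples (UVUV..., BGRBGR..., BGRABGRA...) into one plane per channel."""
    if len(src) % channels:
        raise ValueError("buffer length must be a multiple of the channel count")
    # Plane c takes every channels-th byte starting at offset c.
    return [bytes(src[c::channels]) for c in range(channels)]
```

For example, deinterleave(bytes([1, 2, 3, 4]), 2) returns the U plane b'\x01\x03' and the V plane b'\x02\x04'.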
Simd v6.1.141
Algorithms
New features
- Support of BFloat16 in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class ResizerNearest.
Bug fixing
- Compiler warning in function Simd::LitterCpuCache.
- Error in AVX-512BW optimizations of class SynetInnerProduct16bGemmNN.
Simd v6.1.140
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetRelu16b.
- API of SynetAdd16b framework.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetAdd16bUniform.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution16bNchwGemm.
Improving
- AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
- Error in Base implementation of class SynetMergedConvolution16bCdc.
- Error in Base implementation of class SynetMergedConvolution16bDc.
- Error in Base implementation of class SynetInnerProduct16bGemmNN.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.
Test framework
New features
- Tests for verifying functionality of function SynetRelu16b.
- Tests for verifying functionality of SynetAdd16b framework.
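Several entries above concern BFloat16 data (Float32ToBFloat16, SynetRelu16b). The minimal sketch below shows the bit-level idea. A bfloat16 value is the upper 16 bits of a float32, and the rounding mode here is assumed to be round-to-nearest-even. The function names are hypothetical. Simd's own conversion may round differently, and SynetRelu16b may accept other input and output types.

```python
import struct

def float32_to_bfloat16(x):
    """Round a float32 value to bfloat16 (assumed round-to-nearest-even); return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: keep it a quiet NaN after dropping low bits
    # Add 0x7FFF, plus 1 when the kept LSB is odd, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bfloat16_to_float32(h):
    """Widen a bfloat16 bit pattern to float32; exact, since only zero bits are appended."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]

def relu16b(values):
    """ReLU on bfloat16 bit patterns: anything with the sign bit set becomes +0.

    This maps negatives and -0.0 to +0.0; a NaN with the sign bit set also becomes 0 here.
    """
    return [0 if h & 0x8000 else h for h in values]
```

For example, 1.0 converts to 0x3F80. The tie 1.00390625 (float32 0x3F808000) rounds down to the even pattern 0x3F80, while 1.01171875 (0x3F818000) rounds up to 0x3F82.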
Simd v6.1.139
Algorithms
New features
- API of SynetInnerProduct16b framework.
- Base implementation of class SynetInnerProduct16bRef.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetInnerProduct16bGemmNN.
Bug fixing
- Error in AVX-512BF16 optimizations of class SynetConvolution16bNhwcDirect.
- Error in Base implementation of class SynetConvolution16bNhwcGemm.
- Error in SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of function Convert16bNhwcDirect.
- Error in SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of function Reorder16bNhwcDirect.
- Error in Base implementation of class SynetMergedConvolution16bCdc.
- Error in Base implementation of class SynetMergedConvolution16bDc.
- Error in Base implementation of class SynetMergedConvolution16bCd.
- Error in AMX-BF16 optimizations of class SynetMergedConvolution16bDc.
Test framework
New features
- Tests for verifying functionality of SynetInnerProduct16b framework.
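SynetInnerProduct16bGemmNN suggests an inner product computed as a GEMM. We assume "NN" follows the usual BLAS-style convention that neither operand is transposed. The minimal row-major sketch below shows that layout. The function name and argument order are hypothetical. A 16b implementation would presumably feed bfloat16 operands and accumulate in float32, which this sketch does not model.

```python
def inner_product_nn(src, weight, bias, M, K, N):
    """dst[M x N] = src[M x K] @ weight[K x N] + bias[N], all row-major (neither operand transposed)."""
    dst = [0.0] * (M * N)
    for m in range(M):
        for n in range(N):
            acc = bias[n] if bias is not None else 0.0
            for k in range(K):
                # src is walked along a row, weight along a column.
                acc += src[m * K + k] * weight[k * N + n]
            dst[m * N + n] = acc
    return dst
```

For example, inner_product_nn([1, 2, 3, 4], [1, 2, 3, 4], None, 2, 2, 2) returns [7.0, 10.0, 15.0, 22.0], the ordinary 2x2 matrix product.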
Simd v6.1.138
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcDirect.
- SimdCpuInfoCurrentFrequency in SimdCpuInfoType enumeration.
- API of SynetMergedConvolution16b framework.
- Base implementation of class SynetMergedConvolution16b.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bDc.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bCd.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetMergedConvolution16bCdc.
- Support of YUV420P format in Simd::Frame.
Improving
- AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
Bug fixing
- Errors in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
- Error in Base implementation of class SynetMergedConvolution8i.
Test framework
New features
- -wu command line option to set CPU warm-up time in milliseconds.
- Tests for verifying functionality of SynetMergedConvolution16b framework.
Infrastructure
Bug fixing
- Errors in build_and_test_gcc section of GitHub Actions script for CMake.
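The YUV420P support added to Simd::Frame involves three planes. Each chroma plane has one sample per 2x2 block of luma samples. The minimal sketch below computes the plane sizes for 8-bit data without row padding. Rounding odd dimensions up is our assumption, since Simd::Frame may require even sizes. The helper name is hypothetical.

```python
def yuv420p_plane_sizes(width, height):
    """Byte sizes of the Y, U and V planes of an unpadded 8-bit YUV420P image."""
    chroma_w = (width + 1) // 2   # half horizontal resolution, odd widths rounded up (assumption)
    chroma_h = (height + 1) // 2  # half vertical resolution, odd heights rounded up (assumption)
    return width * height, chroma_w * chroma_h, chroma_w * chroma_h
```

For a 1920x1080 frame this gives 2073600 bytes of Y and 518400 bytes each of U and V, so 1.5 bytes per pixel in total.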
Simd v6.1.137
Algorithms
New features
- AMX-BF16 (AVX-512VBMI) optimizations of function DescrIntCosineDistance.
- AMX-BF16 (AVX-512VBMI, AMX-INT8) optimizations of function DescrIntCosineDistancesMxNa.
- AMX-BF16 (AVX-512VBMI, AMX-INT8) optimizations of function DescrIntCosineDistancesMxNp.
- API of SynetConvolution16b framework.
- Base implementation of class SynetConvolution16bGemm.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16 optimizations of class SynetConvolution16bNhwcGemm.
Improving
- AVX-512VNNI optimizations of function DescrIntCosineDistance.
- AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNa.
- AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNp.
Test framework
New features
- Tests for verifying functionality of SynetConvolution16b framework.
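The DescrIntCosineDistance entries refer to cosine distances between integer (quantized) descriptors. The minimal sketch below assumes uniform per-descriptor quantization to depth-bit unsigned codes with a scale and shift. It computes the distance as 1 minus cosine similarity on dequantized values. These function names and the quantization scheme are assumptions, and the sketch does not model the difference between the MxNa and MxNp variants.

```python
import math

def quantize(vec, depth=8):
    """Uniformly quantize a float vector to unsigned depth-bit codes; return (codes, scale, shift)."""
    lo, hi = min(vec), max(vec)
    levels = (1 << depth) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in vec]
    return codes, scale, lo

def dequantize(codes, scale, shift):
    """Map codes back to approximate float values."""
    return [c * scale + shift for c in codes]

def cosine_distance(a, b):
    """1 - cos(a, b): 0 for the same direction, 1 for orthogonal vectors (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cosine_distances_mxn(A, B):
    """Distances between every descriptor in A (M rows) and every descriptor in B (N rows)."""
    return [[cosine_distance(a, b) for b in B] for a in A]
```

Quantization error per element is bounded by half a step (scale / 2). Lower depth therefore trades accuracy for smaller descriptors and cheaper integer dot products.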
Simd v6.1.136
Algorithms
New features
- AMX-BF16 (AVX-512VBMI) optimizations of function ChangeColors.
- AMX-BF16 (AVX-512VBMI) optimizations of function NormalizeHistogram.
Improving
- AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
Bug fixing
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
Test framework
New features
- Command line parameter to disable testing of some SIMD extensions.
Bug fixing
- Error in test of function Nv12SaveAsJpegToMemory.
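ChangeColors and NormalizeHistogram both operate on 8-bit images. The minimal sketch below applies a 256-entry lookup table and implements classic histogram equalization through such a table. That NormalizeHistogram uses exactly this formula is our assumption. Both function names here are hypothetical and operate on flat, non-empty grayscale buffers.

```python
def change_colors(src, table):
    """Map every 8-bit pixel through a 256-entry lookup table."""
    return bytes(table[p] for p in src)

def normalize_histogram(src):
    """Histogram equalization of a non-empty 8-bit grayscale image stored as a flat buffer."""
    hist = [0] * 256
    for p in src:
        hist[p] += 1
    cdf, total = [0] * 256, 0
    for i in range(256):
        total += hist[i]
        cdf[i] = total          # number of pixels with value <= i
    cdf_min = next(c for c in cdf if c > 0)
    n = len(src)
    if n == cdf_min:            # constant image: nothing to stretch
        return bytes(src)
    # Spread the cumulative distribution linearly over [0, 255].
    table = [max(0, round((c - cdf_min) * 255 / (n - cdf_min))) for c in cdf]
    return change_colors(src, table)
```

For example, the four pixels 10, 20, 30, 40 are stretched to 0, 85, 170, 255.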
Simd v6.1.135
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetConvolution32fBf16NhwcGemm.
- AMX-BF16 optimizations of function Float32ToBFloat16.
- Support of SimdSynetUnaryOperation32fCos in function SynetUnaryOperation32f.
- Support of SimdSynetUnaryOperation32fSin in function SynetUnaryOperation32f.
Bug fixing
- Error in function SimdCpuInfo (wrong AMX-BF16 detection).
- Error in AVX-512BF16 optimizations of function Float32ToBFloat16.
- Error in AMX initialization in function AmxBf16::SupportedByOS.
- Crash in function AmxBf16::ConvolutionBf16NhwcConv_2.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
Removing
- AVX-512BF16 optimizations of function Float32ToBFloat16.
- AVX-512BF16 optimizations of class SynetConvolution32fBf16Nhwc.
- AVX-512BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- AVX-512BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- AVX-512BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
- Separate support of AVX-512BF16 extension (it is now supported only together with AMX-BF16).
Test framework
Bug fixing
- Error in test of SynetMergedConvolution32f framework.
Infrastructure
Removing
- Avx512Bf16 project for MSVS-2022.
- Avx512Bf16 project for MSVS-2019.
- Avx512Bf16 project for MSVS-2015.
- Avx512Bf16 project for MSVS-2017.
- Avx512Bf16 project for CMake.