Releases: ermig1979/Simd
Simd v6.0.134
Algorithms
New features
- SSE4.1 optimizations of ResizerFloatBilinear class.
Improving
- Improve AVX2 optimizations of ResizerFloatBilinear class (AMD CPU).
- Improve AVX2 optimizations of ResizerShortBilinear class (AMD CPU).
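For readers unfamiliar with what the ResizerFloatBilinear entries above refer to, the following is a minimal scalar sketch of bilinear resizing for a single-channel 32-bit float image. It is not the library's code: the function name `ResizeBilinear32fSketch` is hypothetical, and the half-pixel coordinate mapping is an assumption that may differ from the mapping Simd actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference for bilinear resizing of a single-channel float image.
// Assumption: destination pixel centers are mapped to the source with the
// half-pixel convention, and coordinates are clamped at the image border.
std::vector<float> ResizeBilinear32fSketch(const std::vector<float>& src,
    std::size_t srcW, std::size_t srcH, std::size_t dstW, std::size_t dstH)
{
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    assert(src.size() == srcW * srcH);
    std::vector<float> dst(dstW * dstH);
    const float scaleX = float(srcW) / float(dstW);
    const float scaleY = float(srcH) / float(dstH);
    for (std::size_t dy = 0; dy < dstH; ++dy)
    {
        // Source row position and vertical interpolation weight.
        const float fy = std::max(0.0f, (float(dy) + 0.5f) * scaleY - 0.5f);
        const std::size_t y0 = std::min(std::size_t(fy), srcH - 1);
        const std::size_t y1 = std::min(y0 + 1, srcH - 1);
        const float wy = fy - float(y0);
        for (std::size_t dx = 0; dx < dstW; ++dx)
        {
            // Source column position and horizontal interpolation weight.
            const float fx = std::max(0.0f, (float(dx) + 0.5f) * scaleX - 0.5f);
            const std::size_t x0 = std::min(std::size_t(fx), srcW - 1);
            const std::size_t x1 = std::min(x0 + 1, srcW - 1);
            const float wx = fx - float(x0);
            // Interpolate along x in both rows, then along y.
            const float top = src[y0 * srcW + x0] * (1.0f - wx) + src[y0 * srcW + x1] * wx;
            const float bottom = src[y1 * srcW + x0] * (1.0f - wx) + src[y1 * srcW + x1] * wx;
            dst[dy * dstW + dx] = top * (1.0f - wy) + bottom * wy;
        }
    }
    return dst;
}
```

A SIMD version would typically precompute the indices and weights once per row and column and process several output pixels per instruction; the sketch keeps everything scalar for clarity.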
Bug fixing
- MSVS compiler bug in file SimdAvx512bwYuvToBgraV2.
- Linux, GCC-13 - crash in function SimdSynetInnerProduct32fForward.
- MSVS compiler bug (Cmake, Windows for ARM64) with functions Extract4Sums.
- GCC-9/10 - compiler error in AVX-512BW optimization of function YToGray.
- GCC-9/10 - compiler error in AVX-512BW optimization of function GrayToY.
Replacing
- Replace AVX optimizations to AVX2 for function CosineDistance32f.
- Replace AVX optimizations to AVX2 for function Fill32f.
- Replace AVX optimizations to AVX2 for ResizerFloatBilinear class.
- Replace AVX optimizations to AVX2 for function SquaredDifferenceSum32f.
- Replace AVX optimizations to AVX2 for function SquaredDifferenceKahanSum32f.
- Replace AVX optimizations to AVX2 for function HogLiteFilterFeatures.
- Replace AVX optimizations to AVX2 for function HogLiteResizeFeatures.
- Replace AVX optimizations to AVX2 for function HogLiteCompressFeatures.
- Replace AVX optimizations to AVX2 for function HogLiteFilterSeparable.
- Replace AVX optimizations to AVX2 for function NeuralPooling2x2Max2x2.
- Replace AVX optimizations to AVX2 for function NeuralProductSum.
- Replace AVX optimizations to AVX2 for function NeuralAddVectorMultipliedByValue.
- Replace AVX optimizations to AVX2 for function NeuralAddVector.
- Replace AVX optimizations to AVX2 for function NeuralAddValue.
- Replace AVX optimizations to AVX2 for function NeuralRoughSigmoid.
- Replace AVX optimizations to AVX2 for function NeuralRoughSigmoid2.
- Replace AVX optimizations to AVX2 for function NeuralRoughTanh.
- Replace AVX optimizations to AVX2 for function NeuralDerivativeRelu.
- Replace AVX optimizations to AVX2 for function NeuralDerivativeTanh.
- Replace AVX optimizations to AVX2 for function NeuralDerivativeSigmoid.
- Replace AVX optimizations to AVX2 for function NeuralUpdateWeights.
- Replace AVX optimizations to AVX2 for function NeuralAdaptiveGradientUpdate.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution2x2Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution3x3Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution4x4Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution5x5Forward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution2x2Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution3x3Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution4x4Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution5x5Backward.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution2x2Sum.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution3x3Sum.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution4x4Sum.
- Replace AVX optimizations to AVX2 for function NeuralAddConvolution5x5Sum.
- Replace AVX optimizations to AVX2 for function NeuralConvolutionForward.
- Replace AVX optimizations to AVX2 for function SynetAddBias.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward0.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward1.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward2.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward3.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward4.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward8.
- Replace AVX optimizations to AVX2 for function SynetFusedLayerForward9.
- Replace AVX optimizations to AVX2 for function SynetPoolingAverage.
- Replace AVX optimizations to AVX2 for function SynetShuffleLayerForward.
- Replace AVX optimizations to AVX2 for function SynetHardSigmoid32f.
- Replace AVX optimizations to AVX2 for function SynetHswish32f.
- Replace AVX optimizations to AVX2 for function SynetPreluLayerForward.
- Replace AVX optimizations to AVX2 for function SynetRelu32f.
- Replace AVX optimizations to AVX2 for function SynetRestrictRange32f.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x3Block1x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x3Block1x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x3Block1x4SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x5Block1x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x5Block1x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel1x5Block1x4SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block2x2SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block2x2SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block2x2SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block4x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block4x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel2x2Block4x4SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block2x2SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block2x2SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block2x2SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block3x3SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block3x3SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block3x3SetOutput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block4x4SetFilter.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block4x4SetInput.
- Replace AVX optimizations to AVX2 for function WinogradKernel3x3Block4x4SetOutput.
- Replace AVX optimizations to AVX2 for function GemmPackA.
- Replace AVX optimizations to AVX2 for function GemmPackB.
- Replace AVX optimizations to AVX2 for function GemmScaleC.
- Replace AVX optimizations to AVX2 for function SynetScaleLayerForward.
- Replace AVX optimizations to AVX2 for function SynetInnerProductLayerForward.
- Replace AVX optimizations to AVX2 for function SynetInnerProduct32fInit.
- Replace AVX optimizations to AVX2 for function SynetEltwiseLayerForward.
- Replace AVX optimizations to AVX2 for function SynetDeconvolution32fInit.
- Replace AVX optimizations to AVX2 for function SynetMergedConvolution32fInit.
- Replace AVX optimizations to AVX2 for SynetInnerProduct32fGemm class.
- Replace AVX optimizations to AVX2 for SynetInnerProduct32fProd class.
- Replace AVX optimizations to AVX2 for SynetDeconvolution32fGemmNN class.
- Replace AVX optimizations to AVX2 for SynetDeconvolution32fNhwcDirect2x2 class.
- Replace AVX optimizations to AVX2 for SynetMergedConvolution32fCdc class.
- Replace AVX optimizations to AVX2 for SynetMergedConvolution32fCd class.
- Replace AVX optimizations to AVX2 for SynetMergedConvolution32fDc class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fDepthwiseDotProduct class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fDirectNchw class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fDirectNhwc class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fNhwcDirect class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fGemmNT class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fGemmNT class.
- Replace AVX optimizations to AVX2 for SynetConvolution32fWinograd class.
- Replace AVX optimizations to AVX2 for function SynetConvolution32fInit.
Removing
- Base implementation, SSE4.1, AVX, AVX-512BW, NEON, VSX optimizations of function SvmSumLinear.
- Separate support of the AVX extension (AVX is now supported only together with AVX2).
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundGrowRangeSlow.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundGrowRangeFast.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundIncrementCount.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundAdjustRange.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundAdjustRangeMasked.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON, VMX optimizations of function EdgeBackgroundShiftRange.
- Base implementatio...
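One of the functions moved from AVX to AVX2 in this release is SquaredDifferenceKahanSum32f. As a rough illustration of the technique its name suggests, here is a scalar sketch of a sum of squared differences with Kahan compensation. The function name `SquaredDifferenceKahanSumSketch` and its signature are hypothetical and are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>

// Sum of (a[i] - b[i])^2 accumulated with Kahan compensated summation.
// Note: aggressive floating-point options (for example -ffast-math)
// may let the compiler optimize the compensation term away.
float SquaredDifferenceKahanSumSketch(const float* a, const float* b, std::size_t size)
{
    float sum = 0.0f;
    float compensation = 0.0f; // running estimate of the lost low-order bits
    for (std::size_t i = 0; i < size; ++i)
    {
        const float diff = a[i] - b[i];
        const float term = diff * diff - compensation; // apply previous correction
        const float next = sum + term;                 // low-order bits of term may be lost here
        compensation = (next - sum) - term;            // recover what was lost
        sum = next;
    }
    return sum;
}
```

The compensation only matters for long inputs where the running sum becomes much larger than the individual terms; for short arrays the result matches a plain loop.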
Simd v5.4.133
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForwardV4.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution32fNhwcGroupedBlock1x2.
- Function ImageSaveToFile can choose output file format (if it is undefined) by file extension.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function GrayToY.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function YToGray.
- Add yuvType parameter in Frame structure.
- Support of SimdSynetUnaryOperation32fCeil in function SynetUnaryOperation32f.
- Support of SimdSynetUnaryOperation32fFloor in function SynetUnaryOperation32f.
- Function Simd::Yuva444pToBgra.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuva422pToBgraV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuva420pToBgraV2.
- The mark of function SimdYuva420pToBgra as deprecated.
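The GrayToY and YToGray entries above are easier to follow with a concrete mapping in mind. The sketch below assumes they convert between full-range gray (0 to 255) and limited-range luma (16 to 235), which is the conventional video range. The function names, constants and rounding are illustrative assumptions, not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Full-range gray [0, 255] -> limited-range luma [16, 235] (assumed mapping).
std::uint8_t GrayToYSketch(std::uint8_t gray)
{
    // Scale by 219/255 with rounding to nearest, then add the offset 16.
    return std::uint8_t(16 + (int(gray) * 219 + 127) / 255);
}

// Limited-range luma -> full-range gray, clamped to [0, 255].
std::uint8_t YToGraySketch(std::uint8_t y)
{
    // Remove the offset, scale by 255/219 with rounding, then clamp,
    // because luma values below 16 or above 235 fall outside the range.
    const int value = ((int(y) - 16) * 255 + 109) / 219;
    return std::uint8_t(std::min(255, std::max(0, value)));
}
```

Because 256 gray levels are squeezed into 220 luma levels, the round trip gray to Y to gray is not exact for every input.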
Python wrapper
New features
- Wrapper for enumeration Simd.WarpAffineFlags.
- Wrapper for function SimdWarpAffineInit.
- Wrapper for function SimdWarpAffineRun.
- Function Simd.WarpAffine.
- Wrapper for function SimdAbsGradientSaturatedSum.
- Function Simd.AbsGradientSaturatedSum.
- Wrapper for function SimdBgraToBgr.
- Method Simd.Image.Copy.
- Method Simd.Image.Convert.
- Method Simd.Image.Converted.
- Wrapper for function SimdBgraToGray.
- Wrapper for function SimdBgraToRgb.
- Wrapper for function SimdBgraToRgba.
- Wrapper for function SimdCopy.
- Wrapper for function SimdBgrToBgra.
- Wrapper for function SimdBgrToGray.
- Wrapper for function SimdBgrToRgb.
- Wrapper for function SimdRgbToBgra.
- Wrapper for function SimdRgbToGray.
- Wrapper for function SimdRgbaToGray.
- Wrapper for function SimdBgraToYuv420pV2.
- Wrapper for enumeration Simd.FrameFormat.
- Class Simd.ImageFrame.
- Method Simd.ImageFrame.Copy.
- Method Simd.ImageFrame.Convert.
- Method Simd.ImageFrame.Converted.
- Wrapper for function SimdDeinterleaveUv.
- Wrapper for function SimdInterleaveUv.
- Wrapper for function SimdBgrToYuv420pV2.
- Wrapper for function SimdGrayToBgra.
- Wrapper for function SimdGrayToBgr.
- Method Simd.Image.Fill.
- Wrapper for function SimdYuv420pToBgraV2.
- Wrapper for function SimdYuv420pToBgrV2.
- Wrapper for function SimdYuv420pToRgbV2.
- Wrapper for function SimdYToGray.
- Wrapper for function SimdGrayToY.
Test framework
New features
- Tests for verifying functionality of function SynetNormalizeLayerForwardV4.
- Special test for verifying functionality of function Yuv420pToRgbV2.
- Tests for verifying functionality of function GrayToY.
- Tests for verifying functionality of function YToGray.
- Tests for verifying functionality of function Yuva422pToBgraV2.
- Tests for verifying functionality of function Yuva420pToBgraV2.
Infrastructure
Bug fixing
- Error in CMake for ARM platform.
Simd v5.4.132
Algorithms
New features
- Support of RGBA-32 input image format in base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function SynetSetInput.
Bug fixing
- Wrong order of SIMD_DEPRECATED macro.
- Error in AVX-512BW optimizations of function SynetSoftmaxLayerForward.
Python wrapper
New features
- Wrapper for function SimdVersion.
- Wrapper for function SimdRelease.
- Wrapper for function SimdCpuDesc.
- Wrapper for function SimdCpuInfo.
- Wrapper for enumeration SimdCpuDescType.
- Wrapper for enumeration SimdCpuInfoType.
- Wrapper for function SimdPerformanceStatistic.
- Wrapper for function SimdAllocate.
- Wrapper for function SimdFree.
- Wrapper for function SimdAlign.
- Wrapper for function SimdAlignment.
- Wrapper for function SimdGetThreadNumber.
- Wrapper for function SimdSetThreadNumber.
- Wrapper for function SimdEmpty.
- Wrapper for function SimdGetFastMode.
- Wrapper for function SimdSetFastMode.
- Wrapper for enumeration SimdPixelFormatType.
- Class Simd.Image.
- Wrapper for function SimdCrc32.
- Wrapper for function SimdCrc32c.
- Wrapper for enumeration SimdImageFileType.
- Wrapper for function SimdImageSaveToFile.
- Wrapper for function SimdImageLoadFromFile.
- Wrapper for enumeration Simd::View::Position.
- Method Simd.Image.Region.
- Method Simd.Image.RegionAt.
- Wrapper for enumeration Simd.ResizeMethod.
- Wrapper for enumeration Simd.ResizeChannel.
- Wrapper for function SimdResizerInit.
- Wrapper for function SimdResizerRun.
- Function Simd.Resize.
- Function Simd.Resized.
- Wrapper for function SimdSynetSetInput.
- Function Simd.SynetSetInput.
- Wrapper for enumeration Simd.TensorFormat.
- Wrapper for enumeration Simd.TensorData.
- Wrapper for enumeration Simd.YuvType.
- Wrapper for function SimdFillPixel.
- Function Simd.FillPixel.
Infrastructure
New features
- SimdPy MSVS project.
Documentation
New features
- Doxygen generation of documentation for Python wrapper.
Simd v5.3.131
Algorithms
New features
- NEON optimizations of function DescrIntCosineDistance.
- NEON optimizations of function DescrIntCosineDistancesMxNa.
- NEON optimizations of function DescrIntCosineDistancesMxNp.
Improving
- NEON optimizations of function DescrIntDecode32f.
- NEON optimizations of function CorrelationSum.
- Base implementation and SSE4.1 optimizations of ImageJpegLoader class.
Bug fixing
- Error in definition of SIMD_CPP_2011_ENABLE macro for Visual Studio.
- Error in definition of SIMD_CPP_2014_ENABLE macro for Visual Studio.
- Error in definition of SIMD_CPP_2017_ENABLE macro for Visual Studio.
- Compiler warning in method Detection::InitLevels.
- Compiler warning in method Detection::FillLevels.
Infrastructure
New features
- Cmake SIMD_UNINSTALL option.
- Cmake SIMD_INSTALL option.
The use examples
New features
- An example of how to use Simd::ImageMatcher to find image duplicates.
Simd v5.3.130
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgrToYuv420pV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgrToYuv422pV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function BgrToYuv444pV2.
- Parameter yuvType in function Simd::BgrToYuv420p.
- Parameter yuvType in function Simd::BgrToYuv422p.
- Parameter yuvType in function Simd::BgrToYuv444p.
- The mark of function SimdBgrToYuv420p as deprecated.
- The mark of function SimdBgrToYuv422p as deprecated.
- The mark of function SimdBgrToYuv444p as deprecated.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToRgbV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv422pToRgbV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv444pToRgbV2.
- Parameter yuvType in function Simd::Yuv420pToRgb.
- Parameter yuvType in function Simd::Yuv422pToRgb.
- Parameter yuvType in function Simd::Yuv444pToRgb.
- The mark of function SimdYuv420pToRgb as deprecated.
- The mark of function SimdYuv422pToRgb as deprecated.
- The mark of function SimdYuv444pToRgb as deprecated.
- NEON optimizations of function AlphaBlendingBgraToYuv420p.
- NEON optimizations of function DescrIntEncode32f.
- NEON optimizations of function DescrIntEncode16f.
- NEON optimizations of function DescrIntDecode32f.
- NEON optimizations of function DescrIntDecode16f.
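The yuvType parameter added to the conversion functions above selects which YUV standard is used. As a hedged illustration of why that choice matters, the sketch below converts one BGR pixel to limited-range YUV using the standard BT.601 and BT.709 luma coefficients. The enum `YuvStandardSketch`, the struct and the function name are hypothetical; they do not reproduce Simd's types, its fixed-point arithmetic or its chroma subsampling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a "YUV type" selector.
enum class YuvStandardSketch { Bt601, Bt709 };

struct YuvPixelSketch { std::uint8_t y, u, v; };

// Round to nearest and clamp to the 8-bit range.
static std::uint8_t ToByteSketch(double value)
{
    return std::uint8_t(std::min(255.0, std::max(0.0, std::round(value))));
}

// Converts one BGR pixel to limited-range YUV (Y in [16, 235], U and V in [16, 240]).
YuvPixelSketch BgrToYuvSketch(std::uint8_t blue, std::uint8_t green, std::uint8_t red,
    YuvStandardSketch standard)
{
    // Luma coefficients defined by the BT.601 and BT.709 standards.
    const double kr = standard == YuvStandardSketch::Bt601 ? 0.299 : 0.2126;
    const double kb = standard == YuvStandardSketch::Bt601 ? 0.114 : 0.0722;
    const double kg = 1.0 - kr - kb;
    const double r = red / 255.0, g = green / 255.0, b = blue / 255.0;
    const double luma = kr * r + kg * g + kb * b;    // in [0, 1]
    const double cb = (b - luma) / (2.0 * (1.0 - kb)); // in [-0.5, 0.5]
    const double cr = (r - luma) / (2.0 * (1.0 - kr)); // in [-0.5, 0.5]
    return { ToByteSketch(16.0 + 219.0 * luma),
             ToByteSketch(128.0 + 224.0 * cb),
             ToByteSketch(128.0 + 224.0 * cr) };
}
```

The same BGR pixel yields different Y, U and V values under the two standards, so the encoder and decoder must agree on the type. A 4:2:0 planar conversion would additionally average U and V over each 2x2 block of pixels.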
Bug fixing
- Error in AVX-512BW optimizations of function SynetSoftmaxLayerForward.
- Error in AVX2 optimizations of class ResizerByteArea2x2 (internal buffer overflow).
- Error in function Simd::BgraToYuv420p.
- Error in function Simd::BgraToYuv422p.
- Error in function Simd::BgraToYuv444p.
- Error in NEON optimizations of class MergedConvolution32fCd.
Test framework
New features
- Tests for verifying functionality of function BgrToYuv420pV2.
- Tests for verifying functionality of function BgrToYuv422pV2.
- Tests for verifying functionality of function BgrToYuv444pV2.
- Tests for verifying functionality of function Yuv420pToRgbV2.
- Tests for verifying functionality of function Yuv422pToRgbV2.
- Tests for verifying functionality of function Yuv444pToRgbV2.
Simd v5.3.129
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToBgrV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv422pToBgrV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv444pToBgrV2.
- Parameter yuvType in function Simd::Yuv420pToBgr.
- Parameter yuvType in function Simd::Yuv422pToBgr.
- Parameter yuvType in function Simd::Yuv444pToBgr.
- The mark of function SimdYuv420pToBgr as deprecated.
- The mark of function SimdYuv422pToBgr as deprecated.
- The mark of function SimdYuv444pToBgr as deprecated.
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv422pToBgraV2.
- Parameter yuvType in function Simd::Yuv420pToBgra.
- Parameter yuvType in function Simd::Yuv422pToBgra.
- Parameter yuvType in function Simd::Yuv444pToBgra.
- The mark of function SimdYuv420pToBgra as deprecated.
- The mark of function SimdYuv422pToBgra as deprecated.
- The mark of function SimdYuv444pToBgra as deprecated.
- Parameter yuvType in function Simd::BgraToYuv420p.
- Parameter yuvType in function Simd::BgraToYuv422p.
- Parameter yuvType in function Simd::BgraToYuv444p.
- Parameter yuvType in function Simd::BgraToYuva420p.
- The mark of function SimdBgraToYuv420p as deprecated.
- The mark of function SimdBgraToYuv422p as deprecated.
- The mark of function SimdBgraToYuv444p as deprecated.
- The mark of function SimdBgraToYuva420p as deprecated.
- The mark of function SimdResizeBilinear as deprecated.
- The mark of function Simd::ResizeBilinear as deprecated.
- The mark of function Simd::ResizeAreaGray as deprecated.
- The mark of function Simd::ResizeArea as deprecated.
- The mark of function Simd::InterferenceIncrement as deprecated.
- The mark of function Simd::InterferenceIncrementMasked as deprecated.
- The mark of function Simd::InterferenceDecrement as deprecated.
- The mark of function Simd::InterferenceDecrementMasked as deprecated.
- The mark of function SimdSynetFusedLayerForward0 as deprecated.
- The mark of function SimdSynetFusedLayerForward1 as deprecated.
- The mark of function SimdSynetFusedLayerForward2 as deprecated.
- The mark of function SimdSynetFusedLayerForward3 as deprecated.
- The mark of function SimdSynetFusedLayerForward4 as deprecated.
- The mark of function SimdSynetFusedLayerForward8 as deprecated.
- The mark of function SimdSynetFusedLayerForward9 as deprecated.
Bug fixing
- Error in NEON optimizations of Resizer engine.
- Out-of-bounds read of the input array in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntDecode32f.
- Out-of-bounds read of the input array in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntDecode16f.
- Out-of-bounds read of the input array in Base implementation, SSE4.1, AVX2 optimizations of function DescrIntCosineDistance.
- Out-of-bounds read of the input array in Base implementation, SSE4.1, AVX2 optimizations of function DescrIntCosineDistancesMxNa.
- Out-of-bounds read of the input array in Base implementation, SSE4.1, AVX2 optimizations of function DescrIntCosineDistancesMxNp.
- Error in AVX-512BW optimizations of function DescrIntEncode32f.
- Error in AVX-512BW optimizations of function DescrIntEncode16f.
- Compiler error in function Simd::ResizeArea.
- Error in SSE4.1, AVX, AVX2, AVX-512BW, NEON optimizations of class MergedConvolution32fCd.
Test framework
New features
- Tests for verifying functionality of function Yuv420pToBgrV2.
- Tests for verifying functionality of function Yuv422pToBgrV2.
- Tests for verifying functionality of function Yuv444pToBgrV2.
- Tests for verifying functionality of function Yuv422pToBgraV2.
- Special test for verifying functionality of function Simd::ResizeAreaGray.
- Special test for verifying functionality of function Simd::ResizeArea.
Simd v5.3.128
Algorithms
New features
- Support of SimdCpuInfoRam in function SimdCpuInfo.
- Support of SimdCpuInfoRam in function Simd::PrintInfo.
- Base implementation of function SimdCpuDesc.
- Base implementation of SynetGridSample2dRef class.
- Base implementation, SSE4.1, AVX2 optimizations of SynetGridSample2d32fBlZ class.
Bug fixing
- Error in AVX-512VNNI optimizations of class SynetMergedConvolution8iCdc.
- Error in AVX-512VNNI optimizations of class SynetMergedConvolution8iCd.
- Error in AVX-512VNNI optimizations of class SynetMergedConvolution8iDc.
- Error (assert) in Base implementation of class ResizerNearest.
- Error in SSE4.1, AVX, AVX2, AVX-512BW optimizations of class SynetMergedConvolution32fCd.
Test framework
New features
- Tests for verifying functionality of SynetGridSample2d engine.
Improving
- WIN32 performance report.
Infrastructure
New features
- Github actions script for CMake (build and test for GCC-13 (instead of GCC-12), Linux).
Documentation
Bug fixing
- Wrong description of function SimdDescrIntInit.
Simd v5.3.127
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntEncode16f.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntDecode16f.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntEncode32f.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntEncode16f.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntDecode32f.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntDecode16f.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntCosineDistance.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNp.
- Support of 4-bit and 5-bit depth in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of function DescrIntCosineDistancesMxNa.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForwardV3.
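The DescrInt entries above refer to encoding float descriptors into integers of a given bit depth. The following sketch shows one common way to do that, linear min/max quantization, for any depth from 1 to 8 bits. It is only an assumption about the general idea: the names `QuantizedDescriptorSketch`, `EncodeSketch` and `DecodeSketch` are hypothetical, the codes are stored one per byte rather than bit-packed, and Simd's actual storage layout is not reproduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one unpacked code per element plus the mapping parameters.
struct QuantizedDescriptorSketch
{
    float min;                       // value represented by code 0
    float scale;                     // step between consecutive codes
    std::vector<std::uint8_t> codes; // codes in [0, 2^depth - 1]
};

// Quantizes floats to 'depth'-bit codes with a linear min/max mapping.
QuantizedDescriptorSketch EncodeSketch(const std::vector<float>& src, int depth)
{
    assert(depth >= 1 && depth <= 8 && !src.empty());
    const int maxCode = (1 << depth) - 1;
    const auto range = std::minmax_element(src.begin(), src.end());
    QuantizedDescriptorSketch result;
    result.min = *range.first;
    const float span = *range.second - *range.first;
    result.scale = span > 0.0f ? span / float(maxCode) : 1.0f; // avoid division by zero
    result.codes.reserve(src.size());
    for (float value : src)
    {
        const int code = int(std::lround((value - result.min) / result.scale));
        result.codes.push_back(std::uint8_t(std::min(std::max(code, 0), maxCode)));
    }
    return result;
}

// Restores approximate float values from the codes.
std::vector<float> DecodeSketch(const QuantizedDescriptorSketch& encoded)
{
    std::vector<float> dst;
    dst.reserve(encoded.codes.size());
    for (std::uint8_t code : encoded.codes)
        dst.push_back(encoded.min + encoded.scale * float(code));
    return dst;
}
```

Lower depths such as 4 or 5 bits shrink the stored descriptor at the cost of a larger quantization step, so the reconstruction error per element grows to about half of `scale`.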
Improving
- SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntCosineDistancesMxNp for 4, 5, 6, 7-bits depth.
- SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntCosineDistancesMxNa for 4, 5, 6, 7-bits depth.
Bug fixing
- Compiler error in file SimdYuvToBgr.h.
Renaming
- Function DescrIntEncode to DescrIntEncode32f.
- Function DescrIntDecode to DescrIntDecode32f.
Test framework
New features
- Tests for verifying functionality of function DescrIntEncode16f.
- Tests for verifying functionality of function DescrIntDecode16f.
- Tests for verifying functionality of function SynetNormalizeLayerForwardV3.
Improving
- WIN32 exception handling.
Infrastructure
Improving
- Host Properties step in Github actions script for MSBuild.
Simd v5.3.126
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntEncode.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntDecode.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntCosineDistance.
- Base implementation of function DescrIntVectorNorm.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntCosineDistancesMxNp.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function DescrIntCosineDistancesMxNa.
- SimdSynetUnaryOperation32fRcp member in enumeration SimdSynetUnaryOperationType.
- Support of SimdSynetUnaryOperation32fRcp in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function SynetUnaryOperation32f.
- SimdSynetUnaryOperation32fNot member in enumeration SimdSynetUnaryOperationType.
- Support of SimdSynetUnaryOperation32fNot in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function SynetUnaryOperation32f.
- Helper function Simd::InvertAffineTransform.
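The helper Simd::InvertAffineTransform listed above is not described further in these notes. As a sketch of the underlying math only, the code below inverts a 2x3 affine matrix stored in row-major order. The type `Affine2x3Sketch`, the function name and the signature are hypothetical and may not match the library's helper.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Row-major 2x3 matrix [a b tx; c d ty] that maps
// (x, y) to (a*x + b*y + tx, c*x + d*y + ty).
using Affine2x3Sketch = std::array<double, 6>;

// Writes the inverse transform to 'inv'. Returns false if the matrix is singular.
bool InvertAffineSketch(const Affine2x3Sketch& m, Affine2x3Sketch& inv)
{
    const double det = m[0] * m[4] - m[1] * m[3];
    if (std::fabs(det) < 1e-12)
        return false;
    // Inverse of the 2x2 linear part.
    const double a = m[4] / det, b = -m[1] / det;
    const double c = -m[3] / det, d = m[0] / det;
    // The inverse translation is minus the inverse linear part applied to (tx, ty).
    inv = { a, b, -(a * m[2] + b * m[5]),
            c, d, -(c * m[2] + d * m[5]) };
    return true;
}
```

An inverse of this kind is what a warp routine needs when it walks over destination pixels and looks up the corresponding source coordinates.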
Improving
- SSE4.1, AVX, AVX-512BW optimizations of method SynetDeconvolution32fGemmNN::RowToImg.
- AVX-512BW optimizations of function SynetUnaryOperation32f (case of SimdSynetUnaryOperation32fLog, SimdSynetUnaryOperation32fExp).
- AVX-512BW optimizations of function SynetSoftplus32f.
Bug fixing
- Error in AVX2 optimizations of function SynetSoftmaxLayerForward.
- GCC compiler error in file SimdDrawing.hpp (Windows, MinGW).
- GCC compiler error in function Test::FileExists (Windows, MinGW).
- Crash in function SimdSynetDeconvolution32fForward (Linux, GCC-12, GCC-13).
- Crash in function Base::AlgCacheL3 (Windows, MinGW).
- Use of _WIN32 macro instead of WIN32.
Test framework
New features
- Tests for verifying functionality of function DescrIntEncode.
- Tests for verifying functionality of function DescrIntDecode.
- Tests for verifying functionality of function DescrIntCosineDistance.
- Tests for verifying functionality of function DescrIntCosineDistancesMxNp.
- Tests for verifying functionality of function DescrIntCosineDistancesMxNa.
- Test command line argument '-cc' to run CheckCpp tests.
Bug fixing
- Test log messages after test error stoppage (multithreaded run).
- Error in CheckCpp tests.
Infrastructure
New features
- Github actions script for CMake (build and test for MinGW, Windows).
- Github actions script for CMake (build and test for GCC-12, Linux).
Simd v5.2.125
Algorithms
New features
- AVX-512BW, NEON optimizations of function SynetGelu32f.
- SimdConvolutionActivationGelu member in enumeration SimdConvolutionActivationType.
- Support of SynetUnaryOperation32fErf in NEON optimizations of function SynetUnaryOperation32f.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fGemmNN.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fWinograd.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fGemmNT.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fDirectNchw.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fDirectNhwc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fDepthwiseDotProduct.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetConvolution32fNhwcDirect.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX-BF16 optimizations of class SynetConvolution32fBf16Gemm.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX-BF16 optimizations of class SynetConvolution32fBf16Nhwc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetDeconvolution32fGemmNN.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of class SynetDeconvolution32fNhwcDirect2x2.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8, NEON optimizations of class SynetConvolution8iGemmNN.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8, NEON optimizations of class SynetConvolution8iNhwcDirect.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8, NEON optimizations of class SynetConvolution8iNhwcDepthwise.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX, AVX2, AVX-512BW, NEON optimizations of class SynetMergedConvolution32fCd.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX, AVX2, AVX-512BW, NEON optimizations of class SynetMergedConvolution32fCdc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX, AVX2, AVX-512BW, NEON optimizations of class SynetMergedConvolution32fDc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cd.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Cdc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX-BF16 optimizations of class SynetMergedConvolution32fBf16Dc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetMergedConvolution8iCd.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetMergedConvolution8iCdc.
- Support of SimdConvolutionActivationGelu in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, AMX-INT8 optimizations of class SynetMergedConvolution8iDc.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetNormalizeLayerForwardV2.
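Most of the entries above add the GELU activation to existing convolution classes. For reference, a scalar sketch of the erf-based GELU definition follows. The function names are hypothetical, and whether the library evaluates exactly this form or an approximation of it is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))), evaluated for one value.
float GeluSketch(float x)
{
    const float invSqrt2 = 0.70710678f; // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * invSqrt2));
}

// Applies GELU element-wise, as an activation would be applied to a layer output.
void GeluSketch(const float* src, std::size_t size, float* dst)
{
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = GeluSketch(src[i]);
}
```

Fusing the activation into the convolution, as the entries above describe, avoids a separate pass over the output buffer; the sketch shows only the activation itself.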
Improving
- SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Erf.
- Performance of function Simd::Parallel.
- Use of resize method SimdResizeMethodArea in ImageMatcher::Create (more precise than SimdResizeMethodBilinear).
Bug fixing
- Compiler error in SimdImageMatcher.hpp (using of internal functions).
- Wrong API of function ImageMatcher.Skip().
Test framework
New features
- Tests for verifying functionality of function SynetNormalizeLayerForwardV2.
Infrastructure
Bug fixing
- Github actions script for MSBuild (Restore NuGet packages timeout exit).