
simd support for fp16/bf16 #723

Merged · 1 commit · Jul 25, 2024
Conversation

cqy123456
Collaborator

@cqy123456 cqy123456 commented Jul 23, 2024

issue: #287
Some test results with vdbench (768d, 1M, HNSW):
fp32: [benchmark screenshot]
fp16: [benchmark screenshot]
bf16: [benchmark screenshot]
fp16 before optimization: [benchmark screenshot]

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cqy123456

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


mergify bot commented Jul 23, 2024

@cqy123456 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!


codecov bot commented Jul 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.97%. Comparing base (3c46f4c) to head (0f3d957).
Report is 128 commits behind head on main.

@@            Coverage Diff            @@
##           main     #723       +/-   ##
=========================================
+ Coverage      0   71.97%   +71.97%     
=========================================
  Files         0       70       +70     
  Lines         0     5160     +5160     
=========================================
+ Hits          0     3714     +3714     
- Misses        0     1446     +1446     

see 70 files with indirect coverage changes

@cydrain
Collaborator

cydrain commented Jul 23, 2024

Any test to show the performance improvement before and after this code change?

@foxspy
Collaborator

foxspy commented Jul 23, 2024

/lgtm

@@ -18,7 +18,7 @@ endif()
set(CMAKE_CXX_FLAGS "-Wall -fPIC ${CMAKE_CXX_FLAGS}")

if(__X86_64)
-set(CMAKE_CXX_FLAGS "-msse4.2 ${CMAKE_CXX_FLAGS}")
+set(CMAKE_CXX_FLAGS "-mf16c -msse4.2 ${CMAKE_CXX_FLAGS}")
Collaborator

The F16C instruction set requires at least an Intel Ivy Bridge CPU (more details at https://en.wikipedia.org/wiki/F16C). Could it affect any clients? @liliu-z

Collaborator Author

bool cpu_support_f16c() {
    InstructionSet& instruction_set_inst = InstructionSet::GetInstance();
    return (instruction_set_inst.F16C());
}

The hook will check whether the CPU has F16C.
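As a rough sketch of how such a hook selects an implementation at runtime (names and the stubbed predicate below are hypothetical; the real code queries CPUID through knowhere's InstructionSet singleton):

```cpp
#include <cstddef>

// Hypothetical sketch of runtime dispatch. The real hook queries the CPUID
// F16C bit via knowhere's InstructionSet; here the predicate is stubbed out.
static bool cpu_support_f16c() {
    return false;  // stand-in; real code checks the CPUID F16C feature bit
}

// Portable scalar fallback.
static float ip_ref(const float* x, const float* y, size_t d) {
    float sum = 0.0f;
    for (size_t i = 0; i < d; ++i) sum += x[i] * y[i];
    return sum;
}

// Placeholder for the F16C-accelerated path (would use _mm256_cvtph_ps etc.).
static float ip_f16c(const float* x, const float* y, size_t d) {
    return ip_ref(x, y, d);
}

using ip_fn = float (*)(const float*, const float*, size_t);

// Pick an implementation once, at hook-initialization time, so the per-call
// cost is a single indirect call with no repeated feature checks.
static ip_fn select_inner_product() {
    return cpu_support_f16c() ? ip_f16c : ip_ref;
}
```

The point of the pattern is that the feature check runs once at startup, not per distance computation.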

@@ -55,6 +55,72 @@ fvec_inner_product_avx(const float* x, const float* y, size_t d) {
}
FAISS_PRAGMA_IMPRECISE_FUNCTION_END

float
fp16_vec_inner_product_avx(const knowhere::fp16* x, const knowhere::fp16* y, size_t d) {
__m256 m_res = _mm256_setzero_ps();
Collaborator

I'd call these msum_0 and msum_1 for consistency of the naming style, here and in all other functions.

Collaborator Author

updated

y += 16;
d -= 16;
}
float sum = _mm512_reduce_add_ps(m512_res);
Collaborator

Use masking.

while (d >= 32) {...}

if (d >= 16) {...}

if (d > 0) {
    const __mmask16 mask = (1U << d) - 1U;
    auto mx = _mm512_cvtph_ps(_mm256_maskz_loadu_epi16(mask, x));
    auto my = _mm512_cvtph_ps(_mm256_maskz_loadu_epi16(mask, y));
    mx = _mm512_sub_ps(mx, my);
    m512_res = _mm512_fmadd_ps(mx, mx, m512_res);
}

return _mm512_reduce_add_ps(m512_res);

Please add -mavx512vl to the AVX512 CMake settings. It is safe to do; it won't affect our list of accepted CPU generations.

The comment applies to all proposed functions for AVX512.
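As a plain-C++ model (hypothetical, not the PR's code) of what the masked tail does: `(1U << d) - 1U` sets the low d bits, and a maskz load zeroes every lane whose mask bit is clear, so elements past d never enter the accumulator and no out-of-bounds lanes are read:

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of AVX512 masked-tail handling for d < 16 remaining elements.
// Build a mask with the low d bits set; d < 16 keeps the shift well-defined.
static uint16_t tail_mask(size_t d) {
    return static_cast<uint16_t>((1U << d) - 1U);
}

// Scalar emulation of a maskz load into a 16-lane register: lanes whose mask
// bit is clear read as 0.0f, so they do not perturb a subsequent reduction.
static void maskz_load(const float* src, size_t d, float out[16]) {
    const uint16_t mask = tail_mask(d);
    for (size_t lane = 0; lane < 16; ++lane) {
        out[lane] = ((mask >> lane) & 1U) ? src[lane] : 0.0f;
    }
}
```

Because the mask bit is set only for lanes below d, the hardware masked load touches only the first d elements, which is what makes the tail handling safe at the end of a buffer.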

Collaborator Author

updated

float
fp16_vec_inner_product_neon(const knowhere::fp16* x, const knowhere::fp16* y, size_t d) {
float32x4x4_t res = {
{{0.0f, 0.0f, 0.0f, 0.0f}, {0.0f, 0.0f, 0.0f, 0.0f}, {0.0f, 0.0f, 0.0f, 0.0f}, {0.0f, 0.0f, 0.0f, 0.0f}}};
Collaborator

{vdupq_n_f32(0.0f), vdupq_n_f32(0.0f), vdupq_n_f32(0.0f), vdupq_n_f32(0.0f)} is shorter

Collaborator Author

updated

@@ -81,6 +83,134 @@ fvec_inner_product_neon(const float* x, const float* y, size_t d) {
return vaddvq_f32(sum_);
}

float
fp16_vec_inner_product_neon(const knowhere::fp16* x, const knowhere::fp16* y, size_t d) {
float32x4x4_t res = {
Collaborator

If you have time, please check whether the following function is faster than yours. At least, I see that clang produces small and reliable code, and gcc produces some meaningful code as well.

FAISS_PRAGMA_IMPRECISE_FUNCTION_BEGIN
float
fp16_vec_inner_product_neon(const knowhere::fp16* x_in, const knowhere::fp16* y_in, size_t d) {
    const __fp16* x = reinterpret_cast<const __fp16*>(x_in);
    const __fp16* y = reinterpret_cast<const __fp16*>(y_in);

    float sum = 0;
    FAISS_PRAGMA_IMPRECISE_LOOP
    for (size_t i = 0; i < d; i++) {
        sum += (float)(x[i]) * (float)(y[i]);
    }

    return sum;
} 
FAISS_PRAGMA_IMPRECISE_FUNCTION_END

If not, then the code is fine.
You may wish to try the same "trust the compiler" approach for all other fp16-based functions for NEON.

Unfortunately, this won't work for bf16 :( because clang crashes.
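For background on why bf16 is a different case (a scalar sketch, not code from this PR): bf16 is just the high 16 bits of an IEEE float32, so conversion is a shift plus rounding, with no exponent rebias as in fp16:

```cpp
#include <cstdint>
#include <cstring>

// Sketch: bf16 keeps the sign, the full 8-bit exponent, and the top 7
// mantissa bits of a float32, so round-to-nearest-even truncation suffices.
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // round to nearest, ties to even
    return static_cast<uint16_t>(bits >> 16);
}

// Widening back is exact: append 16 zero mantissa bits.
static float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

This is why bf16 keeps the float32 dynamic range while fp16 trades range (5 exponent bits) for precision (10 mantissa bits).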

Collaborator Author

I used random data (dim = 768, nb = 100000, nq = 100) to search an HNSW index:
main branch takes 64 ms;
this branch takes 13 ms;
"trust the compiler" takes 18 ms.
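For reference, the semantics all of these variants must agree on can be pinned down with a portable software decode of IEEE binary16 (a sketch independent of __fp16 and NEON, not this PR's code):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable decode of an IEEE binary16 bit pattern to float32.
// Handles normals, subnormals, signed zero, and inf/NaN.
static float fp16_to_float(uint16_t h) {
    const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;  // signed zero
        } else {
            // Subnormal: shift until the implicit leading 1 appears at bit 10.
            int shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; ++shift; }
            man &= 0x3FFu;  // drop the now-implicit leading 1
            // Biased float32 exponent: 127 - 15 - shift + 1 = 113 - shift.
            bits = sign | (static_cast<uint32_t>(113 - shift) << 23) | (man << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13);  // inf / NaN
    } else {
        bits = sign | ((exp - 15u + 127u) << 23) | (man << 13);  // normal
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Scalar reference inner product over raw fp16 bit patterns.
static float fp16_ip_ref(const uint16_t* x, const uint16_t* y, size_t d) {
    float sum = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        sum += fp16_to_float(x[i]) * fp16_to_float(y[i]);
    }
    return sum;
}
```

A scalar oracle like this is handy for unit-testing the NEON/AVX paths against exact expected values.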

Collaborator

Well, maybe gcc is not smart enough yet :)


TEST_CASE("Test fp16 distance", "[fp16]") {
using Catch::Approx;
auto dim = GENERATE(as<size_t>{}, 1, 2, 10, 69, 128, 141, 510, 1024);
Collaborator

I'd test 1, 2, 4, 5, 10, 13, 21, 29 as well.
Testing dimensionalities larger than 96 or 128 does not make sense to me, because the code branches on dimensionality granularities (if-then-else / while constructs for values like 32, 16, 8, 4).

@@ -12,6 +12,8 @@
#ifndef DISTANCES_AVX_H
#define DISTANCES_AVX_H

#include <knowhere/operands.h>
Collaborator

Use #include "" instead of #include <>.

Collaborator Author

updated

@@ -12,6 +12,8 @@
#ifndef DISTANCES_NEON_H
#define DISTANCES_NEON_H

#include <knowhere/operands.h>
Collaborator

same

Collaborator Author

updated

@@ -55,6 +55,72 @@ fvec_inner_product_avx(const float* x, const float* y, size_t d) {
}
FAISS_PRAGMA_IMPRECISE_FUNCTION_END

float
fp16_vec_inner_product_avx(const knowhere::fp16* x, const knowhere::fp16* y, size_t d) {
Collaborator

There seems to be no need to add knowhere:: here.

Collaborator Author

The simd dir is still in the faiss namespace.

@alexanderguzhva
Collaborator

/lgtm

@alexanderguzhva
Collaborator

Please also change _mm256_load_si256 into _mm256_loadu_si256
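Background on that suggestion (illustrative only, not the PR's code): `_mm256_load_si256` requires its pointer to be 32-byte aligned and may fault otherwise, while `_mm256_loadu_si256` accepts any address. The portable scalar idiom for an unaligned read is memcpy, which compilers lower to an unaligned load where the hardware allows it:

```cpp
#include <cstdint>
#include <cstring>

// Portable illustration of an unaligned load: read a 32-bit value from a
// possibly misaligned address via memcpy. A direct pointer cast and
// dereference (the moral equivalent of using _mm256_load_si256 on an
// unaligned address) would be undefined behavior.
static uint32_t load_u32_unaligned(const unsigned char* p) {
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```

Vector data handed to these distance kernels generally comes from arbitrary offsets inside larger buffers, so the unaligned-load variants are the safe default.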

@mergify mergify bot added the ci-passed label Jul 25, 2024
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
@alexanderguzhva
Collaborator

/lgtm

6 participants