Remove broken and unused code path from L2 normalization #166

griwodz · 2024-08-13T13:15:17Z

Description

Remove an optional and apparently unused code path that relies on normf() to perform L2 normalization of the descriptors.

First, that code is buggy. Second, that code would be slower than the more "old-fashioned" code.

Features list

normf code along with CMake variable PopSift_USE_NORMF removed.
The PR has 2 commits. The first commit fixes the bug but leaves the code in PopSift. Interested people could dig it up and use it. The second commit removes it entirely.

Implementation remarks

There was a bug in src/popsift/s_desc_norm_l2.h, in an ifdef'd code path where the L2 norm is computed incorrectly.

It was actually a double bug. First, normf(array) is computed, which is sqrt(sum(array[n]²)), instead of rnormf(array), which would be 1/sqrt(sum(array[n]²)). Second, the result is multiplied with the min value instead of the descriptor value. A really stupid bug that would be easy to fix.
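To make the first half of the bug concrete, here is a minimal host-side C++ sketch. The functions norm_l2 and rnorm_l2 are hypothetical stand-ins for what CUDA's normf() and rnormf() compute; none of the names below are PopSift code, and the sketch only illustrates the arithmetic, not the original kernel.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for what CUDA's normf() computes: sqrt(sum(a[n]^2)).
float norm_l2(const std::vector<float>& a) {
    float sum = 0.0f;
    for (float v : a) sum += v * v;
    return std::sqrt(sum);
}

// Hypothetical stand-in for what CUDA's rnormf() computes: 1/sqrt(sum(a[n]^2)).
float rnorm_l2(const std::vector<float>& a) {
    const float n = norm_l2(a);
    assert(n > 0.0f);  // an all-zero vector cannot be normalized
    return 1.0f / n;
}

// Correct L2 normalization: every value is multiplied by the reciprocal norm.
std::vector<float> normalize_l2(const std::vector<float>& a) {
    const float r = rnorm_l2(a);
    std::vector<float> out;
    out.reserve(a.size());
    for (float v : a) out.push_back(v * r);
    return out;
}

// The first mistake in isolation: scaling by the norm instead of its reciprocal.
std::vector<float> scale_by_norm(const std::vector<float>& a) {
    const float n = norm_l2(a);
    std::vector<float> out;
    out.reserve(a.size());
    for (float v : a) out.push_back(v * n);
    return out;
}
```

For the vector (3, 4), whose length is 5, normalize_l2 yields (0.6, 0.8) with length 1, while scale_by_norm yields (15, 20) with length 25.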

However, nobody ever reported the bug because the code wasn't active by default. normf() is only used in PopSift if it is manually selected in CMake, and if the CC is >=7.5.

After learning more and more about CUDA, we also know that the code path would have removed parallelism. It was written under the belief that normf() uses several CUDA warps in the background, forming the equivalent of the reduce() operation that is performed with shuffle_down in the main code path. However, this is not the case.
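The reduction in the main code path can be pictured with a small simulation. The sketch below is plain C++ running on the host, not PopSift's kernel: a std::array of 32 floats plays the role of one warp, and shuffle_down_add is a hypothetical helper that mimics one shuffle-down-and-add step (in CUDA this would be a warp shuffle intrinsic such as __shfl_down_sync). All names are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;
using Warp = std::array<float, kWarpSize>;

// One simulated shuffle-down step: each lane adds the value held by the lane
// `offset` positions above it. Lanes without such a neighbour read their own
// value, which does not affect the result that ends up in lane 0.
Warp shuffle_down_add(const Warp& lanes, std::size_t offset) {
    assert(offset < kWarpSize);
    Warp next = lanes;
    for (std::size_t i = 0; i < kWarpSize; ++i) {
        const std::size_t src = (i + offset < kWarpSize) ? i + offset : i;
        next[i] = lanes[i] + lanes[src];
    }
    return next;
}

// Five steps (offsets 16, 8, 4, 2, 1) leave the sum of all 32 lanes in lane 0.
float warp_reduce_sum(Warp lanes) {
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        lanes = shuffle_down_add(lanes, offset);
    }
    return lanes[0];
}

// Each of the 32 lanes squares and adds its own four descriptor values,
// then the partial sums are combined by the reduction above.
float sum_of_squares_128(const std::array<float, 128>& desc) {
    Warp partial{};
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        for (std::size_t k = 0; k < 4; ++k) {
            const float v = desc[4 * lane + k];
            partial[lane] += v * v;
        }
    }
    return warp_reduce_sum(partial);
}
```

The point of the pattern is that all 32 lanes work in every step, so 128 squares are summed in five shuffle steps instead of one thread looping over 128 values.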

simogasp

just a minor comment

simogasp · 2024-08-14T14:52:46Z

src/popsift/s_desc_norm_l2.h

    float norm;

-    if( threadIdx.x == 0 ) {
-        norm = normf( 128, src_desc );
-    }
-    __syncthreads();
-    norm = popsift::shuffle( norm, 0 );
-
-    descr.x = min( descr.x, 0.2f*norm );
-    descr.y = min( descr.y, 0.2f*norm );
-    descr.z = min( descr.z, 0.2f*norm );
-    descr.w = min( descr.w, 0.2f*norm );
-
+    // 32 threads compute 4 squares each, then shuffle to performing a addition by
+    // reduction for the sum of 128 squares, result in thread 0
    norm = descr.x * descr.x


    float norm = descr.x * descr.x ...

and remove the previous declaration

griwodz · 2024-08-14

I don't think I can do that, because the first assignment to norm is inside the if (line 61). Thread 0 is initialized inside the if (lines 60-62), and the shuffle in line 64 initializes the other 31 threads.

The thing that actually happens underneath the C-like syntax is that the result of normf() is stored in the lowest 32 bits of a 1024-bit SIMD register (line 61). Shuffle is a single SIMD instruction that copies the lowest 32 bits of the SIMD register into every other set of 32 bits of the same register.

Technically, I could write

    float norm = ( threadIdx.x == 0 ) ? normf( 128, src_desc ) : 0;

That looks nicer, but it would actually waste time.
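The compute-on-thread-0-then-shuffle pattern described in this reply can be simulated on the host as follows. This is a sketch under assumptions, not the PopSift source: the 32-element array stands in for the 1024-bit register, NaN marks lanes that were never assigned, and broadcast is a hypothetical helper that mimics what the shuffle does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

constexpr std::size_t kWarpSize = 32;
using Warp = std::array<float, kWarpSize>;

// Simulated broadcast shuffle: every lane receives the value held by src_lane.
Warp broadcast(const Warp& lanes, std::size_t src_lane) {
    assert(src_lane < kWarpSize);
    Warp out;
    out.fill(lanes[src_lane]);
    return out;
}

// Only lane 0 receives a computed value; the other 31 lanes are left
// unassigned (marked with NaN here) until the broadcast overwrites them.
Warp compute_on_lane0_then_broadcast(float value_from_lane0) {
    Warp lanes;
    lanes.fill(std::numeric_limits<float>::quiet_NaN());
    lanes[0] = value_from_lane0;
    return broadcast(lanes, 0);
}
```

Because the broadcast overwrites every lane, giving the other 31 lanes an initial value beforehand adds nothing to the result.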

[bugfix] fix bug in L2 normalization

8db589f

griwodz linked an issue Aug 13, 2024 that may be closed by this pull request

[bug] incorrect L2 normalization if CUDA normf() is in use #165

Closed

griwodz self-assigned this Aug 13, 2024

griwodz added the prio:minor, cuda, and bugfix labels Aug 13, 2024

remove broken and unused normf code

94a2c64

griwodz force-pushed the dev/fixL2Normalization branch from bd4d957 to 94a2c64 Compare August 13, 2024 13:17

griwodz changed the title ~~Dev/fix l2 normalization~~ [WIP] Remove broken and unused code path from L2 normalization Aug 13, 2024

griwodz added in progress ready and removed in progress labels Aug 13, 2024

griwodz requested a review from simogasp August 14, 2024 11:29

griwodz changed the title ~~[WIP] Remove broken and unused code path from L2 normalization~~ Remove broken and unused code path from L2 normalization Aug 14, 2024

griwodz requested a review from fabiencastan August 14, 2024 11:41

simogasp approved these changes Aug 14, 2024


griwodz merged commit 8623b69 into develop Aug 14, 2024
6 checks passed

griwodz deleted the dev/fixL2Normalization branch August 14, 2024 19:16
