Vectorized table select #77

mratsim · 2020-08-24T20:01:21Z

The CMOV instruction that is used for conditional copy is likely optimal for 4~6 limbs.

From Agner Fog tables

https://www.agner.org/optimize/instruction_tables.pdf

The reciprocal throughput of CMOV is 0.5 cycles, so 2 independent CMOVs can be issued per cycle and a conditional copy of one Fp element of 4 to 6 limbs takes 2 to 3 cycles.

However, when we have a table precomputed for scalar multiplication or signing with 8 EC elements, each composed of 3 Fp coordinates of 4 to 6 limbs, SSE or AVX lets us load 2x2 or 2x4 64-bit limbs per cycle (2 vector loads of 128 or 256 bits per cycle, bottlenecked by memory speed).

This would reduce the overhead of table access. Note that LSB set recoding (#73) uses tables with 64 to 256 EC elements (192+ Fp elements, hence from several hundred to a few thousand limbs).
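To make the sizes above concrete, here is a rough back-of-the-envelope sketch. It assumes 64-bit limbs and the 2 CMOVs per cycle figure quoted above; `tableLimbs` and `cmovCycles` are hypothetical helper names used only for this estimate, and the cycle count is a lower bound that ignores memory bandwidth.

```nim
# Rough cost estimate for one full constant-time table scan.
# Assumption: 64-bit limbs, 2 independent CMOVs issued per cycle.

func tableLimbs(ecElems, coords, limbsPerFp: int): int =
  ## Total number of limbs a full table scan has to touch.
  ecElems * coords * limbsPerFp

func cmovCycles(limbs: int): int =
  ## Lower bound on cycles when 2 CMOVs retire per cycle (rounded up).
  (limbs + 1) div 2

when isMainModule:
  # 8 EC elements, 3 coordinates each, 4 or 6 limbs per coordinate
  echo tableLimbs(8, 3, 4), " limbs, at least ", cmovCycles(tableLimbs(8, 3, 4)), " cycles"
  echo tableLimbs(8, 3, 6), " limbs, at least ", cmovCycles(tableLimbs(8, 3, 6)), " cycles"
  # Largest table mentioned for LSB set recoding: 256 EC elements of 6-limb coordinates
  echo tableLimbs(256, 3, 6), " limbs, at least ", cmovCycles(tableLimbs(256, 3, 6)), " cycles"
```

Under these assumptions every secret lookup in the 8-element table costs at least 48 to 72 cycles with CMOV alone, which is the overhead wider loads would aim to cut.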

That is, the function to vectorize is:

constantine/constantine/elliptic/ec_endomorphism_accel.nim

Lines 200 to 206 in 00ff599

    
    func secretLookup[T](dst: var T, table: openArray[T], index: SecretWord) =
      ## Load a table[index] into `dst`
      ## This is constant-time, whatever the `index`, its value is not leaked
      ## This is also protected against cache-timing attack by always scanning the whole table
      for i in 0 ..< table.len:
        let selector = SecretWord(i) == index
        dst.ccopy(table[i], selector)
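As a possible starting point, the same lookup can be phrased with an all-ones/all-zeros mask and AND/OR on raw limbs, which is the shape that SSE/AVX bitwise instructions operate on. The sketch below is portable scalar Nim, not an intrinsics implementation. `Limb`, `ctEqMask` and `secretLookupMasked` are hypothetical names that are not part of Constantine, and the table is assumed to be a flat array of equally sized entries with at least one limb each. Whether the compiler actually vectorizes the inner loop, and keeps it branch-free, would need to be checked in the generated assembly.

```nim
# Hypothetical sketch: mask-based constant-time table select on raw limbs.
type Limb = uint64

func ctEqMask(a, b: Limb): Limb =
  ## All-ones if a == b, all-zeros otherwise, computed without branching.
  let x = a xor b
  # (x or -x) has its top bit set if and only if x != 0 (unsigned arithmetic wraps)
  let nonZero = (x or (0'u64 - x)) shr 63
  nonZero - 1'u64  # 0 - 1 wraps to all-ones when equal, 1 - 1 = 0 otherwise

func secretLookupMasked(dst: var openArray[Limb],
                        table: openArray[Limb],
                        index: Limb) =
  ## `table` holds consecutive entries of `dst.len` limbs each (dst.len > 0).
  ## Every entry is read regardless of `index`, as in `secretLookup`.
  let entryLen = dst.len
  for j in 0 ..< entryLen:
    dst[j] = 0'u64
  for i in 0 ..< table.len div entryLen:
    let mask = ctEqMask(Limb(i), index)
    # Inner loop is a plain AND/OR over contiguous limbs,
    # i.e. what a vector AND/OR pair would do several limbs at a time.
    for j in 0 ..< entryLen:
      dst[j] = dst[j] or (table[i * entryLen + j] and mask)

when isMainModule:
  # 3 entries of 4 limbs each
  let table = [1'u64, 2, 3, 4,  10, 20, 30, 40,  100, 200, 300, 400]
  var dst: array[4, Limb]
  secretLookupMasked(dst, table, 1'u64)
  doAssert dst == [10'u64, 20, 30, 40]
```

Accumulating with OR into a zeroed destination avoids reading `dst` conditionally; a real vectorized version would replace the inner loop with 128-bit or 256-bit loads and bitwise operations.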

mratsim added the constant time ⏳ and performance 🏁 labels on Aug 24, 2020
