[GPU] GPU / LLVM IR Elliptic curves implementation plan #465

mratsim · 2024-08-27T15:24:23Z

This issue replaces #92 with a more concrete plan of action

The current Constantine has good foundations to build and test LLVM IR primitives for x86, ARM, Nvidia, AMDGPU and all other ISAs that support add-with-carry after:

The ultimate goal is to port multi-scalar multiplication to Nvidia GPUs, this will require the following steps:

Field arithmetic

Currently only add/sub/mul are implemented, we need to implement all primitives needed for shortweierstrass jacobian doubling/addition and shortweierstrass projective doubling/addition.
There is no need initially for primitives for serialization/deserialization/validation as those can be done on CPU and copied to GPU, for example sqrt is not on the critical path

constantine/constantine/math/elliptic/ec_shortweierstrass_affine.nim

Lines 105 to 128 in 65147ed

    
           func trySetFromCoordX*[F, G]( 
        
                  P: var EC_ShortW_Aff[F, G], 
        
                  x: F): SecretBool = 
        
             ## Try to create a point the elliptic curve 
        
             ## y² = x³ + a x + b     (affine coordinate) 
        
             ## 
        
             ## 
        
             ## return true and update `P` if `x` leads to a valid point 
        
             ## return false otherwise, in that case `P` is undefined. 
        
             ## 
        
             ## Note: Dedicated robust procedures for hashing-to-curve 
        
             ##       will be provided, this is intended for testing purposes. 
        
             ## 
        
             ##       For **test case generation only**, 
        
             ##       this is preferred to generating random point 
        
             ##       via random scalar multiplication of the curve generator 
        
             ##       as the latter assumes: 
        
             ##       - point addition, doubling work 
        
             ##       - scalar multiplication works 
        
             ##       - a generator point is defined 
        
             ##       i.e. you can't test unless everything is already working 
        
             P.y.curve_eq_rhs(x, G) 
        
             result = sqrt_if_square(P.y) 
        
             P.x = x

Meaning at the very least:

ccopy
square
neg
cneg

I think square can be implemented as nsqr with "virtual" signature proc nsqr(r: var Fp, a: Fp, count: int) that loops from the count to 0 excluded as overhead is minimal and it will be helpful for primitives that need square_repeated (deserialization).

Naming

see this https://github.com/mratsim/constantine/blob/65147ed815d96fa94a05d307c1d9980877b7d0e8/constantine/math_compiler/README.md

Naming convention for internal procedures:

_big_add_u64x4

_finalsub_mayo_u64x4 -> final substraction may overflow

_finalsub_noo_u64x4 -> final sub no overflow

_mod_add_u64x4

_mod_add2x_u64x8 -> FpDbl backend

_mty_mulur_u64x4b2 -> unreduced Montgomery multiplication (unreduced result valid iff 2 spare bits)

_mty_mul_u64x4b1 -> reduced Montgomery multiplication (result valid iff at least 1 spare bit)

_mty_mul_u64x4 -> reduced Montgomery multiplication

_mty_nsqrur_u64x4b2 -> unreduced square n times

_mty_nsqr_u64x4b1 -> reduced square n times

_mty_sqr_u64x4 -> square

_mty_red_u64x4 -> reduction u64x4 <- u64x8

_pmp_red_mayo_u64x4 -> Pseudo-Mersenne Prime partial reduction may overflow (secp256k1)

_pmp_red_noo_u64x4 -> Pseudo-Mersenne Prime partial reduction no overflow

_secp256k1_red -> special reduction

_fp2x_sqr2x_u64x4 -> Fp2 complex, Fp -> FpDbl lazy reduced squaring

_fp2g_sqr2x_u64x4 -> Fp2 generic/non-complex (do we pass the mul-non-residue as parameter?)

_fp2_sqr_u64x4 -> Fp2 (pass the mul-by-non-residue function as parameter)

_fp4o2_mulnr1pi_u64x4 -> Fp4 over Fp2 mul with (1+i) non-residue optimization

_fp4o2_mulbynr_u64x4

_fp12_add_u64x4

_fp12o4o2_mul_u64x4 -> Fp12 over Fp4 over Fp2

_ecg1swjac_adda0_u64x4 -> Shortweierstrass G1 jacobian addition a=0

_ecg1swjac_add_u64x4_var -> Shortweierstrass G1 jacobian vartime addition

_ectwprj_add_u64x4 -> Twisted Edwards Projective addition

Elliptic curve arithmetic

Similar to the field descriptor, an EcDescriptor will likely be needed

constantine/constantine/math_compiler/ir.nim

Lines 142 to 160 in 65147ed

    
           type 
        
             FieldDescriptor* = object 
        
               name*: string 
        
               modulus*: string # Modulus as Big-Endian uppercase hex, NOT prefixed with 0x 
        
               # primeKind*: PrimeKind 
        
               # Word: i32, i64 but can also be v4i32, v16i32 ... 
        
               wordTy*: TypeRef 
        
               word2xTy*: TypeRef # Double the word size 
        
               v*, w*: uint32 
        
               numWords*: uint32 
        
               zero*, zero_i1*: ValueRef 
        
               intBufTy*: TypeRef # int type, multiple of the word size, that can store the field elements 
        
                                  # For example a 381 bit field is stored in 384-bit ints (whether on 32 or 64-bit platforms) 
        
               # Field metadata 
        
               fieldTy*: TypeRef 
        
               bits*: uint32 
        
               spareBits*: uint8

The following algebraic object configurations in SageMath and Nim Macro and the curve-specific calls will help guide what the descriptor should hold:

constantine/sage/curves.sage

Lines 123 to 137 in 65147ed

    
           'BLS12_381': { 
        
             'field': derive_BLS12_field(-(2^63 + 2^62 + 2^60 + 2^57 + 2^48 + 2^16)), 
        
             'curve': { 
        
               'form': 'short_weierstrass', 
        
               'a': 0, 
        
               'b': 4 
        
             }, 
        
             'tower': { 
        
               'embedding_degree': 12, 
        
               'twist_degree': 6, 
        
               'QNR_Fp': -1, 
        
               'SNR_Fp2': [1, 1], 
        
               'twist': 'M_Twist' 
        
             } 
        
           },

constantine/constantine/named/config_fields_and_curves.nim

Lines 252 to 270 in 65147ed

    
           curve BLS12_381: 
        
             bitwidth: 381 
        
             modulus: "0x1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab" 
        
             family: BarretoLynnScott 
        
             # u: -(2^63 + 2^62 + 2^60 + 2^57 + 2^48 + 2^16) 
        
             # G1 Equation: y² = x³ + 4 
        
             # G2 Equation: y² = x³ + 4 (1+i) 
        
             order: "0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001" 
        
             orderBitwidth: 255 
        
             cofactor: "0x396c8c005555e1568c00aaab0000aaab" 
        
             eq_form: ShortWeierstrass 
        
             coef_a: 0 
        
             coef_b: 4 
        
             nonresidue_fp: -1       #      -1   is not a square in 𝔽p 
        
             nonresidue_fp2: (1, 1)  # 1+𝑖   1+𝑖  is not a square or cube in 𝔽p² 
        
             embedding_degree: 12 
        
             sexticTwist: M_Twist

Jacobian coordinates are probably slightly easier as they don't depends on the curve equation parameters (a, b) in y² = x³ + ax + b.
It is fine to focus on curves with a == 0 as all curves used for Ethereum (secp256k1, BN254_Snarks and BLS12_381) are in that case.
When there is a dependency on the curve equation, I am undecided yet whether to materialize b as a global in LLVM IR and let the optimizer do its job or doing directly in the code generator.

Addition, mixed addition and doublings are the priority. Due to GPU constraint, will start with the constant-time versions.

The text was updated successfully, but these errors were encountered:

mratsim added enhancement New feature or request performance 🏁 labels Aug 27, 2024

mratsim assigned Vindaar Aug 27, 2024

mratsim mentioned this issue Aug 27, 2024

GPU backends #92

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GPU] GPU / LLVM IR Elliptic curves implementation plan #465

[GPU] GPU / LLVM IR Elliptic curves implementation plan #465

mratsim commented Aug 27, 2024

[GPU] GPU / LLVM IR Elliptic curves implementation plan #465

[GPU] GPU / LLVM IR Elliptic curves implementation plan #465

Comments

mratsim commented Aug 27, 2024

Field arithmetic

Naming

Elliptic curve arithmetic