Fix fuzz 1 failure: incorrect reduction of BigInt #246

mratsim · 2023-07-02T08:06:04Z

Found by the Ethereum Foundation-sponsored fuzzing by @guidovranken (linked to #54):
fromBig can produce incorrectly reduced field elements.

This is because classic Montgomery reduction maps inputs from the range [0, 4p²) to [0, p), but the no-carry optimization from Gnark (https://hackmd.io/@gnark/modular_multiplication and https://eprint.iacr.org/2022/1400.pdf section 2) does not guarantee this.
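To make the range claim concrete, here is a minimal single-limb sketch of classic Montgomery reduction in Nim. The modulus 8191, the 16-bit radix and the names `redc` and `toMont` are assumptions chosen for illustration; this is not Constantine's multi-limb code and it does not model the no-carry variant. It checks that conversion through classic reduction is fully reduced even for inputs at or above the modulus, and that the guarantee is lost once the reduction input reaches p·R.

```nim
# Toy single-limb model. All names are illustrative, not Constantine's API.
const
  P = 8191'u64            # odd modulus 2^13 - 1, leaves spare bits below R
  RBits = 16
  R = 1'u64 shl RBits     # Montgomery radix, here R > 4·P
  Mask = R - 1

func negInvModR(): uint64 =
  ## Brute-force search for -P⁻¹ mod R (fine for a 16-bit toy radix).
  for x in 1 ..< int(R):
    if ((P * uint64(x)) and Mask) == Mask:
      return uint64(x)

const
  NegInv = negInvModR()
  R2 = (R * R) mod P      # R² mod P

func redc(t: uint64): uint64 =
  ## Classic Montgomery reduction with one final conditional subtraction.
  ## Returns t·R⁻¹ mod P, fully reduced only when t < P·R.
  let m = ((t and Mask) * NegInv) and Mask
  let u = (t + m * P) shr RBits   # exact division: t + m·P ≡ 0 (mod R)
  if u >= P: u - P else: u

func toMont(a: uint64): uint64 =
  ## a·R mod P, computed as redc(a · R²).
  redc(a * R2)

# Every a in [0, R), including a >= P, converts to the fully reduced a·R mod P.
for a in 0 ..< int(R):
  doAssert toMont(uint64(a)) == (uint64(a) * R) mod P

# Outside the supported range one conditional subtraction is not enough:
# 2·P·R should reduce to 0, but the result is P, which is not in [0, P).
doAssert redc(2'u64 * P * R) == P
```

With these toy parameters 4p² is below p·R, so the [0, 4p²) input range quoted above sits inside the range the sketch handles.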

In practice, getMont and fromBig are internal-only procedures that only deserialize well-formed inputs that have been pre-checked in protocols:

constantine/constantine/ethereum_bls_signatures.nim

Lines 284 to 293 in 151f284

    
    # General case
    var t{.noInit.}: matchingBigInt(BLS12_381)
    t.unmarshal(src, bigEndian)
    t.limbs[t.limbs.len-1] = t.limbs[t.limbs.len-1] and (MaxWord shr 3) # The first 3 bytes contain metadata to mask out

    if bool(t >= BLS12_381.Mod()):
      return cttBLS_CoordinateGreaterOrEqualThanModulus

    var x{.noInit.}: Fp[BLS12_381]
    x.fromBig(t)

constantine/constantine/ethereum_bls_signatures.nim

Lines 339 to 354 in 151f284

    
    # General case
    var t{.noInit.}: matchingBigInt(BLS12_381)
    t.unmarshal(src.toOpenArray(0, 48-1), bigEndian)
    t.limbs[t.limbs.len-1] = t.limbs[t.limbs.len-1] and (MaxWord shr 3) # The first 3 bytes contain metadata to mask out

    if bool(t >= BLS12_381.Mod()):
      return cttBLS_CoordinateGreaterOrEqualThanModulus

    var x{.noInit.}: Fp2[BLS12_381]
    x.c1.fromBig(t)

    t.unmarshal(src.toOpenArray(48, 96-1), bigEndian)
    if bool(t >= BLS12_381.Mod()):
      return cttBLS_CoordinateGreaterOrEqualThanModulus

    x.c0.fromBig(t)

constantine/constantine/ethereum_evm_precompiles.nim

Lines 38 to 54 in 151f284

    
    func parseRawUint(
           dst: var Fp[BN254_Snarks],
           src: openarray[byte]): CttEVMStatus =
      ## Parse an unsigned integer from its canonical
      ## big-endian or little-endian unsigned representation
      ## And store it into a field element.
      ##
      ## Return false if the integer is larger than the field modulus.
      ## Returns true on success.
      var big {.noInit.}: BigInt[254]
      big.unmarshal(src, bigEndian)

      if not bool(big < Mod(BN254_Snarks)):
        return cttEVM_IntLargerThanModulus

      dst.fromBig(big)
      return cttEVM_Success

and for large inputs, we use redc2xMont + mulMont:

constantine/constantine/hash_to_curve/h2c_hash_to_field.nim

Lines 222 to 235 in 151f284

    
    # Reduces modulo p and output in Montgomery domain
    when m == 1:
      output[i].redc2x(big2x)
      output[i].mres.mulMont(
        output[i].mres,
        Fp[Field.C].getR3ModP(),
        Fp[Field.C])

    else:
      output[i].coords[j].redc2x(big2x)
      output[i].coords[j].mres.mulMont(
        output[i].coords[j].mres,
        Fp[Field.C].getR3ModP(),
        Fp[Field.C])

constantine/constantine/ethereum_eip2333_bls12381_key_derivation.nim

Lines 56 to 71 in 151f284

    
    #  7. x = OS2IP(OKM) mod r
    #  We reduce mod r via Montgomery reduction, instead of bigint division
    #  as constant-time division works bits by bits (384 bits) while
    #  Montgomery reduction works word by word, quadratically so 6*6 = 36 on 64-bit CPUs.
    #  With R ≡ (2^WordBitWidth)^numWords (mod M)
    #  redc2xMont(a) computes a/R
    #  mulMont(a, b) computes a.b.R⁻¹
    var seckeyDbl{.noInit.}: BigInt[2 * BLS12_381.getCurveOrderBitWidth()]
    seckeyDbl.unmarshal(okm, bigEndian)
    # secretKey.reduce(seckeyDbl, BLS12_381.getCurveOrder())
    secretKey.limbs.redc2xMont(seckeyDbl.limbs,                                      # seckey/R
                               BLS12_381.getCurveOrder().limbs, Fr[BLS12_381].getNegInvModWord(),
                               Fr[BLS12_381].getSpareBits())
    secretKey.limbs.mulMont(secretKey.limbs, Fr[BLS12_381].getR2modP().limbs,        # (seckey/R) * R² * R⁻¹ = seckey
                            BLS12_381.getCurveOrder().limbs, Fr[BLS12_381].getNegInvModWord(),
                            Fr[BLS12_381].getSpareBits())

or full reduction:

constantine/constantine/math_arbitrary_precision/arithmetic/bigints_views.nim

Lines 69 to 79 in 151f284

    
    # Conversion to Montgomery can auto-reduced by up to M*R
    # if we use redc2xMont (a/R) and montgomery multiplication by R³
    # For now, we call explicit reduction as it can handle all sizes.
    # TODO: explicit reduction uses constant-time division which is **very** expensive
    # TODO: fix https://github.com/mratsim/constantine/issues/241
    if a.len != M.len:
      let t = allocStackArray(SecretWord, L)
      t.LimbsViewMut.reduce(a.view(), aBits, M.view(), mBits)
      rMont.LimbsViewMut.getMont(LimbsViewConst t, M.view(), LimbsViewConst r2.view(), m0ninv, mBits)
    else:
      rMont.LimbsViewMut.getMont(a.view(), M.view(), LimbsViewConst r2.view(), m0ninv, mBits)
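The redc2x followed by mulMont pattern used for large inputs in the snippets above can be sketched with toy single-limb parameters. Everything here (the modulus 8191, the 16-bit radix, and the helper names `reduceWide` and `wideToMont`) is an assumption for illustration and not Constantine's API. Multiplying by R² after the wide reduction gives the plain residue, as in the key-derivation snippet, while multiplying by R³ gives the residue in the Montgomery domain, as in the hash-to-field snippet.

```nim
# Toy single-limb model. All names are illustrative, not Constantine's API.
const
  P = 8191'u64            # odd modulus 2^13 - 1
  RBits = 16
  R = 1'u64 shl RBits     # Montgomery radix
  Mask = R - 1
  NegInv = 8193'u64       # -P⁻¹ mod R, since 8191 · 8193 = 2^26 - 1
  R2 = (R * R) mod P      # R² mod P
  R3 = (R2 * R) mod P     # R³ mod P

func redc2x(t: uint64): uint64 =
  ## Reduces a double-width value t < P·R to t·R⁻¹ mod P.
  let m = ((t and Mask) * NegInv) and Mask
  let u = (t + m * P) shr RBits
  if u >= P: u - P else: u

func mulMont(a, b: uint64): uint64 =
  ## Montgomery multiplication: a·b·R⁻¹ mod P for a, b < P.
  redc2x(a * b)

func reduceWide(a: uint64): uint64 =
  ## a mod P for a < P·R: (a/R) · R² · R⁻¹ = a.
  mulMont(redc2x(a), R2)

func wideToMont(a: uint64): uint64 =
  ## a·R mod P for a < P·R: (a/R) · R³ · R⁻¹ = a·R.
  mulMont(redc2x(a), R3)

# Sample the whole double-width range [0, P·R).
for a in countup(0, int(P * R) - 1, 4099):
  let wide = uint64(a)
  doAssert reduceWide(wide) == wide mod P
  doAssert wideToMont(wide) == ((wide mod P) * R) mod P
```

Both helpers cost two word-level reductions and avoid the bit-by-bit constant-time division that the full-reduction path falls back to.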

However, there is one protocol case where getMont is used on an input that may exceed the modulus, in ECMUL (EIP-196):

constantine/constantine/ethereum_evm_precompiles.nim

Lines 189 to 209 in 151f284

    
    var smod{.noInit.}: Fr[BN254_Snarks]
    var s{.noInit.}: BigInt[256]
    s.unmarshal(padded.toOpenArray(64,95), bigEndian)

    when true:
      # The spec allows s to be bigger than the curve order r and the field modulus p.
      # As, elliptic curve are a cyclic group mod r, we can reduce modulo r and get the same result.
      # This allows to use windowed endomorphism acceleration
      # which is 31.5% faster than plain windowed scalar multiplication
      # at the low cost of a modular reduction.

      # Due to mismatch between the BigInt[256] input and the rest being BigInt[254]
      # we use the low-level getMont instead of 'fromBig'
      getMont(smod.mres.limbs, s.limbs,
                  Fr[BN254_Snarks].fieldMod().limbs,
                  Fr[BN254_Snarks].getR2modP().limbs,
                  Fr[BN254_Snarks].getNegInvModWord(),
                  Fr[BN254_Snarks].getSpareBits())
      P.scalarMul(smod.toBig())
    else:
      P.scalarMul(s)
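The comment in the snippet above relies on the fact that, in a cyclic group of order r, multiplying by s and by s mod r gives the same element. A minimal sketch in Nim, using the order-11 subgroup generated by 2 in the multiplicative group modulo 23 as a stand-in for the curve. These toy parameters and the `powMod` helper are assumptions for illustration, not Constantine's API, and exponentiation plays the role of scalar multiplication.

```nim
# Toy cyclic group: powers of G modulo Q. Illustrative only.
const
  Q = 23'u64            # small prime modulus
  G = 2'u64             # generator of a subgroup of order 11, since 2^11 = 2048 = 89·23 + 1
  GroupOrder = 11'u64   # plays the role of the curve order r

func powMod(base, exp, m: uint64): uint64 =
  ## Square-and-multiply, the multiplicative analogue of scalar multiplication.
  result = 1'u64
  var b = base mod m
  var e = exp
  while e > 0'u64:
    if (e and 1'u64) == 1'u64:
      result = (result * b) mod m
    b = (b * b) mod m
    e = e shr 1

# Reducing the scalar modulo the group order does not change the result,
# even when the scalar is larger than both GroupOrder and Q.
for s in 0 .. 1000:
  let scalar = uint64(s)
  doAssert powMod(G, scalar, Q) == powMod(G, scalar mod GroupOrder, Q)
```

This is why the reduction must itself be correct: a scalar that is not fully reduced modulo r no longer satisfies the precondition the accelerated scalar multiplication expects.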

Perf

The impact on BN254 is a slowdown from 10ns to 17ns (with Clang, worse with GCC).

The impact on BLS12-381 is a slowdown from 23ns to 39ns (with Clang, worse with GCC).

Most of the slowdown is due to classic Montgomery reduction not having an assembly implementation.

However, that conversion is not in the hot path of any protocol. Most protocols that need to deal with a large amount of deserialization allow caching, for example of Ethereum validator public keys, and 99% of the time is spent computing the square root of compressed keys/signatures.

Fix fuzz #1 failure: incorrect reduction of BigInt

e0f94f7

mratsim added the correctness 🛂 label Jul 2, 2023

mratsim merged commit d0f4ad8 into master Jul 2, 2023
12 checks passed

mratsim deleted the fuzz-1-reduction-field branch July 2, 2023 15:15

mratsim mentioned this pull request Jul 16, 2023

[Fuzz fail] MSM BLS12-381- GCC-only no-ADX assembly fallback #248

Open

guidovranken mentioned this pull request Sep 27, 2023

More ZK bugs 0xPARC/zk-bug-tracker#11

Open
