
Multi-Scalar-Multiplication / Linear combination #220

Merged: 35 commits merged on Feb 16, 2023

Conversation

@mratsim (Owner) commented Feb 15, 2023

Overview

This implements a fast single-threaded (for now) multi-scalar multiplication (MSM).

As mentioned in this EF presentation from 2020, the ecosystem has found techniques to remove FFTs altogether. In fact, Ethereum's KZG polynomial commitment for EIP-4844 used to require FFTs in 2020 (#151) but not at all in 2023 (https://github.com/ethereum/consensus-specs/blob/59129e4/specs/deneb/polynomial-commitments.md), thanks to constructing the polynomial in a specific way. Now 99% of the time of any ZK system is spent in MSM (also called FLC, for Fast Linear Combination, in the slide below).
[slide from the EF presentation]

The techniques used have been developed with easy porting to multi-core and GPU architectures in mind.
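
To make the rest of the PR easier to follow: the single-threaded MSM here is the bucket (a.k.a. Pippenger) method from the linked issue. Below is a minimal sketch of the windowing/bucketing idea over a toy additive group, where integers modulo a prime stand in for curve points; Constantine's actual implementation works on elliptic-curve points with signed digits and batched affine additions, and none of the names below are Constantine APIs.

```nim
# Bucket (Pippenger) MSM sketch: computes sum_i scalars[i] * points[i] with
# c-bit windows. Integers modulo a toy prime stand in for elliptic-curve
# points: `addP` plays the role of point addition, `dbl` of point doubling.
const M = 2305843009213693951'u64  # 2^61 - 1, toy modulus

proc addP(a, b: uint64): uint64 = (a + b) mod M
proc dbl(a: uint64): uint64 = addP(a, a)

proc msmBucket(scalars, points: openArray[uint64], c: int): uint64 =
  doAssert scalars.len == points.len
  let numWindows = (64 + c - 1) div c
  result = 0
  # Windows are processed from most significant to least significant.
  for w in countdown(numWindows - 1, 0):
    for _ in 0 ..< c:                       # shift the accumulator by c bits
      result = dbl(result)
    # Accumulate each point into the bucket indexed by its window digit.
    var buckets = newSeq[uint64](1 shl c)
    for i in 0 ..< scalars.len:
      let digit = (scalars[i] shr (w * c)) and ((1'u64 shl c) - 1)
      if digit != 0'u64:
        buckets[int(digit)] = addP(buckets[int(digit)], points[i])
    # Reduce the buckets: sum_d d * buckets[d] via a running suffix sum.
    var running = 0'u64
    var windowSum = 0'u64
    for d in countdown((1 shl c) - 1, 1):
      running = addP(running, buckets[d])
      windowSum = addP(windowSum, running)
    result = addP(result, windowSum)

when isMainModule:
  doAssert msmBucket([3'u64, 5, 7], [11'u64, 13, 17], 4) ==
           (3'u64*11 + 5'u64*13 + 7'u64*17) mod M
```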

Content

This PR goes a little all over the place.

  • While trying to find a signed-digit representation that slices the MSM without precomputation, and hence is suitable for GPUs (unlike NAF, which needs a carry), I found interesting signed digits for the pairings' Miller loop (but ended up using NAF anyway) and for randomizing/blinding in batch BLS signature verification. I also implemented wNAF (but it is unused 🤷). A small recoding sketch is included after this list.
  • This then led to a refactor of the Miller loop, and also to accelerating pairings for the EVM with a MillerAccumulator.
  • The PR also introduces the VarTime, Alloca and HeapAlloc effects so that the compiler can bubble up procs that use them (see the effect-tagging sketch after this list).
  • There are failed experiments (one batch affine, some signed-digit representations).
  • The PR introduces vartime field inversion for use in affine sums and MSM (and potentially pairings, but there it's only about 0.5% of the cost).
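
For reference on the recoding discussion in the first bullet, here is a minimal sketch of plain NAF recoding; the carry in the odd branch is exactly what makes it sequential and thus awkward on GPUs. Illustrative only, not Constantine's recoding code.

```nim
# Standard NAF recoding: digits in {-1, 0, 1}, no two adjacent non-zero digits.
# The carry (adding 1 back to n when the digit is -1) makes each digit depend
# on the previous step, so the recoding is inherently sequential.
proc naf(k: uint64): seq[int8] =
  var n = k
  while n > 0'u64:
    if (n and 1'u64) == 1'u64:
      # n is odd: pick d in {-1, 1} so that n - d is divisible by 4.
      let d = int8(2 - int(n and 3'u64))
      result.add d
      if d == 1:
        n -= 1'u64
      else:
        n += 1'u64       # d = -1, so n - d = n + 1: this is the carry
    else:
      result.add 0'i8
    n = n shr 1

when isMainModule:
  # 7 = 8 - 1, so the NAF digits (least significant first) are [-1, 0, 0, 1]
  doAssert naf(7'u64) == @[-1'i8, 0, 0, 1]
```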
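
The VarTime / Alloca / HeapAlloc effects in the third bullet build on Nim's tag-based effect tracking. A minimal sketch of that mechanism follows, with illustrative names (VarTime, invVartime, scalarMulCT are placeholders, not Constantine's actual declarations):

```nim
# Nim tag tracking: a custom effect is declared as an object inheriting from
# RootEffect, procs are annotated with {.tags: [...].}, and callers either
# inherit the tag (it "bubbles up") or forbid it with an empty tag list.
type VarTime = object of RootEffect  ## timing may depend on secret data

proc invVartime(x: int): int {.tags: [VarTime].} =
  ## Stand-in for a variable-time operation such as vartime field inversion.
  x + 1

proc scalarMulCT(x: int): int {.tags: [].} =
  ## Empty tag list: the compiler rejects calls to any tagged proc here,
  ## so a VarTime operation cannot sneak into this code path.
  # invVartime(x)   # uncommenting this is a compile-time error
  x * 2

proc msmVartime(x: int): int =
  ## No explicit tags: the VarTime effect of invVartime is inferred for
  ## this proc as well.
  invVartime(x) + scalarMulCT(x)

when isMainModule:
  echo msmVartime(3)
```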

Benches

  • BLST: `nim c -r -d:danger --passC:-march=native --hints:off --warnings:off --outdir:build benchmarks/bls12381_curve.nim`. Note that it's important to bench BLST with completely random data, as its MSM is not constant-time and the doubling path would be much faster. (This is a measurement issue I noted in Batch additions #207.)

    [benchmark screenshot]

  • Gnark, from 32 to 128 points: `go test -bench=MultiExpG1 -cpu 1 -run=^#`
    [benchmark screenshot]

  • Constantine, from 8 to 128 points: `nimble bench_ec_g1_msm_bls12_381`
    [benchmark screenshot]
    Here we're significantly faster than BLST and Gnark.

  • Gnark, from 256 to 8192 points
    [benchmark screenshot]

  • Constantine, same range
    [benchmark screenshot]
    Over 20% speedup over both.

  • Gnark, from 16384 to 262144 inputs
    [benchmark screenshot]

  • Constantine, same range
    [benchmark screenshot]
    Starting from 131071 points, Gnark takes the lead. The reason is unclear, but we start reaching L1 cache limits and the 64K aliasing-conflict boundary, so some tuning of the number of buckets might help; a rough window-size cost model is sketched after this list. See also https://www.youtube.com/watch?v=Bl5mQA7UL2I on why going beyond c=16 didn't help due to memory-bandwidth limitations.
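
As a rough illustration of the bucket-count trade-off above, here is a back-of-the-envelope cost model for picking the window size c. It deliberately ignores the cache and memory-bandwidth effects discussed in the last bullet, which is exactly why real tuning stops paying off around c = 16; the formula and the numbers it prints are an assumption-laden estimate, not Constantine's actual tuning code.

```nim
# Pippenger cost model: ceil(b/c) windows for b-bit scalars, each window
# costing ~n bucket accumulations plus ~2^(c-1) additions to reduce the
# signed-digit buckets, plus ~b doublings overall.
proc estimatedAdds(n, b, c: int): int =
  let windows = (b + c - 1) div c
  windows * (n + (1 shl (c - 1))) + b

proc bestWindow(n: int, b = 255): int =
  ## Picks the c in 2..20 that minimizes the estimate above.
  result = 2
  for c in 3 .. 20:
    if estimatedAdds(n, b, c) < estimatedAdds(n, b, result):
      result = c

when isMainModule:
  for logN in [7, 13, 18]:
    echo "n = 2^", logN, "  ->  c ≈ ", bestWindow(1 shl logN)
```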

cc @asn-d6 @yelhousni

@mratsim (Owner, Author) commented Feb 16, 2023

Some tuning to take back the perf crown on the 2¹⁶ to 2¹⁸ = 262144 points range.

[benchmark screenshots]

@paulmillr commented

@mratsim thoughts on this? https://eprint.iacr.org/2022/1400

@mratsim (Owner, Author) commented May 19, 2023

I'm aware of this optimization; it was also mentioned in https://zprize.hardcaml.com/msm-point-representation.html.

There are 3 issues:

  • The asymptotic cost of a twisted Edwards addition is 7M, while an affine addition is 6M. Affine addition is significantly harder to use, but asymptotically about 14% faster (1 - 6/7 ≈ 14.3%) once you reach the threshold (~50 points to add); see the batch-inversion sketch after this list.
  • A twisted Edwards representation is not universal: there is none for BLS12-381, for example, which is the curve I'm most interested in, IIRC because the curve has no point of order 2.
  • There is a cost to converting to Edwards coordinates. It might be negligible for a small number of points but becomes noticeable with thousands or millions of points. For example, I tested endomorphism acceleration in MSM, but the extra preprocessing was not worth it once you got into 10000k+ points, even though it divides the number of naive operations by 2.
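
For some context on the first bullet: the way affine addition gets its 6M cost in a batch setting is by sharing one field inversion across the whole batch (Montgomery's batch-inversion trick), and that sharing is why there is a threshold of roughly 50 points before it wins. A toy sketch over a small prime field follows; the modulus and helper names are illustrative, not Constantine's field code.

```nim
const P = 65537  # toy prime modulus; the real fields are 255 to 384 bits

proc mulmod(a, b: int): int = (a * b) mod P

proc powmod(a, e: int): int =
  ## Square-and-multiply exponentiation modulo P.
  var base = a mod P
  var exp = e
  result = 1
  while exp > 0:
    if (exp and 1) == 1:
      result = mulmod(result, base)
    base = mulmod(base, base)
    exp = exp shr 1

proc invmod(a: int): int = powmod(a, P - 2)  # Fermat's little theorem

proc batchInv(xs: openArray[int]): seq[int] =
  ## Montgomery's trick: inverts every (non-zero) element with a single
  ## modular inversion plus about 3 extra multiplications per element.
  let n = xs.len
  result = newSeq[int](n)
  var prefix = newSeq[int](n)
  var acc = 1
  for i in 0 ..< n:                 # prefix[i] = xs[0] * ... * xs[i]
    acc = mulmod(acc, xs[i])
    prefix[i] = acc
  var inv = invmod(acc)             # the single expensive inversion
  for i in countdown(n - 1, 0):
    if i == 0:
      result[i] = inv
    else:
      result[i] = mulmod(inv, prefix[i - 1])
      inv = mulmod(inv, xs[i])

when isMainModule:
  let xs = @[3, 5, 7, 11]
  let invs = batchInv(xs)
  for i in 0 ..< xs.len:
    doAssert mulmod(xs[i], invs[i]) == 1
```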

So I haven't implemented it, and I'm not hopeful.

Cc @gbotrel, @yelhousni

@yelhousni commented

> Cc @gbotrel, @yelhousni

Yup, I agree with @mratsim's comments: we ended up not merging this version into gnark-crypto because the affine version was faster in most of the use cases we were interested in for gnark.
For bullet point 3, the conversion was mainly for the sake of the ZPrize competition (points were given in affine short Weierstrass), but for SNARK applications we can store them already in the right representation (twisted Edwards with a = -1 in the custom coordinate system). These curve points are part of the SNARK setup.

Linked issue: multi-scalar multiplication / multi-exponentiations (a.k.a. Pippenger algorithm)