Multi-Scalar-Multiplication / Linear combination #220
@mratsim thoughts on this? https://eprint.iacr.org/2022/1400
I'm aware of this optimization, it was also mentioned in https://zprize.hardcaml.com/msm-point-representation.html. There are 3 issues:
So I haven't implemented it, and I'm not hopeful. Cc @gbotrel, @yelhousni
Yup, I agree with @mratsim's comments: we ended up not merging this version into gnark-crypto because the affine version was faster in most of the use cases we were interested in for gnark.
Overview
This implements fast single-threaded (for now) multi-scalar-multiplication (MSM).
As mentioned in this EF presentation from 2020, the ecosystem has found techniques to remove FFTs altogether. In fact, the Ethereum KZG polynomial commitment for EIP-4844 required FFTs in 2020 (#151) but no longer does in 2023 (https://github.com/ethereum/consensus-specs/blob/59129e4/specs/deneb/polynomial-commitments.md), thanks to constructing the polynomial in a specific way. As a result, 99% of the time of any ZK system is now spent in MSM (also called FLC, for Fast Linear Combination, in the slides).
The techniques used have been developed for easy porting to multi-cores and also GPU architectures.
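For readers new to MSM, here is a minimal sketch of the bucket (Pippenger-style) method that this kind of implementation is usually built on. It is not the code in this PR: it assumes a toy group (integers modulo a prime under addition) in place of BLS12-381 points, unsigned `c`-bit windows, and hypothetical names (`msm_naive`, `msm_bucket`), so only the structure of the algorithm carries over.

```python
# Toy stand-in for an elliptic-curve group: integers modulo a prime under addition.
# Assumption: a real MSM works on curve points; only the group law matters here.
P = 2**61 - 1

def add(a, b):
    return (a + b) % P

def msm_naive(scalars, points):
    """Reference: sum of k_i * P_i, one double-and-add per input."""
    total = 0
    for k, pt in zip(scalars, points):
        acc, base = 0, pt
        while k:
            if k & 1:
                acc = add(acc, base)
            base = add(base, base)
            k >>= 1
        total = add(total, acc)
    return total

def msm_bucket(scalars, points, c=4, scalar_bits=64):
    """Bucket method with c-bit windows, processed from the top window down."""
    num_windows = (scalar_bits + c - 1) // c
    mask = (1 << c) - 1
    result = 0
    for w in reversed(range(num_windows)):
        # Shift the running result left by c bits: c doublings.
        for _ in range(c):
            result = add(result, result)
        # Scatter: each point is added to the bucket indexed by its c-bit digit.
        buckets = [0] * (1 << c)
        for k, pt in zip(scalars, points):
            digit = (k >> (w * c)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], pt)
        # Reduce: sum over j of j * buckets[j], using a running suffix sum
        # so that only additions are needed.
        running, window_sum = 0, 0
        for j in range(mask, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result

if __name__ == "__main__":
    import random
    ks = [random.getrandbits(64) for _ in range(100)]
    ps = [random.randrange(1, P) for _ in range(100)]
    assert msm_bucket(ks, ps) == msm_naive(ks, ps)
```

The scatter step is where the input-dependent work happens (and why the algorithm is not constant-time), and each window is independent of the others until the final combination, which is what makes the approach straightforward to split across cores or GPU threads.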
Content
This PR goes a little all over the place.
Benches
BLST
nim c -r -d:danger --passC:-march=native --hints:off --warnings:off --outdir:build benchmarks/bls12381_curve.nim
Note: it's important to bench BLST with completely random data, as MSM is not constant-time and the doubling path would be much faster. (This is a measurement issue I had in Batch additions #207.)
Gnark from 32 to 128
go test -bench=MultiExpG1 -cpu 1 -run=^#
Constantine from 8 to 128
nimble bench_ec_g1_msm_bls12_381
Here we're significantly faster than BLST and Gnark
Gnark from 256 to 8192
Constantine same range
Over 20% speedup over both
Gnark from 16384 to 262144 inputs
Constantine same range
Starting from 131071 points, Gnark takes the lead. The reason is unknown, but we start reaching L1 cache limits and also the 64K aliasing conflict boundary, so some tuning of the number of buckets might help. See also https://www.youtube.com/watch?v=Bl5mQA7UL2I, which explains that going beyond c=16 didn't help due to memory bandwidth limitations.
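To make the cache argument concrete, here is a rough back-of-the-envelope sketch of how the bucket array grows with the window size `c`. It is not Constantine's or Gnark's actual tuning logic. The assumptions are: buckets stored as 3-coordinate (projective or Jacobian) BLS12-381 G1 points, 48 bytes per field element, signed digits halving the bucket count, and the textbook cost model of roughly `ceil(b/c) * (n + 2^c)` additions. The function names are hypothetical.

```python
import math

FIELD_BYTES = 48       # BLS12-381 base field: 381 bits, stored in 6 x 64-bit limbs
COORDS_PER_POINT = 3   # assumption: buckets held in projective/Jacobian coordinates

def bucket_bytes(c, signed_digits=True):
    """Memory taken by one window's buckets for a c-bit window."""
    num_buckets = 1 << (c - 1) if signed_digits else (1 << c) - 1
    return num_buckets * COORDS_PER_POINT * FIELD_BYTES

def approx_additions(n, c, scalar_bits=255):
    """Textbook estimate: per window, n scatter additions plus ~2^c reduction additions."""
    windows = math.ceil(scalar_bits / c)
    return windows * (n + (1 << c))

def best_window(n, candidates=range(2, 21)):
    """Window size minimizing the addition count, ignoring memory effects entirely."""
    return min(candidates, key=lambda c: approx_additions(n, c))

if __name__ == "__main__":
    for c in (8, 12, 16):
        print(f"c={c:2d}: {bucket_bytes(c) / 1024:.0f} KiB of buckets")
    print("addition-optimal c for n=2**17:", best_window(2**17))
```

Under these assumptions the buckets for `c=8` take 18 KiB, while `c=16` needs several MiB, far beyond a typical L1 data cache. The pure addition count keeps favoring larger windows as `n` grows, so a model that ignores memory will over-estimate the useful `c`.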
cc @asn-d6 @yelhousni