precompiles: Implement KZG proof verification (aka "point evaluation") #979

Merged
chfast merged 3 commits into master from precompiles/kzg_point_evaluation
Sep 19, 2024

Member

@chfast chfast commented Aug 26, 2024

Implement the KZG proof verification using the blst library.
Add native point_evaluation implementation.
Add benchmarks and report the comparison with Silkworm's implementation (see PR).

@chfast chfast added the precompiles Related to EVM precompiles label Aug 26, 2024

codecov bot commented Aug 26, 2024

Codecov Report

Attention: Patch coverage is 98.86364% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.18%. Comparing base (edfe00d) to head (85e7ff7).
Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| test/precompiles_bench/precompiles_bench.cpp | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #979   +/-   ##
=======================================
  Coverage   94.18%   94.18%           
=======================================
  Files         147      149    +2     
  Lines       15843    15884   +41     
=======================================
+ Hits        14922    14961   +39     
- Misses        921      923    +2     
| Flag | Coverage Δ |
|---|---|
| eof_execution_spec_tests | 17.22% <0.00%> (-0.05%) ⬇️ |
| ethereum_tests | 27.31% <71.59%> (+0.03%) ⬆️ |
| ethereum_tests_silkpre | 19.10% <68.75%> (-0.06%) ⬇️ |
| execution_spec_tests | 20.40% <71.59%> (+0.06%) ⬆️ |
| unittests | 88.96% <76.13%> (+0.19%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| lib/evmone_precompiles/kzg.cpp | 100.00% <100.00%> (ø) |
| test/state/precompiles.cpp | 98.92% <100.00%> (+0.05%) ⬆️ |
| test/state/precompiles_stubs.cpp | 99.76% <ø> (-0.24%) ⬇️ |
| test/unittests/precompiles_kzg_test.cpp | 100.00% <100.00%> (ø) |
| test/precompiles_bench/precompiles_bench.cpp | 0.00% <0.00%> (ø) |

@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch from 8fb5592 to d8d3821 Compare August 29, 2024 18:46
@chfast chfast marked this pull request as ready for review August 29, 2024 18:48
@chfast chfast requested review from rodiazet, gumb0 and pdobacz August 29, 2024 18:48
@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch from d8d3821 to 6ae4882 Compare August 29, 2024 18:58
Member Author

chfast commented Aug 29, 2024

Benchmarks of all precompiles using clang-18 compiler on Intel Ultra 7 155U 4.0 GHz:

identity, identity_execute            31.7 ns    gas_rate= 12.6839G/s
ecrecover, evmmax_cpp               388966 ns    gas_rate=  7.7133M/s
ecrecover, libsecp256k1              30185 ns    gas_rate= 99.3917M/s
ecadd, evmmax_cpp                     9109 ns    gas_rate= 16.4693M/s
ecadd, libff                          2476 ns    gas_rate= 60.5889M/s
ecmul, evmmax_cpp                   178931 ns    gas_rate= 33.5353M/s
ecmul, libff                        175203 ns    gas_rate= 34.2483M/s
ecpairing, libff                   6467982 ns    gas_rate= 23.7796M/s
point_evaluation, evmone_blst      1064709 ns    gas_rate= 46.9629M/s
point_evaluation, silkworm         1065344 ns    gas_rate= 46.9348M/s

Member Author

chfast commented Aug 30, 2024

The benchmarks of Constantine for reference (same hardware and compiler):

--------------------------------------------------------------------------------------------------------------------------------
SHA256 -  32 bytes            72 gas    1500.00 MGas/s    20833333.333 ops/s           48 ns/op          130 CPU cycles (approx)
SHA256 -  64 bytes            84 gas     857.14 MGas/s    10204081.633 ops/s           98 ns/op          263 CPU cycles (approx)
SHA256 -  96 bytes            96 gas    1000.00 MGas/s    10416666.667 ops/s           96 ns/op          258 CPU cycles (approx)
SHA256 - 128 bytes           108 gas     843.75 MGas/s     7812500.000 ops/s          128 ns/op          345 CPU cycles (approx)
SHA256 - 160 bytes           120 gas     923.08 MGas/s     7692307.692 ops/s          130 ns/op          349 CPU cycles (approx)
SHA256 - 192 bytes           132 gas     851.61 MGas/s     6451612.903 ops/s          155 ns/op          416 CPU cycles (approx)
SHA256 - 224 bytes           144 gas     911.39 MGas/s     6329113.924 ops/s          158 ns/op          424 CPU cycles (approx)
SHA256 - 256 bytes           156 gas     857.14 MGas/s     5494505.495 ops/s          182 ns/op          489 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------
BN254_G1ADD                  150 gas      66.84 MGas/s      445632.799 ops/s         2244 ns/op         6030 CPU cycles (approx)
BN254_G1MUL                 6000 gas     244.09 MGas/s       40681.827 ops/s        24581 ns/op        66063 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------
BN254_PAIRINGCHECK 1       79000 gas     199.44 MGas/s        2524.609 ops/s       396101 ns/op      1064457 CPU cycles (approx)
BN254_PAIRINGCHECK 2      113000 gas     201.98 MGas/s        1787.406 ops/s       559470 ns/op      1503580 CPU cycles (approx)
BN254_PAIRINGCHECK 3      147000 gas     198.84 MGas/s        1352.671 ops/s       739278 ns/op      1986814 CPU cycles (approx)
BN254_PAIRINGCHECK 4      181000 gas     201.39 MGas/s        1112.673 ops/s       898737 ns/op      2415388 CPU cycles (approx)
BN254_PAIRINGCHECK 5      215000 gas     200.34 MGas/s         931.811 ops/s      1073179 ns/op      2884209 CPU cycles (approx)
BN254_PAIRINGCHECK 6      249000 gas     199.80 MGas/s         802.397 ops/s      1246266 ns/op      3349388 CPU cycles (approx)
BN254_PAIRINGCHECK 7      283000 gas     201.08 MGas/s         710.514 ops/s      1407431 ns/op      3782520 CPU cycles (approx)
BN254_PAIRINGCHECK 8      317000 gas     201.23 MGas/s         634.787 ops/s      1575332 ns/op      4233774 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------
BLS12_G1ADD                  500 gas     140.45 MGas/s      280898.876 ops/s         3560 ns/op         9569 CPU cycles (approx)
BLS12_G2ADD                  800 gas     172.34 MGas/s      215424.386 ops/s         4642 ns/op        12475 CPU cycles (approx)
BLS12_G1MUL                12000 gas     136.98 MGas/s       11414.743 ops/s        87606 ns/op       235447 CPU cycles (approx)
BLS12_G2MUL                45000 gas     313.09 MGas/s        6957.538 ops/s       143729 ns/op       386280 CPU cycles (approx)
BLS12_MAP_FP_TO_G1          5500 gas     151.72 MGas/s       27584.685 ops/s        36252 ns/op        97429 CPU cycles (approx)
BLS12_MAP_FP2_TO_G2        75000 gas     630.78 MGas/s        8410.358 ops/s       118901 ns/op       319553 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------
BLS12_PAIRINGCHECK 1      108000 gas     198.45 MGas/s        1837.516 ops/s       544213 ns/op      1462572 CPU cycles (approx)
BLS12_PAIRINGCHECK 2      151000 gas     204.53 MGas/s        1354.487 ops/s       738287 ns/op      1984175 CPU cycles (approx)
BLS12_PAIRINGCHECK 3      194000 gas     205.60 MGas/s        1059.792 ops/s       943581 ns/op      2535914 CPU cycles (approx)
BLS12_PAIRINGCHECK 4      237000 gas     206.89 MGas/s         872.958 ops/s      1145531 ns/op      3078667 CPU cycles (approx)
BLS12_PAIRINGCHECK 5      280000 gas     207.38 MGas/s         740.650 ops/s      1350166 ns/op      3628633 CPU cycles (approx)
BLS12_PAIRINGCHECK 6      323000 gas     209.14 MGas/s         647.480 ops/s      1544450 ns/op      4150785 CPU cycles (approx)
BLS12_PAIRINGCHECK 7      366000 gas     200.72 MGas/s         548.411 ops/s      1823451 ns/op      4900604 CPU cycles (approx)
BLS12_PAIRINGCHECK 8      409000 gas     209.44 MGas/s         512.090 ops/s      1952780 ns/op      5248164 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------
BLS12_G1MSM   2            21312 gas     108.45 MGas/s        5088.618 ops/s       196517 ns/op       528127 CPU cycles (approx)
BLS12_G1MSM   4            30768 gas      97.03 MGas/s        3153.579 ops/s       317100 ns/op       852211 CPU cycles (approx)
BLS12_G1MSM   8            43488 gas      78.10 MGas/s        1795.816 ops/s       556850 ns/op      1496551 CPU cycles (approx)
BLS12_G1MSM  16            64128 gas      64.99 MGas/s        1013.393 ops/s       986784 ns/op      2651992 CPU cycles (approx)
BLS12_G1MSM  32           103296 gas      56.62 MGas/s         548.138 ops/s      1824357 ns/op      4903023 CPU cycles (approx)
BLS12_G1MSM  64           170496 gas      49.55 MGas/s         290.649 ops/s      3440580 ns/op      9246719 CPU cycles (approx)
BLS12_G1MSM 128           267264 gas      40.67 MGas/s         152.155 ops/s      6572254 ns/op     17663246 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------
BLS12_G2MSM   2            79920 gas     244.32 MGas/s        3057.076 ops/s       327110 ns/op       879110 CPU cycles (approx)
BLS12_G2MSM   4           115380 gas     216.28 MGas/s        1874.523 ops/s       533469 ns/op      1433712 CPU cycles (approx)
BLS12_G2MSM   8           163080 gas     170.77 MGas/s        1047.149 ops/s       954974 ns/op      2566529 CPU cycles (approx)
BLS12_G2MSM  16           240480 gas     144.01 MGas/s         598.836 ops/s      1669905 ns/op      4487924 CPU cycles (approx)
BLS12_G2MSM  32           387360 gas     131.16 MGas/s         338.607 ops/s      2953275 ns/op      7937066 CPU cycles (approx)
BLS12_G2MSM  64           639360 gas     111.79 MGas/s         174.846 ops/s      5719317 ns/op     15370947 CPU cycles (approx)
BLS12_G2MSM 128          1002240 gas      95.42 MGas/s          95.208 ops/s     10503296 ns/op     28228149 CPU cycles (approx)
--------------------------------------------------------------------------------------------------------------------------------

Comment on lines +39 to +53
constexpr blst_p2_affine KZG_SETUP_G2_1{
{{{0x6120a2099b0379f9, 0xa2df815cb8210e4e, 0xcb57be5577bd3d4f, 0x62da0ea89a0c93f8,
0x02e0ee16968e150d, 0x171f09aea833acd5},
{0x11a3670749dfd455, 0x04991d7b3abffadc, 0x85446a8e14437f41, 0x27174e7b4e76e3f2,
0x7bfa6dd397f60a20, 0x02fcc329ac07080f}}},
{{{0xaa130838793b2317, 0xe236dd220f891637, 0x6502782925760980, 0xd05c25f60557ec89,
0x6095767a44064474, 0x185693917080d405},
{0x549f9e175b03dc0a, 0x32c0c95a77106cfe, 0x64a74eae5705d080, 0x53deeaf56659ed9e,
0x09a1d368508afb93, 0x12cf3a4525b5e9bd}}}};

Are these checked against the json somewhere?

There are actually multiple trusted setups, so if we change it in the future to a different one, it would be good to have some check that fails somewhere.

Not a blocker imo

For reference, this is from the trusted setup of size 4096, where size denotes how many G1 elements there are. IIRC, there are always 65 G2 elements in all of the trusted setups that we created.

Member Author

The original code pointed to https://github.com/ethereum/consensus-specs/blob/dev/presets/mainnet/trusted_setups/trusted_setup_4096.json, but that is not very useful because I don't know exactly what this JSON contains (e.g. does it have 8192 entries for G1?). With some effort we could take the referenced value and pre-process it into the blst internal form at compile time, but we are not ready for that yet.

Comment on lines +50 to +64
std::optional<blst_scalar> validate_scalar(std::span<const std::byte, 32> b) noexcept
{
    blst_scalar v;
    blst_scalar_from_bendian(&v, reinterpret_cast<const uint8_t*>(b.data()));
    return blst_scalar_fr_check(&v) ? std::optional{v} : std::nullopt;
}

/// Uncompress and validate a point from G1 subgroup.
std::optional<blst_p1_affine> validate_G1(std::span<const std::byte, 48> b) noexcept

I haven't looked at the rest of the code, so this may not be easy; the 32 and 48 are well defined constants in the spec -- is it not possible to use them instead of using magic numbers?

Member Author

Yes, makes sense. I plan to work on this, but I need a separate PR to explore some directions (e.g. using std::span).

    blst_p1_affine r;
    if (blst_p1_uncompress(&r, reinterpret_cast<const uint8_t*>(b.data())) != BLST_SUCCESS)
        return std::nullopt;
    if (!blst_p1_affine_in_g1(&r))  // Subgroup check is required by the spec but not testable.

Note to self: check if this is actually doing a subgroup check or just a group check

There is blst_p1_affine_on_curve, which would be the group check, so blst_p1_affine_in_g1 is the subgroup check.

blst_p1 r;
blst_p1_add_or_double_affine(&r, &q, &p);
blst_p1_affine ra;
blst_p1_to_affine(&ra, &r);

Note that converting back to affine after every addition may hurt performance if you plan to do multiple adds or doubles in a row.

Member Author

Note added.

blst_p1 mult(const blst_p1& p, const blst_scalar& v) noexcept
{
    blst_p1 r;
    blst_p1_mult(&r, &p, v.b, 255);

255 is the safe choice here, though I think there might be a method in blst that tells you how many bits a scalar is and that could be used to optimize this a bit -- unclear to me what the diff would be though

Member Author

This is what the BLST C++ API uses. I use a named constant now.
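To illustrate where 255 comes from: it is the bit length of the BLS12-381 scalar field modulus, so every valid scalar fits in 255 bits. The sketch below is self-contained and does not call blst; `bit_length` and `BLS_MODULUS_BYTES` are hypothetical names, not identifiers from this PR. Whether passing a per-scalar bit count would give a measurable speedup is not something this thread measures.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// BLS12-381 scalar field modulus as 32 big-endian bytes (hypothetical name).
constexpr std::array<std::uint8_t, 32> BLS_MODULUS_BYTES{
    0x73, 0xed, 0xa7, 0x53, 0x29, 0x9d, 0x7d, 0x48, 0x33, 0x39, 0xd8, 0x08, 0x09, 0xa1, 0xd8, 0x05,
    0x53, 0xbd, 0xa4, 0x02, 0xff, 0xfe, 0x5b, 0xfe, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x01};

// Returns the number of significant bits of a 32-byte big-endian integer
// (0 for the value zero).
constexpr unsigned bit_length(const std::array<std::uint8_t, 32>& be) noexcept
{
    for (std::size_t i = 0; i < be.size(); ++i)
    {
        if (be[i] != 0)
        {
            // Count the bits used in the first non-zero byte.
            unsigned bits = 0;
            for (unsigned b = be[i]; b != 0; b >>= 1)
                ++bits;
            // All less significant bytes contribute 8 bits each.
            return static_cast<unsigned>((be.size() - 1 - i) * 8) + bits;
        }
    }
    return 0;
}

// The modulus needs 255 bits, so 255 covers every scalar below it.
static_assert(bit_length(BLS_MODULUS_BYTES) == 255, "modulus is a 255-bit number");
```

A smaller scalar reports fewer bits, which is the value the reviewer suggests could be passed instead of the fixed 255.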

Comment on lines 303 to 407

ExecutionResult point_evaluation_execute(const uint8_t* input, size_t input_size, uint8_t* output,
    [[maybe_unused]] size_t output_size) noexcept
{
    assert(output_size >= 64);
    if (input_size != 192)
        return {EVMC_PRECOMPILE_FAILURE, 0};

    const auto r = crypto::kzg_verify_proof(reinterpret_cast<const std::byte*>(&input[0]),
        reinterpret_cast<const std::byte*>(&input[32]),
        reinterpret_cast<const std::byte*>(&input[64]),
        reinterpret_cast<const std::byte*>(&input[96]),
        reinterpret_cast<const std::byte*>(&input[96 + 48]));

    if (!r)
        return {EVMC_PRECOMPILE_FAILURE, 0};

    intx::be::unsafe::store(output, FIELD_ELEMENTS_PER_BLOB);
    intx::be::unsafe::store(output + 32, BLS_MODULUS);
    return {EVMC_SUCCESS, 64};
}

For this, could we link to the eip method or explain what the offsets are being used for?

Member Author

Done.
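For readers following the offsets question, here is a minimal sketch of how the 192-byte input splits into the five fields defined by EIP-4844 (versioned hash, z, y, commitment, proof). The struct, the function, and the constant names are hypothetical and are not the ones merged in this PR. Plain pointers are used instead of std::span to keep the sketch free of C++20 requirements.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Field sizes of the point evaluation precompile input (EIP-4844).
constexpr std::size_t VERSIONED_HASH_SIZE = 32;
constexpr std::size_t FIELD_ELEMENT_SIZE = 32;  // used for both z and y
constexpr std::size_t COMMITMENT_SIZE = 48;     // compressed G1 point
constexpr std::size_t PROOF_SIZE = 48;          // compressed G1 point

// Offsets derived from the sizes, so no magic numbers are needed.
constexpr std::size_t Z_OFFSET = VERSIONED_HASH_SIZE;                      // 32
constexpr std::size_t Y_OFFSET = Z_OFFSET + FIELD_ELEMENT_SIZE;            // 64
constexpr std::size_t COMMITMENT_OFFSET = Y_OFFSET + FIELD_ELEMENT_SIZE;   // 96
constexpr std::size_t PROOF_OFFSET = COMMITMENT_OFFSET + COMMITMENT_SIZE;  // 144
constexpr std::size_t INPUT_SIZE = PROOF_OFFSET + PROOF_SIZE;              // 192
static_assert(INPUT_SIZE == 192, "input must be exactly 192 bytes");

// Non-owning view of the five input fields (hypothetical type).
struct PointEvaluationInput
{
    const std::uint8_t* versioned_hash;
    const std::uint8_t* z;
    const std::uint8_t* y;
    const std::uint8_t* commitment;
    const std::uint8_t* proof;
};

// Splits the raw input; returns nullopt if the length is not exactly 192.
std::optional<PointEvaluationInput> split_input(
    const std::uint8_t* input, std::size_t input_size) noexcept
{
    if (input_size != INPUT_SIZE)
        return std::nullopt;
    return PointEvaluationInput{input, input + Z_OFFSET, input + Y_OFFSET,
        input + COMMITMENT_OFFSET, input + PROOF_OFFSET};
}
```

The derived offsets match the literals 32, 64, 96 and 96 + 48 used in the snippet under review.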

{
    std::byte computed_versioned_hash[32];
    sha256(computed_versioned_hash, commitment, 48);
    computed_versioned_hash[0] = std::byte{0x01};

Note to self: 0x01 is VERSIONED_HASH_VERSION_KZG

Member Author

The named constant is in use now.
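As a rough illustration of the versioned-hash step: the hash is the SHA-256 of the 48-byte commitment with its first byte replaced by VERSIONED_HASH_VERSION_KZG. The sketch below assumes the SHA-256 digest has already been computed elsewhere; `Hash32`, `to_versioned_hash` and `versioned_hash_matches` are hypothetical helpers, not the functions from this PR.

```cpp
#include <array>
#include <cstdint>

// Version byte for KZG versioned hashes, as named in EIP-4844.
constexpr std::uint8_t VERSIONED_HASH_VERSION_KZG = 0x01;

using Hash32 = std::array<std::uint8_t, 32>;

// Turns sha256(commitment) into a versioned hash by overwriting byte 0.
// The digest is taken by value, so the caller's copy is left untouched.
Hash32 to_versioned_hash(Hash32 commitment_sha256) noexcept
{
    commitment_sha256[0] = VERSIONED_HASH_VERSION_KZG;
    return commitment_sha256;
}

// Compares the versioned hash from the input against the recomputed one.
bool versioned_hash_matches(const Hash32& expected, const Hash32& commitment_sha256) noexcept
{
    return to_versioned_hash(commitment_sha256) == expected;
}
```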

const auto neg_Y = mult(G1_GENERATOR_NEGATIVE, *yy);

// Compute C - Y. It can happen that C == -Y so doubling may be needed.
const auto C_sub_Y = add_or_double(*C, neg_Y);

@kevaundray kevaundray Aug 31, 2024

I would probably always use add_or_double everywhere as I'm guessing the performance impact is negligible and you won't need to justify using one over the other

Member Author

Done.

ExecutionResult point_evaluation_execute(const uint8_t* input, size_t input_size, uint8_t* output,
    [[maybe_unused]] size_t output_size) noexcept
{
    assert(output_size >= 64);

Is it not == instead of >= ?

Oh, the output could have come from something that allocated more than 64 bytes; the caller can just take the first 64 bytes.

Member Author

These are internals of the precompiles framework. The caller is expected to allocate at least 64 bytes for the output.
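A minimal sketch of the output contract discussed here, assuming the caller may pass a buffer larger than needed: on success the precompile writes exactly 64 bytes, FIELD_ELEMENTS_PER_BLOB followed by BLS_MODULUS, each as a 32-byte big-endian value. This version avoids intx and uses a hypothetical helper `write_success_output` and a hypothetical byte-array constant `BLS_MODULUS_BE`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint64_t FIELD_ELEMENTS_PER_BLOB = 4096;
constexpr std::size_t OUTPUT_SIZE = 64;

// BLS12-381 scalar field modulus as 32 big-endian bytes (hypothetical name).
constexpr std::array<std::uint8_t, 32> BLS_MODULUS_BE{
    0x73, 0xed, 0xa7, 0x53, 0x29, 0x9d, 0x7d, 0x48, 0x33, 0x39, 0xd8, 0x08, 0x09, 0xa1, 0xd8, 0x05,
    0x53, 0xbd, 0xa4, 0x02, 0xff, 0xfe, 0x5b, 0xfe, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x01};

// Writes the 64-byte success output and returns the number of bytes written.
// The buffer may be larger than 64 bytes; bytes past the first 64 are not touched.
std::size_t write_success_output(std::uint8_t* output, std::size_t output_size) noexcept
{
    assert(output_size >= OUTPUT_SIZE);
    (void)output_size;  // unused when assertions are disabled

    // First word: FIELD_ELEMENTS_PER_BLOB, zero-padded to 32 bytes, big-endian.
    std::memset(output, 0, 32);
    for (int i = 0; i < 8; ++i)
        output[31 - i] = static_cast<std::uint8_t>(FIELD_ELEMENTS_PER_BLOB >> (8 * i));

    // Second word: the modulus, already stored big-endian.
    std::memcpy(output + 32, BLS_MODULUS_BE.data(), 32);
    return OUTPUT_SIZE;
}
```

Returning the written size separately from the buffer capacity is what lets the assertion be `>=` rather than `==`.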

@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch 3 times, most recently from c863dbc to 97dfa6f Compare September 3, 2024 11:36
Member Author

chfast commented Sep 3, 2024

Big thanks @kevaundray for the review. I believe I addressed most of the comments.

@kevaundray

Big thanks @kevaundray for the review. I believe I addressed most of the comments.

Yep -- I have a note to look at pairings_verify just because I'm not familiar with the blst API it uses.

In particular, it calls void blst_aggregated_in_g1(blst_fp12 *out, const blst_p1_affine *signature); which seems to be for a different API (BLS signatures). It's probably fine, but since there are no docs in blst and I have not checked that method properly, I thought I'd flag it anyway.

Member Author

chfast commented Sep 3, 2024

Yep -- I have a note to look at pairings_verify just because I'm not familiar with the blst API it uses.

In particular, it calls void blst_aggregated_in_g1(blst_fp12 *out, const blst_p1_affine *signature); which seems to be for a different API (BLS signatures). It's probably fine, but since there are no docs in blst and I have not checked that method properly, I thought I'd flag it anyway.

This is also my concern, although @mratsim said this is fine. I'll try to investigate too.

@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch from 97dfa6f to 5699512 Compare September 4, 2024 11:56
Member Author

chfast commented Sep 9, 2024

In particular, it calls void blst_aggregated_in_g1(blst_fp12 *out, const blst_p1_affine *signature); which seems to be for a different API (BLS signatures). It's probably fine, but since there are no docs in blst and I have not checked that method properly, I thought I'd flag it anyway.

The blst_aggregated_in_g1 is a shortcut for miller loop with the G2 generator.

void blst_aggregated_in_g1(vec384fp12 ret, const POINTonE1_affine *sig)
{   miller_loop_n(ret, (const POINTonE2_affine *)&BLS12_381_G2, sig, 1);   }

@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch from 5699512 to 4155aaa Compare September 9, 2024 15:01
const blst_p1_affine& a1, const blst_p1_affine& b1, const blst_p2_affine& b2) noexcept
{
blst_fp12 left;
blst_aggregated_in_g1(&left, &a1);
Member Author

In practice this just does miller_loop with the G2 generator, so it may be good not to confuse readers by using this API.

imo this would be better -- following the method and seeing signature was a bit confusing at first

blst_aggregated_in_g1(&left, &a1);
blst_fp12 right;
blst_miller_loop(&right, &b2, &b1);
return blst_fp12_finalverify(&left, &right);
Member Author

This seems to be doing the same as "classic" pairings_verify from c-kzg.

How about instead of negating the a1 at runtime we pre-compute negative G2 generator?
Can we also use blst_miller_loop_n?

Yep, we can compute the negative G2 generator at compile time. I think the negation would cost on the order of 10 nanoseconds, which is likely why it's done at runtime in other codebases (a pairing check is on the order of a millisecond).
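For context, the two formulations discussed in this thread can be written out as follows. This is a sketch of the standard KZG verification identity from EIP-4844, not a transcription of the merged code. It assumes C is the commitment, π the proof, z the evaluation point, y the claimed value, and that KZG_SETUP_G2_1 holds [τ]G_2 (an assumption based on its name). How a1, b1 and b2 map onto these terms is likewise an assumption.

```latex
\begin{align*}
e\bigl(C - [y]G_1,\ G_2\bigr) &= e\bigl(\pi,\ [\tau - z]G_2\bigr) \\
\iff\quad
e\bigl(C - [y]G_1,\ -G_2\bigr)\cdot e\bigl(\pi,\ [\tau - z]G_2\bigr) &= 1_{G_T}
\end{align*}
```

The first line corresponds to comparing two Miller-loop results against each other. The second line is the single-product form that a precomputed negative G2 generator would enable.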

@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch from 4155aaa to 0aa75e4 Compare September 18, 2024 14:26
@rodiazet
Collaborator

Can we add a source for where the KZG_SETUP_G2_1 value comes from?

Implement the KZG proof verification following the spec of
`verify_kzg_proof` in
https://eips.ethereum.org/EIPS/eip-4844#point-evaluation-precompile.
Use `crypto::kzg_verify_proof` to implement and enable
the `point_evaluation` precompile.
@chfast chfast force-pushed the precompiles/kzg_point_evaluation branch from 0aa75e4 to 85e7ff7 Compare September 19, 2024 15:04
@chfast chfast enabled auto-merge September 19, 2024 15:07
@chfast chfast merged commit 4a3cb41 into master Sep 19, 2024
22 of 23 checks passed
@chfast chfast deleted the precompiles/kzg_point_evaluation branch September 19, 2024 15:19