Use xxhash with salt for fastMsgIdFn #4630

Closed

Contributor

@dapplion dapplion commented Oct 3, 2022

Motivation

fastMsgIdFn() must produce a unique output per message among the messages seen in the last few heartbeats. Any hash function can do that, so we should pick the cheapest one that is still safe enough.

On a node subscribed to all subnets, sha256 can take 0.5% of total CPU time.

xxhash is not a cryptographically secure hash function, but it has really good performance and a good distribution of outputs. This PR also adds a salt so an attacker can't pre-compute collisions.
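As a rough illustration of the salted approach described above, here is a minimal sketch. It is not the code from this PR: 32-bit FNV-1a stands in for xxhash so the example needs no dependencies, and `RawMessage`, `fnv1a32` and `makeFastMsgIdFn` are hypothetical names chosen for illustration.

```typescript
// Sketch only: FNV-1a (32-bit) stands in for xxhash so the example is dependency-free.
// `RawMessage`, `fnv1a32` and `makeFastMsgIdFn` are hypothetical names, not Lodestar or gossipsub APIs.

interface RawMessage {
  data: Uint8Array;
}

/** 32-bit FNV-1a over `data`, starting from `seed` instead of the standard offset basis. */
function fnv1a32(data: Uint8Array, seed: number): number {
  let hash = seed >>> 0;
  for (let i = 0; i < data.length; i++) {
    hash ^= data[i];
    // Multiply by the FNV prime with 32-bit wraparound.
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return hash;
}

/** Build a fast message-id function bound to a salt that is chosen once per process. */
function makeFastMsgIdFn(salt: number): (msg: RawMessage) => string {
  return (msg) => fnv1a32(msg.data, salt).toString(16);
}

// The salt is picked at startup, so ids differ between nodes and between restarts.
// Assumption: a real node would draw the salt from a cryptographically secure source
// and use a stronger seeded hash; Math.random() only keeps this sketch short.
const salt = Math.floor(Math.random() * 0x100000000);
const fastMsgIdFn = makeFastMsgIdFn(salt);

console.log(fastMsgIdFn({data: new Uint8Array([1, 2, 3])}));
```

Because the salt never leaves the node, an attacker who wants two messages to share an id cannot search for the collision offline. The ids only need to be stable within one process, so a new salt on every restart is fine.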

Description

  • Use xxhash with salt for fastMsgIdFn

Closes #4603

@dapplion dapplion requested a review from a team as a code owner October 3, 2022 19:36
@github-actions
Contributor

github-actions bot commented Oct 3, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: d341a38 Previous: 07eb658 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 2.0318 ms/op 1.7634 ms/op 1.15
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 65.798 us/op 64.090 us/op 1.03
BLS verify - blst-native 2.1641 ms/op 2.1641 ms/op 1.00
BLS verifyMultipleSignatures 3 - blst-native 4.4705 ms/op 4.4728 ms/op 1.00
BLS verifyMultipleSignatures 8 - blst-native 9.6622 ms/op 9.6648 ms/op 1.00
BLS verifyMultipleSignatures 32 - blst-native 35.128 ms/op 35.127 ms/op 1.00
BLS aggregatePubkeys 32 - blst-native 46.725 us/op 46.784 us/op 1.00
BLS aggregatePubkeys 128 - blst-native 181.92 us/op 182.35 us/op 1.00
getAttestationsForBlock 78.326 ms/op 73.302 ms/op 1.07
isKnown best case - 1 super set check 467.00 ns/op 469.00 ns/op 1.00
isKnown normal case - 2 super set checks 455.00 ns/op 460.00 ns/op 0.99
isKnown worse case - 16 super set checks 452.00 ns/op 462.00 ns/op 0.98
CheckpointStateCache - add get delete 8.7860 us/op 8.6060 us/op 1.02
validate gossip signedAggregateAndProof - struct 5.0083 ms/op 5.0126 ms/op 1.00
validate gossip attestation - struct 2.3681 ms/op 2.3756 ms/op 1.00
pickEth1Vote - no votes 2.1718 ms/op 2.1011 ms/op 1.03
pickEth1Vote - max votes 19.185 ms/op 16.962 ms/op 1.13
pickEth1Vote - Eth1Data hashTreeRoot value x2048 13.148 ms/op 12.026 ms/op 1.09
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 21.048 ms/op 19.676 ms/op 1.07
pickEth1Vote - Eth1Data fastSerialize value x2048 1.4430 ms/op 1.4500 ms/op 1.00
pickEth1Vote - Eth1Data fastSerialize tree x2048 13.271 ms/op 12.777 ms/op 1.04
bytes32 toHexString 991.00 ns/op 1.0390 us/op 0.95
bytes32 Buffer.toString(hex) 830.00 ns/op 777.00 ns/op 1.07
bytes32 Buffer.toString(hex) from Uint8Array 1.0700 us/op 1.0180 us/op 1.05
bytes32 Buffer.toString(hex) + 0x 836.00 ns/op 783.00 ns/op 1.07
Object access 1 prop 0.36700 ns/op 0.37200 ns/op 0.99
Map access 1 prop 0.31800 ns/op 0.31000 ns/op 1.03
Object get x1000 10.626 ns/op 10.843 ns/op 0.98
Map get x1000 1.0000 ns/op 0.93500 ns/op 1.07
Object set x1000 72.367 ns/op 74.087 ns/op 0.98
Map set x1000 48.701 ns/op 50.387 ns/op 0.97
Return object 10000 times 0.43970 ns/op 0.43920 ns/op 1.00
Throw Error 10000 times 6.0212 us/op 5.9921 us/op 1.00
fastMsgIdFn sha256 / 200 bytes 4.9610 us/op
fastMsgIdFn xxhash / 200 bytes 567.00 ns/op
fastMsgIdFn xxhash+String / 200 bytes 553.00 ns/op
fastMsgIdFn xxhash+concat / 200 bytes 845.00 ns/op
fastMsgIdFn sha256 / 1000 bytes 15.618 us/op
fastMsgIdFn xxhash / 1000 bytes 704.00 ns/op
fastMsgIdFn xxhash+String / 1000 bytes 733.00 ns/op
fastMsgIdFn xxhash+concat / 1000 bytes 1.5140 us/op
fastMsgIdFn sha256 / 10000 bytes 133.82 us/op
fastMsgIdFn xxhash / 10000 bytes 2.5530 us/op
fastMsgIdFn xxhash+String / 10000 bytes 2.5100 us/op
fastMsgIdFn xxhash+concat / 10000 bytes 6.6470 us/op
enrSubnets - fastDeserialize 64 bits 2.2840 us/op 2.5490 us/op 0.90
enrSubnets - ssz BitVector 64 bits 777.00 ns/op 820.00 ns/op 0.95
enrSubnets - fastDeserialize 4 bits 348.00 ns/op 383.00 ns/op 0.91
enrSubnets - ssz BitVector 4 bits 798.00 ns/op 802.00 ns/op 1.00
prioritizePeers score -10:0 att 32-0.1 sync 2-0 78.680 us/op 82.853 us/op 0.95
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 123.69 us/op 124.71 us/op 0.99
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 198.16 us/op 209.24 us/op 0.95
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 341.31 us/op 332.89 us/op 1.03
prioritizePeers score 0:0 att 64-1 sync 4-1 413.02 us/op 406.72 us/op 1.02
RateTracker 1000000 limit, 1 obj count per request 176.18 ns/op 181.00 ns/op 0.97
RateTracker 1000000 limit, 2 obj count per request 126.96 ns/op 131.84 ns/op 0.96
RateTracker 1000000 limit, 4 obj count per request 103.39 ns/op 109.11 ns/op 0.95
RateTracker 1000000 limit, 8 obj count per request 91.167 ns/op 96.671 ns/op 0.94
RateTracker with prune 3.6200 us/op 4.0320 us/op 0.90
array of 16000 items push then shift 51.577 us/op 51.581 us/op 1.00
LinkedList of 16000 items push then shift 12.275 ns/op 12.498 ns/op 0.98
array of 16000 items push then pop 194.43 ns/op 219.09 ns/op 0.89
LinkedList of 16000 items push then pop 12.074 ns/op 11.949 ns/op 1.01
array of 24000 items push then shift 77.363 us/op 77.381 us/op 1.00
LinkedList of 24000 items push then shift 12.527 ns/op 12.926 ns/op 0.97
array of 24000 items push then pop 197.90 ns/op 197.78 ns/op 1.00
LinkedList of 24000 items push then pop 12.251 ns/op 12.152 ns/op 1.01
intersect bitArray bitLen 8 10.583 ns/op 10.812 ns/op 0.98
intersect array and set length 8 125.00 ns/op 147.25 ns/op 0.85
intersect bitArray bitLen 128 55.547 ns/op 55.618 ns/op 1.00
intersect array and set length 128 1.7072 us/op 1.9621 us/op 0.87
Buffer.concat 32 items 1.7560 ns/op 1.7850 ns/op 0.98
pass gossip attestations to forkchoice per slot 3.6675 ms/op 3.6205 ms/op 1.01
computeDeltas 4.5182 ms/op 4.9079 ms/op 0.92
computeProposerBoostScoreFromBalances 803.70 us/op 806.62 us/op 1.00
altair processAttestation - 250000 vs - 7PWei normalcase 3.3567 ms/op 3.6434 ms/op 0.92
altair processAttestation - 250000 vs - 7PWei worstcase 5.0953 ms/op 5.5216 ms/op 0.92
altair processAttestation - setStatus - 1/6 committees join 180.38 us/op 196.19 us/op 0.92
altair processAttestation - setStatus - 1/3 committees join 352.03 us/op 363.55 us/op 0.97
altair processAttestation - setStatus - 1/2 committees join 505.13 us/op 523.26 us/op 0.97
altair processAttestation - setStatus - 2/3 committees join 661.88 us/op 679.29 us/op 0.97
altair processAttestation - setStatus - 4/5 committees join 928.48 us/op 942.80 us/op 0.98
altair processAttestation - setStatus - 100% committees join 1.1299 ms/op 1.1478 ms/op 0.98
altair processBlock - 250000 vs - 7PWei normalcase 25.882 ms/op 26.100 ms/op 0.99
altair processBlock - 250000 vs - 7PWei normalcase hashState 35.847 ms/op 40.654 ms/op 0.88
altair processBlock - 250000 vs - 7PWei worstcase 81.449 ms/op 77.288 ms/op 1.05
altair processBlock - 250000 vs - 7PWei worstcase hashState 103.01 ms/op 103.45 ms/op 1.00
phase0 processBlock - 250000 vs - 7PWei normalcase 3.1988 ms/op 3.3250 ms/op 0.96
phase0 processBlock - 250000 vs - 7PWei worstcase 51.761 ms/op 50.435 ms/op 1.03
altair processEth1Data - 250000 vs - 7PWei normalcase 738.17 us/op 672.80 us/op 1.10
Tree 40 250000 create 715.42 ms/op 668.08 ms/op 1.07
Tree 40 250000 get(125000) 245.58 ns/op 253.31 ns/op 0.97
Tree 40 250000 set(125000) 2.0814 us/op 2.3162 us/op 0.90
Tree 40 250000 toArray() 27.194 ms/op 27.543 ms/op 0.99
Tree 40 250000 iterate all - toArray() + loop 26.920 ms/op 27.350 ms/op 0.98
Tree 40 250000 iterate all - get(i) 113.77 ms/op 113.66 ms/op 1.00
MutableVector 250000 create 13.113 ms/op 13.965 ms/op 0.94
MutableVector 250000 get(125000) 11.301 ns/op 11.970 ns/op 0.94
MutableVector 250000 set(125000) 531.11 ns/op 444.90 ns/op 1.19
MutableVector 250000 toArray() 6.0751 ms/op 5.9190 ms/op 1.03
MutableVector 250000 iterate all - toArray() + loop 6.4310 ms/op 5.9859 ms/op 1.07
MutableVector 250000 iterate all - get(i) 2.7171 ms/op 2.6220 ms/op 1.04
Array 250000 create 5.9997 ms/op 5.3496 ms/op 1.12
Array 250000 clone - spread 3.2784 ms/op 2.4267 ms/op 1.35
Array 250000 get(125000) 1.4800 ns/op 1.1720 ns/op 1.26
Array 250000 set(125000) 1.4700 ns/op 1.1650 ns/op 1.26
Array 250000 iterate all - loop 153.68 us/op 154.00 us/op 1.00
effectiveBalanceIncrements clone Uint8Array 300000 45.482 us/op 37.550 us/op 1.21
effectiveBalanceIncrements clone MutableVector 300000 1.0210 us/op 741.00 ns/op 1.38
effectiveBalanceIncrements rw all Uint8Array 300000 247.06 us/op 248.52 us/op 0.99
effectiveBalanceIncrements rw all MutableVector 300000 165.74 ms/op 142.45 ms/op 1.16
phase0 afterProcessEpoch - 250000 vs - 7PWei 190.14 ms/op 186.15 ms/op 1.02
phase0 beforeProcessEpoch - 250000 vs - 7PWei 79.049 ms/op 60.598 ms/op 1.30
altair processEpoch - mainnet_e81889 571.74 ms/op 573.47 ms/op 1.00
mainnet_e81889 - altair beforeProcessEpoch 81.406 ms/op 107.88 ms/op 0.75
mainnet_e81889 - altair processJustificationAndFinalization 18.322 us/op 16.936 us/op 1.08
mainnet_e81889 - altair processInactivityUpdates 8.7869 ms/op 8.4487 ms/op 1.04
mainnet_e81889 - altair processRewardsAndPenalties 138.58 ms/op 76.585 ms/op 1.81
mainnet_e81889 - altair processRegistryUpdates 2.5390 us/op 2.4960 us/op 1.02
mainnet_e81889 - altair processSlashings 531.00 ns/op 513.00 ns/op 1.04
mainnet_e81889 - altair processEth1DataReset 632.00 ns/op 613.00 ns/op 1.03
mainnet_e81889 - altair processEffectiveBalanceUpdates 2.3604 ms/op 1.9309 ms/op 1.22
mainnet_e81889 - altair processSlashingsReset 4.7350 us/op 4.8170 us/op 0.98
mainnet_e81889 - altair processRandaoMixesReset 4.0060 us/op 3.8420 us/op 1.04
mainnet_e81889 - altair processHistoricalRootsUpdate 588.00 ns/op 607.00 ns/op 0.97
mainnet_e81889 - altair processParticipationFlagUpdates 2.0050 us/op 2.0420 us/op 0.98
mainnet_e81889 - altair processSyncCommitteeUpdates 1.2110 us/op 610.00 ns/op 1.99
mainnet_e81889 - altair afterProcessEpoch 218.64 ms/op 196.70 ms/op 1.11
phase0 processEpoch - mainnet_e58758 620.29 ms/op 480.53 ms/op 1.29
mainnet_e58758 - phase0 beforeProcessEpoch 224.02 ms/op 175.63 ms/op 1.28
mainnet_e58758 - phase0 processJustificationAndFinalization 17.163 us/op 16.940 us/op 1.01
mainnet_e58758 - phase0 processRewardsAndPenalties 122.05 ms/op 96.498 ms/op 1.26
mainnet_e58758 - phase0 processRegistryUpdates 8.1490 us/op 8.1530 us/op 1.00
mainnet_e58758 - phase0 processSlashings 599.00 ns/op 549.00 ns/op 1.09
mainnet_e58758 - phase0 processEth1DataReset 636.00 ns/op 558.00 ns/op 1.14
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.7558 ms/op 1.7314 ms/op 1.01
mainnet_e58758 - phase0 processSlashingsReset 3.6660 us/op 3.6980 us/op 0.99
mainnet_e58758 - phase0 processRandaoMixesReset 3.8940 us/op 4.2550 us/op 0.92
mainnet_e58758 - phase0 processHistoricalRootsUpdate 727.00 ns/op 691.00 ns/op 1.05
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.4160 us/op 3.3680 us/op 1.01
mainnet_e58758 - phase0 afterProcessEpoch 162.78 ms/op 161.83 ms/op 1.01
phase0 processEffectiveBalanceUpdates - 250000 normalcase 2.0643 ms/op 1.9459 ms/op 1.06
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.3421 ms/op 2.2227 ms/op 1.05
altair processInactivityUpdates - 250000 normalcase 53.666 ms/op 40.220 ms/op 1.33
altair processInactivityUpdates - 250000 worstcase 53.153 ms/op 32.717 ms/op 1.62
phase0 processRegistryUpdates - 250000 normalcase 6.7800 us/op 6.0430 us/op 1.12
phase0 processRegistryUpdates - 250000 badcase_full_deposits 374.86 us/op 382.11 us/op 0.98
phase0 processRegistryUpdates - 250000 worstcase 0.5 224.73 ms/op 190.41 ms/op 1.18
altair processRewardsAndPenalties - 250000 normalcase 135.68 ms/op 72.778 ms/op 1.86
altair processRewardsAndPenalties - 250000 worstcase 136.73 ms/op 106.14 ms/op 1.29
phase0 getAttestationDeltas - 250000 normalcase 11.344 ms/op 11.351 ms/op 1.00
phase0 getAttestationDeltas - 250000 worstcase 11.411 ms/op 11.919 ms/op 0.96
phase0 processSlashings - 250000 worstcase 4.9906 ms/op 4.9920 ms/op 1.00
altair processSyncCommitteeUpdates - 250000 283.94 ms/op 287.40 ms/op 0.99
BeaconState.hashTreeRoot - No change 566.00 ns/op 542.00 ns/op 1.04
BeaconState.hashTreeRoot - 1 full validator 70.115 us/op 65.876 us/op 1.06
BeaconState.hashTreeRoot - 32 full validator 734.26 us/op 729.80 us/op 1.01
BeaconState.hashTreeRoot - 512 full validator 8.7419 ms/op 6.9788 ms/op 1.25
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 88.835 us/op 88.774 us/op 1.00
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.4237 ms/op 1.2759 ms/op 1.12
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 16.655 ms/op 17.690 ms/op 0.94
BeaconState.hashTreeRoot - 1 balances 78.445 us/op 71.950 us/op 1.09
BeaconState.hashTreeRoot - 32 balances 632.92 us/op 645.71 us/op 0.98
BeaconState.hashTreeRoot - 512 balances 6.7355 ms/op 6.4848 ms/op 1.04
BeaconState.hashTreeRoot - 250000 balances 116.04 ms/op 105.70 ms/op 1.10
aggregationBits - 2048 els - zipIndexesInBitList 23.630 us/op 25.712 us/op 0.92
regular array get 100000 times 60.531 us/op 61.203 us/op 0.99
wrappedArray get 100000 times 60.546 us/op 60.549 us/op 1.00
arrayWithProxy get 100000 times 28.354 ms/op 28.793 ms/op 0.98
ssz.Root.equals 444.00 ns/op 480.00 ns/op 0.93
byteArrayEquals 436.00 ns/op 471.00 ns/op 0.93
shuffle list - 16384 els 11.357 ms/op 11.449 ms/op 0.99
shuffle list - 250000 els 167.08 ms/op 166.22 ms/op 1.01
processSlot - 1 slots 12.551 us/op 13.517 us/op 0.93
processSlot - 32 slots 1.9483 ms/op 1.9816 ms/op 0.98
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 387.88 us/op 395.03 us/op 0.98
getCommitteeAssignments - req 1 vs - 250000 vc 5.4813 ms/op 5.5712 ms/op 0.98
getCommitteeAssignments - req 100 vs - 250000 vc 7.9744 ms/op 8.1910 ms/op 0.97
getCommitteeAssignments - req 1000 vs - 250000 vc 8.5730 ms/op 8.6811 ms/op 0.99
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 8.3800 ns/op 8.8700 ns/op 0.94
state getBlockRootAtSlot - 250000 vs - 7PWei 1.1101 us/op 1.0911 us/op 1.02
computeProposers - vc 250000 17.414 ms/op 17.210 ms/op 1.01
computeEpochShuffling - vc 250000 169.86 ms/op 170.28 ms/op 1.00
getNextSyncCommittee - vc 250000 288.78 ms/op 287.20 ms/op 1.01

by benchmarkbot/action

@twoeths
Contributor

twoeths commented Oct 4, 2022

the benchmark looks great; I suggest testing this on a node for 1 day before merging
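For reference, the fastMsgIdFn rows of the report can be turned into speedup ratios. The sketch below copies the sha256 and plain xxhash timings from the report and divides them; the variable and function names are illustrative only.

```typescript
// Timings copied from the "fastMsgIdFn sha256" and "fastMsgIdFn xxhash" rows, in ns/op.
const timings = [
  {bytes: 200, sha256Ns: 4961.0, xxhashNs: 567.0},
  {bytes: 1000, sha256Ns: 15618, xxhashNs: 704.0},
  {bytes: 10000, sha256Ns: 133820, xxhashNs: 2553.0},
];

/** How many times faster xxhash is than sha256 for one payload size. */
function speedup(sha256Ns: number, xxhashNs: number): number {
  return sha256Ns / xxhashNs;
}

for (const t of timings) {
  console.log(`${t.bytes} bytes: ${speedup(t.sha256Ns, t.xxhashNs).toFixed(1)}x`);
}
```

The ratio grows with payload size, from about 8.7x at 200 bytes to about 52x at 10000 bytes, which suggests the fixed per-call overhead dominates for small messages.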

@twoeths
Contributor

twoeths commented Oct 4, 2022

@dapplion I deployed this branch to feat1 lg1k

@twoeths
Contributor

twoeths commented Oct 4, 2022

this looks promising: initially, submitAttestationPool request time is 200ms - 300ms on this branch, while unstable (feat2) returns in 1.5s - 2s (both deployed at the same time; attestations are submitted right at 1/3 of the slot)

Screen Shot 2022-10-04 at 17 07 20

this correlates to the memory issue with this branch

Screen Shot 2022-10-04 at 17 07 56

to avoid the String creation, we should try ChainSafe/js-libp2p-gossipsub#355

@dapplion dapplion marked this pull request as draft October 4, 2022 16:18
@dapplion
Contributor Author

dapplion commented Oct 4, 2022

This is a low-priority optimization, not worth risking a memory leak for. @tuyennhv, if you want, confirm whether the number-based version still causes the leak; otherwise let's close this PR.

@twoeths
Contributor

twoeths commented Oct 5, 2022

closing this PR as it still has a memory issue after I tried the numbered FastMsgIdFn version. submitAttestationPool request time was 200ms - 300ms on this branch vs 1.5s - 2s on unstable, so at least it proves how much we can improve the I/O lag issue at 1/3 of the slot by improving fastMsgIdFn

@twoeths twoeths closed this Oct 5, 2022
@twoeths twoeths deleted the dapplion/gossip-fast-msg-id branch October 5, 2022 03:52
Development

Successfully merging this pull request may close these issues.

Improve fastMsgIdFn