Reprocess unknown block root attestations #3564

Merged
dapplion merged 2 commits into master from tuyen/reprocess-attestations
Jan 17, 2022

Conversation

twoeths
Contributor

@twoeths twoeths commented Jan 4, 2022

Motivation

Right now, when attestations arrive before the block they vote for, we ignore them. We should follow other clients and queue those attestations, then reprocess them once the block becomes known. This:

  • May help with the missed attestation issue "[PROD] Validator miss attestation" #3527, since right now we swallow unknown block root attestations
  • May increase our gossip peer score once we propagate those attestations

Description
Since we used to have performance issues with this reprocess work, the approach is quite conservative:

  • New ReprocessController that waits for the block and adds awaiting promises to awaitingPromisesByRootBySlot
  • Prune awaitingPromisesByRootBySlot at every clock slot, since we don't want to reprocess attestations from old slots
  • If the block arrives too close to the next slot, do not reprocess, as the node may be struggling to resync; see the REPROCESS_MIN_TIME_TO_NEXT_SLOT_SEC constant
  • New metrics for this reprocess work
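
The approach in the bullets above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the class name `ReprocessSketch`, its method names, and the value given to `REPROCESS_MIN_TIME_TO_NEXT_SLOT_SEC` are hypothetical; only `awaitingPromisesByRootBySlot` and the constant's name come from the description. It also assumes an attestation and the block it votes for share a slot, and that pruned waiters resolve `false`.

```typescript
type Slot = number;
type RootHex = string;

// Assumed value for illustration; the real constant may differ.
const REPROCESS_MIN_TIME_TO_NEXT_SLOT_SEC = 2;

/** Hypothetical sketch: park attestations until the block they vote for arrives. */
class ReprocessSketch {
  // slot -> block root -> resolvers of the attestations waiting for that block
  private readonly awaitingPromisesByRootBySlot = new Map<Slot, Map<RootHex, Array<(found: boolean) => void>>>();

  /** Park an attestation of `slot` until block `root` becomes known. */
  waitForBlock(slot: Slot, root: RootHex): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
      let byRoot = this.awaitingPromisesByRootBySlot.get(slot);
      if (byRoot === undefined) {
        byRoot = new Map();
        this.awaitingPromisesByRootBySlot.set(slot, byRoot);
      }
      const waiters = byRoot.get(root) ?? [];
      waiters.push(resolve);
      byRoot.set(root, waiters);
    });
  }

  /** Called when a block is imported. Returns how many attestations were released. */
  onBlockImported(slot: Slot, root: RootHex, secToNextSlot: number): number {
    // A block arriving this late suggests the node is struggling, so skip reprocessing.
    if (secToNextSlot < REPROCESS_MIN_TIME_TO_NEXT_SLOT_SEC) return 0;
    const byRoot = this.awaitingPromisesByRootBySlot.get(slot);
    const waiters = byRoot?.get(root) ?? [];
    byRoot?.delete(root);
    for (const resolve of waiters) resolve(true);
    return waiters.length;
  }

  /** Called once per clock slot. Drops everything still waiting from older slots. */
  onSlot(clockSlot: Slot): number {
    let pruned = 0;
    this.awaitingPromisesByRootBySlot.forEach((byRoot, slot) => {
      if (slot >= clockSlot) return;
      byRoot.forEach((waiters) => {
        for (const resolve of waiters) resolve(false);
        pruned += waiters.length;
      });
      this.awaitingPromisesByRootBySlot.delete(slot);
    });
    return pruned;
  }
}
```

Keying the outer map by slot keeps the per-slot prune cheap: a whole slot's worth of waiters is dropped with one map deletion.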

Closes #3560

Some considerations

  • Ideally we should have a separate queue for reprocess work, but since we have to hook into the life cycle of js-libp2p-gossipsub, I do the reprocessing in the gossip handler
  • I increased the attestation and AggregateAndProof gossip queue sizes, since we use a single queue for this reprocess work while Lighthouse has 2 separate queues. Does this make sense? The test does not show any difference when I increase them

Test

I don't see any big differences in the metrics. As for the peer count with the subscribeAllSubnets flag, it is the same after running 2 instances for 9 days. Although only around 20 to 60 attestations are reprocessed per slot, the total number of attestations accepted differs significantly after running the 2 instances (contabo 19 vs contabo 20) for 9 days:
Screen Shot 2022-01-13 at 10 15 24

  • Master: Gossip aggregate and proofs accepted per slot

Screen Shot 2022-01-13 at 09 52 48

  • This branch: Gossip aggregate and proofs accepted per slot is higher

Screen Shot 2022-01-13 at 09 53 31

  • Master: Gossip beacon attestations accepted per slot

Screen Shot 2022-01-13 at 09 58 50

  • This branch: Gossip beacon attestations accepted per slot is higher

Screen Shot 2022-01-13 at 09 59 12

@codecov

codecov bot commented Jan 4, 2022

Codecov Report

Merging #3564 (fbad895) into master (1bd730a) will decrease coverage by 0.22%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3564      +/-   ##
==========================================
- Coverage   37.40%   37.17%   -0.23%     
==========================================
  Files         311      312       +1     
  Lines        8374     8452      +78     
  Branches     1299     1317      +18     
==========================================
+ Hits         3132     3142      +10     
- Misses       5093     5161      +68     
  Partials      149      149              

@codeclimate

codeclimate bot commented Jan 4, 2022

Code Climate has analyzed commit fbad895 and detected 3 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 3

View more on Code Climate.

@github-actions
Contributor

github-actions bot commented Jan 4, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: d635395 Previous: 1bd730a Ratio
BeaconState.hashTreeRoot - No change 567.00 ns/op 489.00 ns/op 1.16
BeaconState.hashTreeRoot - 1 full validator 154.06 us/op 113.05 us/op 1.36
BeaconState.hashTreeRoot - 32 full validator 2.3454 ms/op 1.6612 ms/op 1.41
BeaconState.hashTreeRoot - 512 full validator 30.579 ms/op 22.235 ms/op 1.38
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 152.07 us/op 111.07 us/op 1.37
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 2.4741 ms/op 1.8161 ms/op 1.36
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 34.679 ms/op 24.740 ms/op 1.40
BeaconState.hashTreeRoot - 1 balances 110.32 us/op 82.062 us/op 1.34
BeaconState.hashTreeRoot - 32 balances 935.83 us/op 683.30 us/op 1.37
BeaconState.hashTreeRoot - 512 balances 8.8361 ms/op 6.6580 ms/op 1.33
BeaconState.hashTreeRoot - 250000 balances 159.10 ms/op 130.69 ms/op 1.22
processSlot - 1 slots 128.05 us/op 42.749 us/op 3.00
processSlot - 32 slots 3.6166 ms/op 2.5287 ms/op 1.43
getCommitteeAssignments - req 1 vs - 250000 vc 4.9883 ms/op 4.7354 ms/op 1.05
getCommitteeAssignments - req 100 vs - 250000 vc 6.8478 ms/op 6.5819 ms/op 1.04
getCommitteeAssignments - req 1000 vs - 250000 vc 7.4367 ms/op 7.0608 ms/op 1.05
computeProposers - vc 250000 23.608 ms/op 18.346 ms/op 1.29
computeEpochShuffling - vc 250000 185.67 ms/op 165.84 ms/op 1.12
getNextSyncCommittee - vc 250000 380.30 ms/op 316.63 ms/op 1.20
altair processAttestation - 250000 vs - 7PWei normalcase 49.029 ms/op 48.067 ms/op 1.02
altair processAttestation - 250000 vs - 7PWei worstcase 50.407 ms/op 43.657 ms/op 1.15
altair processAttestation - setStatus - 1/6 committees join 14.706 ms/op 9.8596 ms/op 1.49
altair processAttestation - setStatus - 1/3 committees join 27.489 ms/op 21.222 ms/op 1.30
altair processAttestation - setStatus - 1/2 committees join 47.057 ms/op 33.184 ms/op 1.42
altair processAttestation - setStatus - 2/3 committees join 57.607 ms/op 43.943 ms/op 1.31
altair processAttestation - setStatus - 4/5 committees join 70.521 ms/op 51.715 ms/op 1.36
altair processAttestation - setStatus - 100% committees join 88.760 ms/op 65.338 ms/op 1.36
altair processAttestation - updateEpochParticipants - 1/6 committees join 15.694 ms/op 11.700 ms/op 1.34
altair processAttestation - updateEpochParticipants - 1/3 committees join 36.445 ms/op 20.190 ms/op 1.81
altair processAttestation - updateEpochParticipants - 1/2 committees join 26.526 ms/op 74.422 ms/op 0.36
altair processAttestation - updateEpochParticipants - 2/3 committees join 34.079 ms/op 24.883 ms/op 1.37
altair processAttestation - updateEpochParticipants - 4/5 committees join 30.411 ms/op 25.706 ms/op 1.18
altair processAttestation - updateEpochParticipants - 100% committees join 31.751 ms/op 27.418 ms/op 1.16
altair processAttestation - updateAllStatus 28.003 ms/op 20.576 ms/op 1.36
altair processBlock - 250000 vs - 7PWei normalcase 47.223 ms/op 43.541 ms/op 1.08
altair processBlock - 250000 vs - 7PWei worstcase 135.41 ms/op 107.79 ms/op 1.26
altair processEpoch - mainnet_e81889 1.0806 s/op 981.54 ms/op 1.10
mainnet_e81889 - altair beforeProcessEpoch 295.06 ms/op 263.85 ms/op 1.12
mainnet_e81889 - altair processJustificationAndFinalization 113.08 us/op 102.17 us/op 1.11
mainnet_e81889 - altair processInactivityUpdates 17.606 ms/op 16.885 ms/op 1.04
mainnet_e81889 - altair processRewardsAndPenalties 147.90 ms/op 169.52 ms/op 0.87
mainnet_e81889 - altair processRegistryUpdates 21.313 us/op 9.8580 us/op 2.16
mainnet_e81889 - altair processSlashings 6.3790 us/op 1.9400 us/op 3.29
mainnet_e81889 - altair processEth1DataReset 6.8290 us/op 1.8930 us/op 3.61
mainnet_e81889 - altair processEffectiveBalanceUpdates 13.127 ms/op 10.354 ms/op 1.27
mainnet_e81889 - altair processSlashingsReset 38.008 us/op 14.625 us/op 2.60
mainnet_e81889 - altair processRandaoMixesReset 45.607 us/op 13.982 us/op 3.26
mainnet_e81889 - altair processHistoricalRootsUpdate 6.8950 us/op 1.1920 us/op 5.78
mainnet_e81889 - altair processParticipationFlagUpdates 181.82 ms/op 96.192 ms/op 1.89
mainnet_e81889 - altair processSyncCommitteeUpdates 5.0960 us/op 1.7250 us/op 2.95
mainnet_e81889 - altair afterProcessEpoch 229.11 ms/op 197.94 ms/op 1.16
altair processInactivityUpdates - 250000 normalcase 84.557 ms/op 62.279 ms/op 1.36
altair processInactivityUpdates - 250000 worstcase 86.293 ms/op 64.489 ms/op 1.34
altair processParticipationFlagUpdates - 250000 anycase 99.965 ms/op 88.480 ms/op 1.13
altair processRewardsAndPenalties - 250000 normalcase 146.66 ms/op 160.61 ms/op 0.91
altair processRewardsAndPenalties - 250000 worstcase 163.32 ms/op 115.26 ms/op 1.42
altair processSyncCommitteeUpdates - 250000 466.47 ms/op 313.08 ms/op 1.49
Tree 40 250000 create 1.1827 s/op 635.17 ms/op 1.86
Tree 40 250000 get(125000) 336.81 ns/op 291.64 ns/op 1.15
Tree 40 250000 set(125000) 2.3149 us/op 2.0900 us/op 1.11
Tree 40 250000 toArray() 44.396 ms/op 36.615 ms/op 1.21
Tree 40 250000 iterate all - toArray() + loop 53.642 ms/op 37.244 ms/op 1.44
Tree 40 250000 iterate all - get(i) 125.34 ms/op 107.64 ms/op 1.16
MutableVector 250000 create 27.110 ms/op 19.215 ms/op 1.41
MutableVector 250000 get(125000) 15.396 ns/op 11.885 ns/op 1.30
MutableVector 250000 set(125000) 697.64 ns/op 516.45 ns/op 1.35
MutableVector 250000 toArray() 9.1760 ms/op 7.6450 ms/op 1.20
MutableVector 250000 iterate all - toArray() + loop 9.3405 ms/op 9.0186 ms/op 1.04
MutableVector 250000 iterate all - get(i) 4.6061 ms/op 2.8673 ms/op 1.61
Array 250000 create 5.5860 ms/op 4.7494 ms/op 1.18
Array 250000 clone - spread 2.6111 ms/op 1.6942 ms/op 1.54
Array 250000 get(125000) 1.2950 ns/op 0.81400 ns/op 1.59
Array 250000 set(125000) 1.1990 ns/op 0.81000 ns/op 1.48
Array 250000 iterate all - loop 137.30 us/op 146.88 us/op 0.93
aggregationBits - 2048 els - readonlyValues 270.00 us/op 194.50 us/op 1.39
aggregationBits - 2048 els - zipIndexesInBitList 55.067 us/op 38.746 us/op 1.42
regular array get 100000 times 55.893 us/op 60.588 us/op 0.92
wrappedArray get 100000 times 54.115 us/op 59.104 us/op 0.92
arrayWithProxy get 100000 times 32.434 ms/op 24.081 ms/op 1.35
ssz.Root.equals 1.2580 us/op 945.00 ns/op 1.33
ssz.Root.equals with valueOf() 1.6710 us/op 1.1140 us/op 1.50
byteArrayEquals with valueOf() 1.6490 us/op 1.0660 us/op 1.55
phase0 processBlock - 250000 vs - 7PWei normalcase 13.531 ms/op 9.1443 ms/op 1.48
phase0 processBlock - 250000 vs - 7PWei worstcase 100.53 ms/op 68.878 ms/op 1.46
phase0 afterProcessEpoch - 250000 vs - 7PWei 217.36 ms/op 185.73 ms/op 1.17
phase0 beforeProcessEpoch - 250000 vs - 7PWei 731.72 ms/op 511.99 ms/op 1.43
phase0 processEpoch - mainnet_e58758 907.55 ms/op 738.30 ms/op 1.23
mainnet_e58758 - phase0 beforeProcessEpoch 519.82 ms/op 406.31 ms/op 1.28
mainnet_e58758 - phase0 processJustificationAndFinalization 111.66 us/op 116.10 us/op 0.96
mainnet_e58758 - phase0 processRewardsAndPenalties 135.32 ms/op 97.549 ms/op 1.39
mainnet_e58758 - phase0 processRegistryUpdates 115.46 us/op 62.719 us/op 1.84
mainnet_e58758 - phase0 processSlashings 6.5940 us/op 1.9920 us/op 3.31
mainnet_e58758 - phase0 processEth1DataReset 5.2630 us/op 2.1140 us/op 2.49
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 10.970 ms/op 8.5280 ms/op 1.29
mainnet_e58758 - phase0 processSlashingsReset 29.991 us/op 12.673 us/op 2.37
mainnet_e58758 - phase0 processRandaoMixesReset 36.332 us/op 18.716 us/op 1.94
mainnet_e58758 - phase0 processHistoricalRootsUpdate 7.4080 us/op 2.1230 us/op 3.49
mainnet_e58758 - phase0 processParticipationRecordUpdates 27.769 us/op 16.497 us/op 1.68
mainnet_e58758 - phase0 afterProcessEpoch 191.13 ms/op 161.65 ms/op 1.18
phase0 processEffectiveBalanceUpdates - 250000 normalcase 12.616 ms/op 9.4338 ms/op 1.34
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.6903 s/op 1.1935 s/op 1.42
phase0 processRegistryUpdates - 250000 normalcase 88.803 us/op 53.915 us/op 1.65
phase0 processRegistryUpdates - 250000 badcase_full_deposits 4.1041 ms/op 2.7833 ms/op 1.47
phase0 processRegistryUpdates - 250000 worstcase 0.5 2.3961 s/op 1.3599 s/op 1.76
phase0 getAttestationDeltas - 250000 normalcase 41.940 ms/op 30.838 ms/op 1.36
phase0 getAttestationDeltas - 250000 worstcase 38.737 ms/op 31.248 ms/op 1.24
phase0 processSlashings - 250000 worstcase 42.720 ms/op 35.077 ms/op 1.22
shuffle list - 16384 els 13.544 ms/op 11.277 ms/op 1.20
shuffle list - 250000 els 189.55 ms/op 162.87 ms/op 1.16
getEffectiveBalances - 250000 vs - 7PWei 11.312 ms/op 9.6669 ms/op 1.17
pass gossip attestations to forkchoice per slot 18.657 ms/op 15.148 ms/op 1.23
computeDeltas 4.1446 ms/op 3.3280 ms/op 1.25
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 3.0937 ms/op 1.9900 ms/op 1.55
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 754.76 us/op 607.62 us/op 1.24
BLS verify - blst-native 2.1680 ms/op 1.6172 ms/op 1.34
BLS verifyMultipleSignatures 3 - blst-native 4.4335 ms/op 3.3096 ms/op 1.34
BLS verifyMultipleSignatures 8 - blst-native 9.8506 ms/op 7.1830 ms/op 1.37
BLS verifyMultipleSignatures 32 - blst-native 36.390 ms/op 26.098 ms/op 1.39
BLS aggregatePubkeys 32 - blst-native 48.818 us/op 34.130 us/op 1.43
BLS aggregatePubkeys 128 - blst-native 184.32 us/op 134.72 us/op 1.37
getAttestationsForBlock 98.060 ms/op 84.195 ms/op 1.16
CheckpointStateCache - add get delete 24.471 us/op 16.435 us/op 1.49
validate gossip signedAggregateAndProof - struct 5.1383 ms/op 3.8632 ms/op 1.33
validate gossip signedAggregateAndProof - treeBacked 5.2438 ms/op 3.8511 ms/op 1.36
validate gossip attestation - struct 2.5488 ms/op 1.8105 ms/op 1.41
validate gossip attestation - treeBacked 2.6103 ms/op 1.8440 ms/op 1.42
bytes32 toHexString 2.2180 us/op 1.4860 us/op 1.49
bytes32 Buffer.toString(hex) 776.00 ns/op 590.00 ns/op 1.32
bytes32 Buffer.toString(hex) from Uint8Array 1.1080 us/op 761.00 ns/op 1.46
bytes32 Buffer.toString(hex) + 0x 843.00 ns/op 599.00 ns/op 1.41
Object access 1 prop 0.42200 ns/op 0.28900 ns/op 1.46
Map access 1 prop 0.33000 ns/op 0.25100 ns/op 1.31
Object get x1000 16.863 ns/op 15.619 ns/op 1.08
Map get x1000 1.0180 ns/op 0.86500 ns/op 1.18
Object set x1000 117.30 ns/op 93.526 ns/op 1.25
Map set x1000 80.921 ns/op 55.827 ns/op 1.45
Return object 10000 times 0.39850 ns/op 0.32350 ns/op 1.23
Throw Error 10000 times 6.9306 us/op 5.2463 us/op 1.32
enrSubnets - fastDeserialize 64 bits 1.7030 us/op 1.0910 us/op 1.56
enrSubnets - ssz BitVector 64 bits 18.985 us/op 14.713 us/op 1.29
enrSubnets - fastDeserialize 4 bits 514.00 ns/op 377.00 ns/op 1.36
enrSubnets - ssz BitVector 4 bits 3.4410 us/op 2.4650 us/op 1.40
RateTracker 1000000 limit, 1 obj count per request 199.14 ns/op 154.57 ns/op 1.29
RateTracker 1000000 limit, 2 obj count per request 155.85 ns/op 115.46 ns/op 1.35
RateTracker 1000000 limit, 4 obj count per request 133.64 ns/op 94.555 ns/op 1.41
RateTracker 1000000 limit, 8 obj count per request 115.37 ns/op 84.281 ns/op 1.37
RateTracker with prune 5.5180 us/op 3.4510 us/op 1.60
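
The Ratio column appears to be Current divided by Previous after converting both cells to the same unit, so a value above 1 means the current commit was slower in that run. A small sketch to recompute it from two cells of the table; the helper names are hypothetical and not part of the benchmark action:

```typescript
// Nanoseconds per unit, for the units used in the report.
const NS_PER_UNIT: Record<string, number> = {ns: 1, us: 1e3, ms: 1e6, s: 1e9};

/** Parse a cell such as "154.06 us/op" into nanoseconds per op. */
function parseNsPerOp(cell: string): number {
  const match = /^([\d.]+) (ns|us|ms|s)\/op$/.exec(cell.trim());
  if (match === null) throw new Error(`Unrecognized benchmark cell: ${cell}`);
  return parseFloat(match[1]) * NS_PER_UNIT[match[2]];
}

/** Ratio as shown in the report: current / previous, rounded to 2 decimals. */
function ratio(current: string, previous: string): string {
  return (parseNsPerOp(current) / parseNsPerOp(previous)).toFixed(2);
}

// Row "altair processEpoch - mainnet_e81889": mixed units (s vs ms).
console.log(ratio("1.0806 s/op", "981.54 ms/op"));
```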

by benchmarkbot/action

@twoeths twoeths marked this pull request as ready for review January 4, 2022 14:41
@twoeths twoeths marked this pull request as draft January 8, 2022 11:24
@twoeths twoeths marked this pull request as ready for review January 13, 2022 03:19
@twoeths
Contributor Author

twoeths commented Jan 13, 2022

Thanks to the additional gossip attestations we accept, the average peer score improves too:

  • master

Screen Shot 2022-01-13 at 10 38 42

  • this branch

Screen Shot 2022-01-13 at 10 39 02

@dapplion
Contributor

@tuyennhv I've changed the design a bit, as in my opinion it makes the change more manageable:

  • I've moved the re-process waiting logic into the validation function itself. That way the logic doesn't have to be repeated, and it applies to attestation and aggregate gossip validation as well as API validation.
  • I've changed how the reprocess handles each case, making it always resolve instead of reject. That makes it much simpler to handle all cases:
    • if block found: resolve true
    • if queue full: resolve false
    • if timeout: resolve false
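
The always-resolve behavior listed above might look roughly like this. It is a sketch under assumptions rather than the PR's implementation: `BlockWaiter`, `MAX_WAITING`, `WAIT_TIMEOUT_MS`, and `decide` are hypothetical names and values chosen to keep the example small.

```typescript
type RootHex = string;

// Assumed limits, chosen only to keep the example small.
const MAX_WAITING = 2;
const WAIT_TIMEOUT_MS = 50;

/** Hypothetical waiter that never rejects: true = block found, false = give up. */
class BlockWaiter {
  private readonly settlersByRoot = new Map<RootHex, Array<(found: boolean) => void>>();
  private waiting = 0;

  pendingCount(): number {
    return this.waiting;
  }

  waitForBlock(root: RootHex): Promise<boolean> {
    // Queue full: resolve false right away instead of throwing.
    if (this.waiting >= MAX_WAITING) return Promise.resolve(false);

    return new Promise<boolean>((resolve) => {
      this.waiting++;
      let settled = false;
      const settle = (found: boolean): void => {
        if (settled) return; // first outcome wins (block found vs timeout)
        settled = true;
        clearTimeout(timer);
        this.waiting--;
        resolve(found);
      };
      // Timeout: resolve false as well.
      const timer = setTimeout(() => settle(false), WAIT_TIMEOUT_MS);
      const settlers = this.settlersByRoot.get(root) ?? [];
      settlers.push(settle);
      this.settlersByRoot.set(root, settlers);
    });
  }

  /** Block found: resolve true for everything waiting on this root. */
  onBlockImported(root: RootHex): void {
    const settlers = this.settlersByRoot.get(root) ?? [];
    this.settlersByRoot.delete(root);
    for (const settle of settlers) settle(true);
  }
}

/** Callers need a single branch, with no try/catch around the wait. */
async function decide(waiter: BlockWaiter, root: RootHex): Promise<"reprocess" | "ignore"> {
  return (await waiter.waitForBlock(root)) ? "reprocess" : "ignore";
}
```

Because every outcome is a boolean, the caller treats "queue full" and "timeout" identically. For brevity the sketch leaves timed-out entries in the map until their block arrives; a real implementation would clean them up.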

Please review my changes and keep them only if you agree. Please make sure I haven't broken any assumptions from our previous design

 * reprocess attestations, it should control when the attestations are ready to reprocess instead.
 */
export class ReprocessController {
  private readonly awaitingPromisesByRootBySlot: MapDef<Slot, MapDef<RootHex, AwaitingAttestationPromise[]>>;
Member

I wonder if we can make do with MapDef<Slot, MapDef<RootHex, AwaitingAttestationPromise>>
Reusing the same promise for attestations relying on the same block.
We would just be tracking the maximum wait time before resolve/reject per block.
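
One way to read this suggestion, sketched under assumptions: keep a single promise per block root and hand the same promise to every attestation that depends on it, so that only one wait time is tracked per block. The names `SharedBlockWaiter`, `SharedWait`, and `MAX_WAIT_MS` are hypothetical, and the slot dimension of the map is left out for brevity.

```typescript
type RootHex = string;

// Assumed maximum wait per block root, in milliseconds.
const MAX_WAIT_MS = 500;

interface SharedWait {
  promise: Promise<boolean>;
  resolve: (found: boolean) => void;
  addedTimeMs: number; // when the first attestation started waiting
}

/** Hypothetical: one promise per block root, shared by all attestations voting for it. */
class SharedBlockWaiter {
  private readonly waitByRoot = new Map<RootHex, SharedWait>();

  size(): number {
    return this.waitByRoot.size;
  }

  waitForBlock(root: RootHex, nowMs: number): Promise<boolean> {
    let wait = this.waitByRoot.get(root);
    if (wait === undefined) {
      let resolve: (found: boolean) => void = () => {};
      // The executor runs synchronously, so `resolve` is set before it is stored.
      const promise = new Promise<boolean>((res) => {
        resolve = res;
      });
      wait = {promise, resolve, addedTimeMs: nowMs};
      this.waitByRoot.set(root, wait);
    }
    return wait.promise;
  }

  /** Resolves the shared promise once. Returns false if nobody was waiting. */
  onBlockImported(root: RootHex): boolean {
    const wait = this.waitByRoot.get(root);
    if (wait === undefined) return false;
    this.waitByRoot.delete(root);
    wait.resolve(true);
    return true;
  }

  /** Gives up on roots that have waited at least MAX_WAIT_MS. Returns how many were dropped. */
  pruneExpired(nowMs: number): number {
    let pruned = 0;
    this.waitByRoot.forEach((wait, root) => {
      if (nowMs - wait.addedTimeMs < MAX_WAIT_MS) return;
      wait.resolve(false);
      this.waitByRoot.delete(root);
      pruned++;
    });
    return pruned;
  }
}
```

With this shape, the number of tracked entries grows with the number of unknown block roots rather than with the number of attestations.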

Contributor

Good idea! Implemented 👍

@dapplion
Contributor

@tuyennhv I've moved the code back to the handlers so that #3613 can be implemented in the future without nasty stuff. I think it looks decently clean now and allows for extension here:

// Trigger unknown block root search here

@twoeths
Contributor Author

twoeths commented Jan 14, 2022

Thanks @dapplion, it looks great now. The latest update works well on contabo-20 👍

wemeetagain
wemeetagain previously approved these changes Jan 14, 2022
dapplion
dapplion previously approved these changes Jan 14, 2022
@dapplion
Contributor

@tuyennhv There are file conflicts with the "pass attestations to the forkchoice" change

@twoeths twoeths dismissed stale reviews from dapplion and wemeetagain via fbad895 January 16, 2022 01:28
@twoeths twoeths force-pushed the tuyen/reprocess-attestations branch from c264a3c to fbad895 Compare January 16, 2022 01:28
Contributor

@dapplion dapplion left a comment

Looks great!

@dapplion dapplion merged commit 5295a78 into master Jan 17, 2022
@dapplion dapplion deleted the tuyen/reprocess-attestations branch January 17, 2022 07:06
Development

Successfully merging this pull request may close these issues.

Reprocess attestations
3 participants