Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics and revamp logic #206

Merged
merged 50 commits into from
Apr 5, 2022
Merged

Add metrics and revamp logic #206

merged 50 commits into from
Apr 5, 2022

Conversation

dapplion
Copy link
Contributor

@dapplion dapplion commented Mar 15, 2022

  • Add metrics with full coverage of all critical logic
  • Don't apply penalties for long validation times
  • Add async flow for slow validation

Closes #194

@codecov-commenter
Copy link

codecov-commenter commented Mar 15, 2022

Codecov Report

Merging #206 (b8693c5) into master (90c50b5) will decrease coverage by 7.29%.
The diff coverage is 71.78%.

@@            Coverage Diff             @@
##           master     #206      +/-   ##
==========================================
- Coverage   86.28%   78.98%   -7.30%     
==========================================
  Files          19       24       +5     
  Lines        5452     7696    +2244     
  Branches      621      760     +139     
==========================================
+ Hits         4704     6079    +1375     
- Misses        748     1617     +869     
Impacted Files Coverage Δ
ts/metrics.ts 23.07% <23.07%> (ø)
ts/message/rpc.js 76.77% <50.00%> (+0.27%) ⬆️
ts/tracer.ts 83.13% <72.16%> (-16.87%) ⬇️
ts/utils/buildRawMessage.ts 73.91% <73.91%> (ø)
ts/message-cache.ts 84.11% <76.85%> (-13.80%) ⬇️
ts/utils/publishConfig.ts 77.50% <77.50%> (ø)
ts/utils/time-cache.ts 96.61% <77.77%> (-3.39%) ⬇️
ts/score/scoreMetrics.ts 78.94% <78.94%> (ø)
ts/index.ts 87.59% <86.05%> (-4.90%) ⬇️
ts/score/compute-score.ts 92.85% <87.50%> (-1.65%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90c50b5...b8693c5. Read the comment docs.

@dapplion
Copy link
Contributor Author

Update on gossipsub debugging

Low scores are caused by P3 and P7 penalties

  • P7: A penalty of 0 is applied if the promise is resolved before 3 seconds. A penalty of 1 is applied if the promise is resolved after 3 seconds. I've added a separate promise tracker up to 12 seconds to check if promises are delivered late or not at all and it seems most of the promises counted as broken simply arrive late. As noted here https://medium.com/airtable-eng/investigating-node-js-performance-event-loop-and-network-i-o-part-2-e9d1a8d4da8a NodeJS suffers from I/O lag when under heavy load, which could be the cause of the delays. I'll do more tests to confirm this hypothesis. If true:
    • We should increase the promise timeout while our I/O is slow.
  • P3: Sometimes extremely large penalties are applied for not delivering enough messages. The maximum P3 penalty for beacon_attestation is 64 * weight * (threshold - 0) ^ 2 which is roughly 50000. I've seen lower values than this, so there may be a logic error somewhere. Also it seems like nodes that don't contribute to a mesh at all are still expect to meet the threshold which they don't. Still investigating.

dapplion and others added 6 commits March 18, 2022 09:24
* Add checkReceivedSubscriptions

* Add checkReceivedSubscriptions to 'test gossipsub multihops'

* Add checkReceivedSubscriptions to 'test gossipsub tree topology'

* test gossipsub star topology with signed peer records

* Fix 'test gossipsub direct peers'

* Fix 'test gossipsub flood publish'

* Fix 'test gossipsub star topology with signed peer records'

* 'direct peers' test: wait for subscriptions event again

* 'direct peer': await for 2 peer:connect events

* 'direct peers': add missing Promise.all

* Expect topic peers to contain peer id
@twoeths twoeths mentioned this pull request Mar 25, 2022
@twoeths twoeths changed the title [WIP] Add metrics and revamp logic Add metrics and revamp logic Mar 25, 2022
Copy link
Member

@wemeetagain wemeetagain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure looks good to me.

Curious about the status of this regarding the many TODOs

@BigLep
Copy link

BigLep commented Apr 4, 2022

Hello, I'm one of the engineering managers at PL EngRes. I was wondering if there's a estimated completion date for this work? I'm asking because we're waiting on it before doing the next js-libp2p release with the TypeScript refactor (libp2p/js-libp2p#1178 ).

@dapplion
Copy link
Contributor Author

dapplion commented Apr 4, 2022

Hello, I'm one of the engineering managers at PL EngRes. I was wondering if there's a estimated completion date for this work? I'm asking because we're waiting on it before doing the next js-libp2p release with the TypeScript refactor (libp2p/js-libp2p#1178 ).

It may take some time since we are continually iterating changes here and in Lodestar until we have a peering performance level that good enough.

If this is a blocker we can port the MVP changes to master and do a faster release that does not include this work. The blocker is moving to ESM only (if yes see ChainSafe/lodestar#3863) or what specifically?

@BigLep
Copy link

BigLep commented Apr 4, 2022

@dapplion : thanks for the response. I'll let @achingbrain speak to the minimum he needs to be unblocked.

@achingbrain : can you please provide input?

@achingbrain
Copy link
Collaborator

The @libp2p/* deps are ESM-only so they will need to be imported rather than required, that means the compilation result of this module will need to be ESM as well, since otherwise it'll use require in the output which will then fail at runtime.

When consuming this module though, you can use dynamic imports to load it from CJS or ESM, so maybe it's not a complete blocker? You will have to do that for all new @libp2p/* deps, but it may have a comparatively smaller impact on the lodestar codebase than going full ESM immediately?

@twoeths twoeths merged commit baf84f8 into master Apr 5, 2022
@twoeths twoeths deleted the dapplion/gossipsub branch April 5, 2022 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics to gossipsub
6 participants