[Networking] Optimizing GossipSub RPC Handling Memory Usage Through Asynchronous Subscription Updates #4988
Conversation
This reverts commit 9b1a4cc.
…ry-intensive-issues-part-1
```go
	SubscriptionProviderConfig SubscriptionProviderParameters `mapstructure:",squash"`
}

type SubscriptionProviderParameters struct {
```
Are there concrete minimum values we can use for the validate tags of this struct? For SubscriptionUpdateInterval, perhaps we can use 10m as the minimum, and for CacheSize, the current number of nodes on the network.
Thanks for your input. Setting a fixed minimum seems risky. Currently, validation requires the parameters to be greater than zero, which ensures they are actively configured. Letting node operators lower the parameters as needed seems more prudent.
```go
type SubscriptionRecordCache struct {
	c            *stdmap.Backend
	currentCycle atomic.Uint64
```
Can you add some documentation describing the relationship between the currentCycle and how it advances during each configured updateTopics interval? It's sort of implied here, but a short, concise description would be good.
Left a few small comments, otherwise looks great!
Looks good. Added a few small comments.
network/p2p/scoring/registry.go
Outdated
```go
case <-reg.validator.Ready():
	reg.logger.Info().Msg("subscription validator started")
}
ready()
```
I assume the worker isn't `ready()` unless the validator is:
```diff
 case <-reg.validator.Ready():
 	reg.logger.Info().Msg("subscription validator started")
-}
-ready()
+	ready()
+}
```
network/p2p/scoring/score_option.go
Outdated
```go
case <-scoreRegistry.Ready():
	s.logger.Info().Msg("score registry started")
}
ready()
```
same here
```diff
 case <-scoreRegistry.Ready():
 	s.logger.Info().Msg("score registry started")
-}
-ready()
+	ready()
+}
```
```go
updatedTopics, err := s.cache.AddTopicForPeer(p, topic)
if err != nil {
	// this is an irrecoverable error; hence, we crash the node.
	ctx.Throw(fmt.Errorf("failed to update topics for peer %s: %w", p, err))
```
nit: rather than passing the `irrecoverable.SignalerContext`, have this return an error, and the loop can throw if an error is returned.
```go
AddWorker(func(ctx irrecoverable.SignalerContext, ready component.ReadyFunc) {
	logger.Debug().Msg("starting subscription validator")
	v.subscriptionProvider.Start(ctx)
	<-v.subscriptionProvider.Ready()
```
Does this component become ready quickly? If not, it's probably worth using a select with the context to allow graceful aborts.
Co-authored-by: Khalil Claybon <khalil.claybon@dapperlabs.com>
…b.com:onflow/flow-go into yahya/6870-fix-memory-intensive-issues-part-1
Problem Statement:
Our current GossipSub implementation incurs significant memory overhead due to frequent updates triggered by the peer-scoring component. This component, which verifies peer subscriptions, becomes increasingly memory-intensive as RPC traffic rises. Each update refreshes the router's topic list and peer roster; while individually lightweight, these updates cumulatively strain the system. A flame graph analysis highlighted this issue, pointing to the need for optimization.
Read more here: https://github.com/dapperlabs/flow-go/issues/6870
Proposed Solution:
This PR introduces a major optimization to the subscription update mechanism within the GossipSub router. We propose shifting from an event-based update system to a time-based approach. Specifically, this involves:
- Replacing the `sync.Map` used for monitoring subscriptions, aiming to reduce heap allocations and improve overall performance.

Key Features: