-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: use unordered_set / unordered_map where order isn't relevant #6497
perf: use unordered_set / unordered_map where order isn't relevant #6497
Conversation
WalkthroughThe pull request introduces a systematic transformation across multiple source files in the project, focusing on replacing standard The modifications primarily affect files in the The changes are consistent across different components, indicating a deliberate and comprehensive refactoring effort. New headers like The refactoring touches critical areas of the system including quorum connections, relay members, signature share management, and network node tracking, reflecting a holistic approach to data structure optimization in the codebase. Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (7)
🚧 Files skipped from review as they are similar to previous changes (1)
🔇 Additional comments (10)src/llmq/dkgsession.h (2)
The change from
The return type has been properly updated to match the new member variable type while maintaining const correctness. src/llmq/utils.cpp (4)
The return type change to
The local variable type has been properly updated to match the new return type.
The return type change to
The return type has been properly updated and the added comments clearly explain the relay connection algorithm. src/llmq/quorums.cpp (1)
The local variable type has been properly updated to use src/llmq/signing_shares.cpp (1)
Using src/net.h (1)
The changes to use
Also applies to: 1507-1507, 1509-1509, 1818-1819 src/net.cpp (1)
The implementation changes properly handle the transition to unordered containers:
Also applies to: 4619-4619, 4628-4628, 4659-4659, 4672-4672, 4681-4681 Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/llmq/quorums.cpp (2)
Line range hint
3725-3725
: **Structured binding usage **
The if-statement with assignment (C++17 structured binding) is clean and concise; just ensure careful handling of potential fallback logic if verification fails.
Line range hint
4662-4662
: **Efficient container initialization **
The construction of the result set is straightforward. Consider reserving space if the maximum expected size of the quorums is known in advance to minimize rehashes.src/net.cpp (2)
3721-3722
: Consider reserving space before inserting into containers.
The use of std::unordered_set and std::unordered_map for storing node and proRegTxHash data is fine here.
As an optional optimization, if you anticipate many nodes, consider calling reserve() on these containers before inserting. This helps reduce potential rehashing costs.Also applies to: 3725-3725
4659-4662
: Returning copies of large containers.
The function returns an entire std::unordered_set by value. This is correct but might have a performance impact for large sets. If necessary, consider returning a reference or a smart pointer, or using a caching mechanism.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/llmq/dkgsession.h
(3 hunks)src/llmq/quorums.cpp
(1 hunks)src/llmq/signing_shares.cpp
(1 hunks)src/llmq/utils.cpp
(3 hunks)src/llmq/utils.h
(2 hunks)src/net.cpp
(6 hunks)src/net.h
(2 hunks)
🔇 Additional comments (20)
src/llmq/quorums.cpp (7)
373-373
: **Switch to unordered_set for efficiency **
Using std::unordered_set rather than std::set allows for average O(1) membership checks, which is beneficial when dealing with frequent insertions/lookups in "connections".
Line range hint 3721-3722
: **Improving lookup speed with hashing **
Both std::unordered_set for CService and std::unordered_map for proTxHash ensure constant-time average lookups, which should reduce overhead when synchronizing or iterating node states.
Line range hint 4619-4619
: **Preserving global consistency **
Methodically storing quorum-node relationships through this method is consistent with changes in the rest of the PR; it centralizes updates in a single place for improved maintainability.
Line range hint 4628-4628
: **Ensures robust relay membership tracking **
Storing a separate map of "relay members" clarifies membership roles and leverages unordered containers for faster lookups and merges.
Line range hint 4659-4659
: **Hash-based set result **
Returning an std::unordered_set<uint256, StaticSaltedHasher> for quorum IDs aligns with the rest of the data structure changes, optimizing membership checks.
Line range hint 4672-4672
: **Consistent container usage for node ID lookups **
Adopting std::unordered_set for fast membership checks complements the broader shift to hashed containers for dynamic node sets.
Line range hint 4681-4681
: **Conditional logic for qwatch nodes **
Ensure that the condition allows qwatch relationships to remain consistent if a node transitions policies. Verify with an integration test that covers the qwatch scenario.
✅ Verification successful
Let me analyze the search results. The qwatch
functionality appears to be a network-level feature related to quorum data sharing and node relationships. Let me gather more specific information about the conditional logic and policy transitions.
Let me try a different approach to understand the qwatch relationships and node state changes.
Let me try one final search to understand the complete context of node state transitions and qwatch handling.
Based on the comprehensive search results, I can now provide a final response about the qwatch node relationships and policy transitions.
**Qwatch relationships are properly handled during node transitions **
The codebase shows robust handling of qwatch relationships:
- Qwatch state is properly managed through
qwatch
flag in the node class (src/net.h) - Node transitions are handled safely:
- Qwatch status is checked before disconnecting nodes (src/masternode/utils.cpp)
- Verified nodes and qwatch nodes are treated distinctly in quorum data requests (src/llmq/quorums.cpp)
- Policy transitions are protected by proper checks of both verification status and qwatch flag:
if (m_mn_activeman == nullptr || (pfrom.GetVerifiedProRegTxHash().IsNull() && !pfrom.qwatch))
- Integration tests exist and verify qwatch scenarios:
- test/functional/p2p_quorum_data.py includes specific qwatch connection tests
- Tests verify both single and multiple qwatch connections sharing request limits
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching for usage of qwatch references in the codebase to ensure consistent handling
rg -A 3 -B 3 "qwatch"
Length of output: 10569
Script:
#!/bin/bash
# Search for policy-related code near qwatch usage
ast-grep --pattern 'if ($_ && $_.qwatch) { $$$ }'
# Look for policy transition code
rg -A 5 -B 5 "transition.*polic" --type cpp
Length of output: 97
Script:
#!/bin/bash
# Search for qwatch state changes and related node handling
rg -A 5 -B 5 "qwatch\s*=\s*(true|false)" --type cpp
# Look for functions that might handle node state changes
ast-grep --pattern 'function $_(CNode& $node) { $$$ }'
Length of output: 768
Script:
#!/bin/bash
# Search for node verification and state handling code
rg -A 5 -B 5 "GetVerifiedProRegTxHash|SetVerifiedProRegTxHash" --type cpp
# Look for watchquorum related functionality
rg -A 5 -B 5 "IsWatchQuorumsEnabled|WatchQuorums" --type cpp
Length of output: 30158
src/llmq/signing_shares.cpp (1)
720-720
: **Switching to unordered_set **
Using a hash set for quorumNodes optimizes join, membership, and iteration patterns compared to std::set, especially for NodeId-based lookups.
src/llmq/utils.h (4)
12-12
: **Salted hasher inclusion **
The addition of saltedhasher.h is consistent with the new hashing approach for unordered containers. This helps reduce potential collisions and maliciously crafted inputs.
17-17
: **unordered_set header inclusion **
Including <unordered_set> is required to support the new set implementations. This is straightforward and correct.
39-39
: **Retuning GetQuorumConnections to return hashing-based container **
Switching to std::unordered_set for connections is well-aligned with large-size usage and frequent membership checks (e.g., verifying presence of members). Great for performance.
40-40
: **Relay members now hashed **
Similar to GetQuorumConnections, using std::unordered_set here lowers overhead for lookups and merges. Implementation is uniform with the rest of the PR changes.
src/llmq/dkgsession.h (1)
298-298
: LGTM: Good performance optimization
Converting relayMembers
from std::set
to std::unordered_set
is a good optimization since the order of relay members is not relevant. This change provides O(1) average lookup time instead of O(log n), and the StaticSaltedHasher
ensures protection against hash collision attacks.
Also applies to: 387-387
src/llmq/utils.cpp (2)
Line range hint 656-677
: LGTM: Optimized GetQuorumConnections return type
The conversion from std::set
to std::unordered_set
for quorum connections is appropriate since the order of connections is not relevant. The function's logic remains correct while gaining O(1) average lookup performance.
680-695
: LGTM: Optimized GetQuorumRelayMembers and calcOutbound
Good optimization of the relay member calculations by using unordered_set. The change maintains the same functionality while improving performance. The early return optimization for single-member quorums is preserved.
src/net.h (2)
1504-1509
: LGTM: Optimized masternode quorum method signatures
Appropriate conversion of method signatures to use std::unordered_set
for quorum-related operations. This change is consistent with the optimization strategy and maintains API compatibility while improving performance.
1818-1819
: LGTM: Optimized masternode quorum member variables
Good optimization of the internal storage for masternode quorum nodes and relay members. Using unordered_set
with StaticSaltedHasher
provides better lookup performance while maintaining security against hash collision attacks.
src/net.cpp (3)
4619-4619
: Check concurrency for concurrent calls.
This setter method for masternode quorum nodes is simple and uses a lock appropriately. Ensure that callers consistently acquire the same lock (cs_vPendingMasternodes) for any dependent operations on masternodeQuorumNodes to avoid data races.
4628-4628
: Be mindful of concurrency when mutating node attributes in loop.
While updating masternode-specific data for connected peers, note that pnode->m_masternode_iqr_connection is assigned without explicitly locking the node. If changes to pnode->m_masternode_iqr_connection are frequent or contended, consider adding a lightweight lock or making the field atomic to avoid subtle race conditions.
Line range hint 4672-4681
: Ensure thread-safe access to node attributes within the loop.
While m_nodes_mutex safely guards the node list, fields like pnode->qwatch are being read outside a node-level lock or atomic usage. Though a simple bool read is typically safe, remain consistent with concurrency patterns if additional fields become involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 319e6ea, pls apply clang format suggestions
Did a few quickbench to ensure there isn't some advantage to using the "worse" data structure at small sizes. set: https://quick-bench.com/q/ApPAGguzG2F9AnqM2GkgDAXCzX4 map: https://quick-bench.com/q/PXDdF0HVGpiSmC9Hh5FUSTp0uJU Using unordered_map did show a tiny insertion regression for ~expected sizes on masternodes, but it appears that only 1 access will compensate for that overhead, and each su, some perf results I have are here, show that this std::set usage is causing an imo unacceptably high cpu usage. ``` - 9.18% 0.00% d-mncon dashd [.] CConnman::ThreadOpenMasternodeConnections(CDeterministicMNManager&, CMasternodeMetaMan&, CMasternodeSync&)::{lambda()#▒ - CConnman::ThreadOpenMasternodeConnections(CDeterministicMNManager&, CMasternodeMetaMan&, CMasternodeSync&)::{lambda()#1}::operator()() const ▒ + 6.16% CConnman::IsMasternodeOrDisconnectRequested(CService const&) ▒ - 2.66% std::set<CService, std::less<CService>, std::allocator<CService> >::count(CService const&) const ▒ - std::_Rb_tree<CService, CService, std::_Identity<CService>, std::less<CService>, std::allocator<CService> >::find(CService const&) const ▒ - 2.02% std::_Rb_tree<CService, CService, std::_Identity<CService>, std::less<CService>, std::allocator<CService> >::_M_lower_bound(std::_Rb_tree_node<CService> const▒ + 1.99% std::less<CService>::operator()(CService const&, CService const&) const ▒ + 0.62% std::less<CService>::operator()(CService const&, CService const&) const ```
319e6ea
to
a04d178
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK a04d178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK a04d178
Issue being fixed or feature implemented
We use std::set and std::map where we do not care about order, and it's mostly serves as a lookup. I couldn't find any instances in the code where we actually care about the ordering of these sets or maps.
In this case, we should probably prefer using unordered_set and unordered_map.
What was done?
Converted these sets / maps into their unordered equivalent
.
How Has This Been Tested?
Did a few quickbench to ensure there isn't some significant advantage to using the "worse" data structure at small sizes. these used sizes up to around 1000 instances per map / set, as this is the max number of connects we are every likely to see.
set: https://quick-bench.com/q/ApPAGguzG2F9AnqM2GkgDAXCzX4
map: https://quick-bench.com/q/PXDdF0HVGpiSmC9Hh5FUSTp0uJU
In this we see there is never a reason to use set over unordered_set, assuming that order isn't important.
Using unordered_map did show a small insertion regression for ~expected sizes on masternodes, but it appears that only 1 lookup will compensate for that overhead, and based on the codepaths, I expect more than 1 lookup per insertion.
some perf results I have are here, show that this std::set usage is causing at least a measurable cpu usage.
Breaking Changes
None
Checklist:
Go over all the following points, and put an
x
in all the boxes that apply.