-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix pubsub new_blocks notifications to include all blocks #9987
Fix pubsub new_blocks notifications to include all blocks #9987
Conversation
It looks like @mattrutherford signed our Contributor License Agreement. 👍 Many thanks, Parity Technologies CLA Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That second if is_empty
is indeed redundant, but I don't think the check !imported_blocks.is_empty() && is_empty
is duplicated. The first one checks whether the imported block list is empty, and the second one checks whether the remaining queue is empty. In this PR, the behavior for determine whether the client is syncing is changed via:
- Previously, we check whether the block queue is empty after import
max_round_blocks_to_import
blocks. If not, we say that the client is syncing. - In this PR, we check whether the block queue has a length of less than 10 after import
max_round_blocks_to_import
blocks. If not, we say that the client is syncing.
However, as you can see, the changed behavior in this PR is mostly identical to setting --max-round-blocks-to-import
config to a higher number.
To fix the reported issue, maybe we can just change the default of --max-round-blocks-to-import
to a higher number like 22
? I think the previous definition of "syncing" is nicer because it avoids an extra parameter.
ethcore/src/client/client.rs
Outdated
duration, | ||
); | ||
}); | ||
if client.queue_info().total_queue_size() < 10 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether calling queue_info
here is expensive. We fetch many other not-used info by this and it acquires three atomics. And I suppose this function is something that needs to be really fast, so it may matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that it includes a bunch of stuff that we don't need. It currently requires 3 Mutex locks - I have a separate suggestion for a way to negate the need to lock WIP, which could mitigate risk here.
ethcore/src/client/client.rs
Outdated
duration, | ||
); | ||
}); | ||
if client.queue_info().total_queue_size() < 10 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hard-coding of 10
won't fit slower network conditions. Can we parameterize this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean 10 might be too high? Maybe checking for distance from most recent network block might be a better way to do this? Do we have a reference for that?
I don't think increasing |
@mattrutherford Do you mind to explain? I still don't see why we would get different results compared with increasing |
I've put 2 printlns in the code to illustrate( first at the top of the fn and again at the bottom), so you can see against the logs and compare with the output of the nodejs test script .
|
@mattrutherford From the log you provided, isn't it the case that when From the code, it looks like we add items to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not trigger the notification always and let the receives decide when to handle it?
I decided to take the suggestion by @tomusdrw because I think it's pragmatic to let each implementation have the information and decide what appropriate action to take. All existing functionality (except for the updated pubsub impl) remains as before. The current solution has been tested over a 48hr period and appears stable with no missed notifications, previously missing a block notification every ~10 min or so. I think we should open a new issue to examine the way the queue works and decide if it's working as expected. Another note - it may be possible to optimise |
ddf19da
to
d78639c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Left a small grumble.
Note that this does change behavior of RPC pubsub -- before we don't report blocks if we're syncing, but right now we do. Fortunately our RPC specs don't have any restrictions on that.
Also (I think totally optional before merging this PR, but nice to have) it may be good if we can test syncing older blocks with this branch, just to make sure performance is okay.
_proposed: Vec<Bytes>, | ||
_duration: Duration, | ||
) { | ||
fn new_blocks(&self, new_blocks: NewBlocks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a check below this line that if we have no log_subscribers and no head_subscribers then early return? We do init some structs below regardless of whether there're subscribes. They may make normal syncing slow.
@mattrutherford Also please update PR description to make @5chdn's job easier! |
proposed_blocks.clone(), | ||
duration, | ||
NewBlocks::new( | ||
imported_blocks.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we pass references here instead of cloning the Vec
every time?
Or maybe better pass Arc<NewBlocks>
to avoid cloning the content if reference is not possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking about too (comment above). There might be some more improvements to make, but I was wondering where to draw the line with this PR. I think this is worth doing this however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a separate PR improving this, just make sure to create an issue for this and add patch
labels.
Remove load_is_empty
Implement new struct to pass to new_blocks() with extra parameter - processing_is_empty, which was previously used to determine whether the notificaiton should be sent. Now it's up to each implementation to decide. Updated all implementations to behave as before, except eth_pubsub, which ignores processing_is_empty. Update original tests for new_blocks
725b979
to
23cd3e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, small suggestion on the processing_is_empty
bool variable, but it's not critical.
_proposed: Vec<Bytes>, | ||
_duration: Duration, | ||
) { | ||
fn new_blocks( &self, _new_blocks: NewBlocks) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn new_blocks( &self, _new_blocks: NewBlocks) { | |
fn new_blocks(&self, _new_blocks: NewBlocks) { |
Fix: new blocks notifications sometimes missing in pubsub RPC Implement new struct to pass to `new_blocks()` with extra parameter - `has_more_blocks_to_import`, which was previously used to determine whether the notification should be sent. Now it's up to each implementation to decide what to do. Updated all implementations to behave as before, except `eth_pubsub`, which will send notification even when the queue is not empty. Update tests.
* version: bump beta to v2.2.6 * Update pwasm-utils to 0.6.1 (#10134) * version: mark upgrade critical on kovan * Identity fix (#10128) * fix #10125 fix service transaction version detection if --identity is enabled, change test to match how --identity actually works * fix wrong var * get the index of v, not / * idx, not idx.len() * Update ethcore/sync/src/chain/propagator.rs Co-Authored-By: joshua-mir <43032097+joshua-mir@users.noreply.github.com> * Update ethcore/sync/src/chain/propagator.rs Co-Authored-By: joshua-mir <43032097+joshua-mir@users.noreply.github.com> * change version prefix to a const * space Co-Authored-By: joshua-mir <43032097+joshua-mir@users.noreply.github.com> * [Hit return to continue] * ethcore: update hardcoded headers for foundation * ethcore: update hardcoded headers for ropsten * ethcore: update hardcoded headers for kovan * ethcore: use consistant formatting * ethcore: restore spaces after colon in chain spec * ethcore: fix bootnodes in chain specs * ethcore: fix bootnodes in chain specs * ethcore: enforce newline at the end of chainspecs * Follow-up to #10105 (#10107) * HF in POA Sokol (2019-01-04) (#10077) * HF in POA Sokol (2019-01-04) poanetwork/poa-chain-spec#91 * Update POA Core bootnodes * Autogen docs for the "Configuring Parity Ethereum" wiki page. (#10067) * publish docs changes for autogen config docs * Update publish-docs.sh adding an environment variable so js knows not to download git master and just grab the local repo * Update publish-docs.sh made some changes making this unnecessary * fix env variable env variable passes to node properly now * use yarn * test pipeline, revert me * fix test pipeline, revert me * change runner tag * change runner tag 2 * change runner tag * global git config * supress upload_files output * Update .gitlab-ci.yml reverting testing changes * Replace tag if exists Very unlikely to be important/useful * Autogen docs for the "Configuring Parity Ethereum" wiki page. (#10067) * publish docs changes for autogen config docs * Update publish-docs.sh adding an environment variable so js knows not to download git master and just grab the local repo * Update publish-docs.sh made some changes making this unnecessary * fix env variable env variable passes to node properly now * use yarn * test pipeline, revert me * fix test pipeline, revert me * change runner tag * change runner tag 2 * change runner tag * global git config * supress upload_files output * Update .gitlab-ci.yml reverting testing changes * Replace tag if exists Very unlikely to be important/useful * finality: dont require chain head to be in the chain (#10054) * Fill transaction hash on ethGetLog of light client. (#9938) * Fill transaction hash on ethGetLog of light client. This is enifficient but we keep align with spec. * Using better variables names. * Expose old fast light get log as `parity_getLogsLight`. * Update rpc/src/v1/helpers/light_fetch.rs Fix indent. Co-Authored-By: cheme <emericchevalier.pro@gmail.com> * Factor common code between light_logs and logs. * Remove useless check * Rename parity light logs to 'parity_getLogsNoTransactionHash'. Fix indent and extra white lines. * Use vec instead of tree map to avoid inner function. * better loop * fix pubsub new_blocks notifications to include all blocks (#9987) Fix: new blocks notifications sometimes missing in pubsub RPC Implement new struct to pass to `new_blocks()` with extra parameter - `has_more_blocks_to_import`, which was previously used to determine whether the notification should be sent. Now it's up to each implementation to decide what to do. Updated all implementations to behave as before, except `eth_pubsub`, which will send notification even when the queue is not empty. Update tests. * Revert part of 70a6db7 * HF in POA Core (2019-01-18) - Constantinople (#10155) poanetwork/poa-chain-spec#100 * ci: re-enable snap publishing (#10142) * ci: enable snap publishing~ * ci: add publish snap script * ci: add snapcraft skeleton * ci: group export statements * ci: enable snaps on pr branch * ci: enable snaps on pr branch * ci: set default BUILD_ARCH * ci: enable snaps on pr branch * ci: enable snaps on pr branch * ci: add libdb to snap * ci: reinitiate gitlabci * ci: reinitiate publish-snap script * ci: fix yaml syntax * cargo/gitlab env vars * debug, revert me * version? * debug vars * vars * vars fix * vars fix * revert * Update scripts/gitlab/publish-snap.sh Co-Authored-By: 5chdn <5chdn@users.noreply.github.com> * ci: read track from cargo toml * Make sure parent block is not in importing queue when importing ancient blocks (#10138) * Make sure parent block is not in importing queue when importing ancient blocks * Clear queue when an ancient import fails * Lock only once in clear * Add comments why queued check is needed * Should push the value back to the queue * Directly check in chain.read() * Remove extra empty line * Revert unused verification change
Remove duplicate check for
is_empty
. Check total queue size is < 10 to send new blocks notifications (to indicate not doing major sync). Tried initially with queue size of 3 (as used inpub fn is_major_importing_or_waiting
inrpc/src/v1/helpers/block_import.rs
) but during overnight testing it missed a block; increasing to 10 returned all canonical blocks in the same test setup.Closes 9865