Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Race Condition and incorrectly dropped consensus messages #11082

Closed
dforsten opened this issue Sep 23, 2019 · 3 comments
Closed

Race Condition and incorrectly dropped consensus messages #11082

dforsten opened this issue Sep 23, 2019 · 3 comments
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. Q2-easy 💃 Can be fixed by copy and pasting from StackOverflow.

Comments

@dforsten
Copy link
Contributor

dforsten commented Sep 23, 2019

In ethcore/src/client/client.rs the implementation of the queue() function of IoChannelQueue there is no guarantee that self.currently_queued is incremented before it is being decremented.

This potentially results in an underflow of self.currently_queued and incorrectly dropped consensus messages if new consensus messages are being processed on another thread at the same time:

2019-09-19 09:17:54 UTC IO Worker #0 DEBUG sync  0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #0 TRACE sync  Received consensus packet from 0
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG poa  Ignoring the message, error queueing: The queue is full (18446744073709551615)
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG sync  0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #0 TRACE sync  Received consensus packet from 0
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG poa  Ignoring the message, error queueing: The queue is full (18446744073709551615)
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG sync  0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #0 TRACE sync  Received consensus packet from 0
2019-09-19 09:17:54 UTC IO Worker #0 DEBUG poa  Ignoring the message, error queueing: The queue is full (18446744073709551615)
2019-09-19 09:17:54 UTC IO Worker #2 DEBUG sync  0 -> Dispatching packet: 21
2019-09-19 09:17:54 UTC IO Worker #2 TRACE sync  Received consensus packet from 0

We have encountered that issue quite frequently when deploying Honey Badger validators, leading to the Honey Badger Consensus Engine to become stuck in configurations with low node counts.

Removing the queue size check promptly fixed the issue. One possible fix for the underflow is to increment self.currently_queued before calling channel.send(), and decrementing it after channel.send() returns an error.

@jam10o-new jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. Q2-easy 💃 Can be fixed by copy and pasting from StackOverflow. labels Sep 23, 2019
@dvdplm
Copy link
Collaborator

dvdplm commented Sep 23, 2019

Thank you for the report. Are you interested in submitting a PR to fix this? :)

@dforsten
Copy link
Contributor Author

I will see to get a PR submitted later today.

dvdplm pushed a commit that referenced this issue Sep 25, 2019
…1086)

Fixes a race condition causing the currently_queued counter to underflow and consensus messages getting dropped incorrectly as a consequence.
@dforsten
Copy link
Contributor Author

Fixed with PR #11086, thank you!

s3krit pushed a commit that referenced this issue Oct 21, 2019
…1086)

Fixes a race condition causing the currently_queued counter to underflow and consensus messages getting dropped incorrectly as a consequence.
@soc1c soc1c mentioned this issue Nov 5, 2019
35 tasks
dvdplm pushed a commit that referenced this issue Nov 6, 2019
…1086)

Fixes a race condition causing the currently_queued counter to underflow and consensus messages getting dropped incorrectly as a consequence.
s3krit pushed a commit that referenced this issue Nov 11, 2019
* [CI] check evmbin build (#11096)
* Correct EIP-712 encoding (#11092)
* [client]: Fix for incorrectly dropped consensus messages (#11082) (#11086)
* Update hardcoded headers (foundation, classic, kovan, xdai, ewc, ...) (#11053)
* Add cargo-remote dir to .gitignore (?)
* Update light client headers: ropsten 6631425 foundation 8798209 (#11201)
* Update list of bootnodes for xDai chain (#11236)
* ethcore/res: add mordor testnet configuration (#11200)
* [chain specs]: activate Istanbul on mainnet (#11228)
* [builtin]: support multiple prices and activations in chain spec (#11039)
* [receipt]: add sender & receiver to RichReceipts (#11179)
* [ethcore/builtin]: do not panic in blake2pricer on short input (#11180)
* Made ecrecover implementation trait public (#11188)
* Fix docker centos build (#11226)
* Update MIX bootnodes. (#11203)
* Insert explicit warning into the panic hook (#11225)
* Use provided usd-per-eth value if an endpoint is specified (#11209)
* Cleanup stratum a bit (#11161)
* Add Constantinople EIPs to the dev (instant_seal) config (#10809) (already backported)
* util Host: fix a double Read Lock bug in fn Host::session_readable() (#11175)
* ethcore client: fix a double Read Lock bug in fn Client::logs() (#11172)
* Type annotation for next_key() matching of json filter options (#11192)
* Upgrade jsonrpc to latest (#11206)
* [dependencies]: jsonrpc 14.0.1 (#11183)
* Upgrade to jsonrpc v14 (#11151)
* Switching sccache from local to Redis (#10971)
* Snapshot restoration overhaul (#11219)
* Add new line after writing block to hex file. (#10984)
* Pause pruning while snapshotting (#11178)
* Change how RPCs eth_call and eth_estimateGas handle "Pending" (#11127)
* Fix block detail updating (#11015)
* Make InstantSeal Instant again #11186
* Filter out some bad ropsten warp snapshots (#11247)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust. Q2-easy 💃 Can be fixed by copy and pasting from StackOverflow.
Projects
None yet
Development

No branches or pull requests

3 participants