This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Fixed AuthorityRound deadlock on shutdown #8803

Merged: 1 commit merged into master on Jun 8, 2018


@debris (Collaborator) commented Jun 5, 2018

closes #8088

Follow-up to @svyatonik's explanation.

Ownership in our event loop looks as follows:

IoService -> manager_thread -> IoManager -> Workers -> workers_threads

In the case of AuthorityRound, there was an additional chain:

workers_threads -> TransitionHandler -> Weak<AuthorityRound> -> IoService

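Everything was fine as long as we never tried to shut down a worker while that worker owned a strong reference to IoService (obtained by upgrading the Weak<AuthorityRound>). Since shutting down IoService means shutting down the workers, a worker holding the last strong reference would end up trying to shut itself down.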
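The failure mode described above can be reproduced in miniature. This is a minimal sketch using only `std`; `Service` is a hypothetical stand-in for IoService, and `run` is a hypothetical driver. It assumes, as the ownership chain above suggests, that dropping IoService shuts down the workers. Instead of actually joining threads, `Service` only records which thread ran its destructor. The sequence is: the worker upgrades its `Weak`, the owner drops its own `Arc` during "shutdown", and the destructor then runs on the worker thread.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex, Weak};
use std::thread::{self, ThreadId};

/// Hypothetical stand-in for IoService. In the real event loop, dropping it
/// would shut down the workers; here it only records the dropping thread.
struct Service {
    dropped_on: Arc<Mutex<Option<ThreadId>>>,
}

impl Drop for Service {
    fn drop(&mut self) {
        *self.dropped_on.lock().unwrap() = Some(thread::current().id());
    }
}

/// Returns (calling thread id, worker thread id, thread that dropped Service).
fn run() -> (ThreadId, ThreadId, Option<ThreadId>) {
    let dropped_on = Arc::new(Mutex::new(None));
    let service = Arc::new(Service { dropped_on: dropped_on.clone() });
    let weak: Weak<Service> = Arc::downgrade(&service);

    let (upgraded_tx, upgraded_rx) = mpsc::channel();
    let (released_tx, released_rx) = mpsc::channel::<()>();

    let worker = thread::spawn(move || {
        // The worker temporarily becomes an owner by upgrading its Weak.
        let strong = weak.upgrade().expect("service still alive");
        upgraded_tx.send(()).unwrap();
        // Wait until the owner has released its own reference.
        released_rx.recv().unwrap();
        // `strong` is now the last Arc, so Service::drop runs on this thread.
        drop(strong);
        thread::current().id()
    });

    upgraded_rx.recv().unwrap();
    drop(service); // "shutdown": this is no longer the last reference
    released_tx.send(()).unwrap();
    let worker_id = worker.join().unwrap();

    let dropper = *dropped_on.lock().unwrap();
    (thread::current().id(), worker_id, dropper)
}

fn main() {
    let (main_id, worker_id, dropper) = run();
    assert_eq!(dropper, Some(worker_id));
    assert_ne!(dropper, Some(main_id));
    println!("Service was dropped on the worker thread");
}
```

The channels only force the unlucky interleaving deterministically. In the real node the same ordering can happen by chance during shutdown.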

As a solution, I refactored the code so that the TransitionHandler thread never takes ownership of IoService.
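One way to keep a worker thread from ever owning the service is sketched below with hypothetical `Engine`, `Step`, `Service`, and `TransitionHandler` types; this is not the actual Parity code. The handler holds an `Arc` to only the step state it needs, rather than a `Weak` reference to the whole engine. It therefore has nothing to upgrade, and the service is always dropped by its single owner.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, ThreadId};

/// Hypothetical step state shared between the engine and the handler.
struct Step {
    number: AtomicU64,
    can_propose: AtomicBool,
}

/// Hypothetical stand-in for IoService: records which thread dropped it.
struct Service {
    dropped_on: Arc<Mutex<Option<ThreadId>>>,
}

impl Drop for Service {
    fn drop(&mut self) {
        *self.dropped_on.lock().unwrap() = Some(thread::current().id());
    }
}

/// Hypothetical engine: the only owner of the service.
struct Engine {
    step: Arc<Step>,
    _service: Service,
}

/// Holds just the shared step, never a reference to the engine.
struct TransitionHandler {
    step: Arc<Step>,
}

impl TransitionHandler {
    fn on_timeout(&self) {
        self.step.number.fetch_add(1, Ordering::SeqCst);
        self.step.can_propose.store(true, Ordering::SeqCst);
    }
}

/// Returns (steps taken, can_propose flag, whether Service was dropped on the caller's thread).
fn run() -> (u64, bool, bool) {
    let dropped_on = Arc::new(Mutex::new(None));
    let step = Arc::new(Step { number: AtomicU64::new(0), can_propose: AtomicBool::new(false) });
    let engine = Engine { step: step.clone(), _service: Service { dropped_on: dropped_on.clone() } };
    let handler = TransitionHandler { step };

    // Timer callbacks run on a worker thread and touch only the shared step.
    let worker = thread::spawn(move || {
        for _ in 0..3 {
            handler.on_timeout();
        }
    });
    worker.join().unwrap();

    let steps = engine.step.number.load(Ordering::SeqCst);
    let can_propose = engine.step.can_propose.load(Ordering::SeqCst);
    drop(engine); // the service is dropped by its single owner, on this thread
    let dropped_here = *dropped_on.lock().unwrap() == Some(thread::current().id());
    (steps, can_propose, dropped_here)
}

fn main() {
    let (steps, can_propose, dropped_here) = run();
    assert_eq!(steps, 3);
    assert!(can_propose);
    assert!(dropped_here);
    println!("step advanced on the worker; service dropped by its owner");
}
```

The design choice is to split the state that timer callbacks mutate out of the engine, so the callbacks never influence the engine's lifetime.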

@debris added the A0-pleasereview 🤓, B0-patchthis, and M4-core ⛓ labels Jun 5, 2018
@debris requested a review from @svyatonik June 5, 2018 15:44
@svyatonik (Collaborator) left a comment

I can confirm that the issue is fixed. The code changes also LGTM (no more `engine.upgrade()` calls from TransitionHandler).

@5chdn mentioned this pull request Jun 5, 2018
@5chdn added this to the 1.12 milestone Jun 7, 2018
@ordian (Collaborator) left a comment

I cannot reproduce #8088 on Kovan with these changes.

// Make sure to advance up to the actual step.
while self.step.inner.duration_remaining().as_millis() == 0 {
	self.step.inner.increment();
	self.step.can_propose.store(true, AtomicOrdering::SeqCst);
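The loop in the excerpt above skips every step whose time has already elapsed. Below is a minimal, deterministic sketch of that idea. The names `inner`, `can_propose`, `increment`, and `duration_remaining` come from the excerpt. Their implementations here are hypothetical simplifications: time is an explicit `now` in whole seconds, whereas the real code compares a `Duration` against the clock.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical step clock: step `n` covers [n * duration, (n + 1) * duration)
/// seconds. `duration` must be non-zero, or the advance loop never terminates.
struct Step {
    number: AtomicU64,
    duration: u64,
}

impl Step {
    /// Seconds left in the current step at time `now` (0 once it has ended).
    fn duration_remaining(&self, now: u64) -> u64 {
        let end = (self.number.load(Ordering::SeqCst) + 1) * self.duration;
        end.saturating_sub(now)
    }

    fn increment(&self) {
        self.number.fetch_add(1, Ordering::SeqCst);
    }
}

struct PermissionedStep {
    inner: Step,
    can_propose: AtomicBool,
}

/// Advance up to the actual step, allowing a proposal in the step reached.
fn advance(step: &PermissionedStep, now: u64) {
    while step.inner.duration_remaining(now) == 0 {
        step.inner.increment();
        step.can_propose.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let step = PermissionedStep {
        inner: Step { number: AtomicU64::new(0), duration: 5 },
        can_propose: AtomicBool::new(false),
    };
    advance(&step, 12);
    // Step 2 covers [10, 15), so 3 seconds remain at t = 12.
    assert_eq!(step.inner.number.load(Ordering::SeqCst), 2);
    assert_eq!(step.inner.duration_remaining(12), 3);
    assert!(step.can_propose.load(Ordering::SeqCst));
}
```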
Review comment (Collaborator):

Maybe it's worth extracting into a function:

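fn step(step: &PermissionedStep, client: &RwLock<Option<Weak<EngineClient>>>) {
	// Advance to the next step and allow proposing in it.
	step.inner.increment();
	step.can_propose.store(true, AtomicOrdering::SeqCst);
	// Ask the client, if it is still alive, to update sealing.
	// Note: upgrading the Weak makes the calling thread a temporary owner of the client.
	if let Some(ref weak) = *client.read() {
		if let Some(c) = weak.upgrade() {
			c.update_sealing();
		}
	}
}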

to avoid code duplication?
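Here is a self-contained version of the suggested helper, with hypothetical stubs for `PermissionedStep`, `Step`, and `EngineClient` (their real definitions are not shown in this thread). It substitutes `std::sync::RwLock`, whose `read()` returns a `Result`, for the lock used in the review. That lock's `read()` returns the guard directly, so it is presumably `parking_lot::RwLock`. The trait object is written as `dyn EngineClient`. As the reply to this suggestion points out, the `upgrade()` call is the problematic part when this runs on a worker thread.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, RwLock, Weak};

/// Hypothetical stand-in for the engine client trait.
trait EngineClient: Send + Sync {
    fn update_sealing(&self);
}

struct Step {
    number: AtomicU64,
}

impl Step {
    fn increment(&self) {
        self.number.fetch_add(1, Ordering::SeqCst);
    }
}

struct PermissionedStep {
    inner: Step,
    can_propose: AtomicBool,
}

/// The extracted helper: advance one step, allow proposing, and ask the
/// client (if still alive) to update sealing.
fn step(step: &PermissionedStep, client: &RwLock<Option<Weak<dyn EngineClient>>>) {
    step.inner.increment();
    step.can_propose.store(true, Ordering::SeqCst);
    if let Some(ref weak) = *client.read().unwrap() {
        // This upgrade briefly makes the calling thread an owner of the client.
        if let Some(c) = weak.upgrade() {
            c.update_sealing();
        }
    }
}

/// Test double that counts update_sealing calls.
struct CountingClient {
    calls: AtomicUsize,
}

impl EngineClient for CountingClient {
    fn update_sealing(&self) {
        self.calls.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let concrete = Arc::new(CountingClient { calls: AtomicUsize::new(0) });
    let as_trait: Arc<dyn EngineClient> = concrete.clone();
    let client = RwLock::new(Some(Arc::downgrade(&as_trait)));
    let ps = PermissionedStep {
        inner: Step { number: AtomicU64::new(0) },
        can_propose: AtomicBool::new(false),
    };

    step(&ps, &client);
    step(&ps, &client);
    assert_eq!(ps.inner.number.load(Ordering::SeqCst), 2);
    assert!(ps.can_propose.load(Ordering::SeqCst));
    assert_eq!(concrete.calls.load(Ordering::SeqCst), 2);

    // Once every strong reference is gone, the upgrade fails and sealing is skipped.
    drop(as_trait);
    drop(concrete);
    step(&ps, &client);
    assert_eq!(ps.inner.number.load(Ordering::SeqCst), 3);
}
```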

@debris (Collaborator, Author):

This code would also leak, because it upgrades the reference to EngineClient from the worker thread.

@rphmeier (Contributor) commented Jun 8, 2018

The engine's step function might be completely unnecessary now. It's currently used for tests, but we could use TransitionHandler in the same way. Not a blocker for this PR, but we could file an issue to refactor it.

@5chdn merged commit 1f39a1b into master Jun 8, 2018
@5chdn deleted the ar_deadlock branch June 8, 2018 14:30
dvdplm added a commit that referenced this pull request Jun 11, 2018
* master:
  Fix subcrate test compile (#8862)
  network-devp2p: downgrade logging to debug, add target (#8784)
  Clearing up a comment about the prefix for signing (#8828)
  Disable parallel verification and skip verifying already imported txs. (#8834)
  devp2p: Move UDP socket handling from Discovery to Host. (#8790)
  Fixed AuthorityRound deadlock on shutdown, closes #8088 (#8803)
  Specify critical release flag per network (#8821)
  Fix `deadlock_detection` feature branch compilation (#8824)
  Use system allocator when profiling memory (#8831)
  added from and to to Receipt (#8756)
@5chdn added the A8-looksgood 🦄 label and removed the A0-pleasereview 🤓 label Jun 18, 2018
@andresilva mentioned this pull request Jun 18, 2018
@ascjones mentioned this pull request Jun 18, 2018
5chdn pushed a commit that referenced this pull request Jun 19, 2018
* Fixed AuthorityRound deadlock on shutdown, closes #8088 (#8803)

* CI: Fix docker tags (#8822)

* scripts: enable docker builds for beta and stable

* scripts: docker latest should be beta not master

* scripts: docker latest is master

* Fix concurrent access to signer queue (#8854)

* Fix concurrent access to signer queue

* Put request back to the queue if confirmation failed

* typo: fix docs and rename functions to be more specific

`request_notify` does not need to be public, and it's renamed to `notify_result`.
`notify` is renamed to `notify_message`.

* Change trace info "Transaction" -> "Request"

* Add new ovh bootnodes and fix port for foundation bootnode 3.2 (#8886)

* Add new ovh bootnodes and fix port for foundation bootnode 3.2

* Remove old bootnodes.

* Remove duplicate 1118980bf48b0a3640bdba04e0fe78b1add18e1cd99bf22d53daac1fd9972ad650df52176e7c7d89d1114cfef2bc23a2959aa54998a46afcf7d91809f0855082

* Block 0 is valid in queries (#8891)

Early exit for block nr 0 leads to spurious error about pruning: `…your node is running with state pruning…`.

Fixes #7547, #8762

* update jsonrpc libs, fixed ipc leak, closes #8774 (#8876)

Instead of cherry-picking 8b78141, just ran cargo update -p jsonrpc-core

* Add ETC Cooperative-run load balanced parity node (#8892)

* Minor fix in chain supplier and light provider (#8906)

* fix chain supplier increment

* fix light provider block_headers
5chdn pushed a commit that referenced this pull request Jun 19, 2018
* `duration_ns: u64 -> duration: Duration` (#8457)

* duration_ns: u64 -> duration: Duration

* format on millis {:.2} -> {}

* Keep all enacted blocks notify in order (#8524)

* Keep all enacted blocks notify in order

* Collect is unnecessary

* Update ChainNotify to use ChainRouteType

* Fix all ethcore fn defs

* Wrap the type within ChainRoute

* Fix private-tx and sync api

* Fix secret_store API

* Fix updater API

* Fix rpc api

* Fix informant api

* Eagerly cache enacted/retracted and remove contain_enacted/retracted

* Fix indent

* tests: should use full expr form for struct constructor

* Use into_enacted_retracted to further avoid copy

* typo: not a function

* rpc/tests: ChainRoute -> ChainRoute::new

* Handle removed logs in filter changes and add geth compatibility field (#8796)

* Add removed geth compatibility field in log

* Fix mocked tests

* Add field block hash in PollFilter

* Store last block hash info for log filters

* Implement canon route

* Use canon logs for fetching reorg logs

Light client removed logs fetching is disabled. It looks expensive.

* Make sure removed flag is set

* Address grumbles

* Fixed AuthorityRound deadlock on shutdown, closes #8088 (#8803)

* CI: Fix docker tags (#8822)

* scripts: enable docker builds for beta and stable

* scripts: docker latest should be beta not master

* scripts: docker latest is master

* ethcore: fix ancient block error msg handling (#8832)

* Disable parallel verification and skip verifying already imported txs. (#8834)

* Reject transactions that are already in pool without verifying them.

* Avoid verifying already imported transactions.

* Fix concurrent access to signer queue (#8854)

* Fix concurrent access to signer queue

* Put request back to the queue if confirmation failed

* typo: fix docs and rename functions to be more specific

`request_notify` does not need to be public, and it's renamed to `notify_result`.
`notify` is renamed to `notify_message`.

* Change trace info "Transaction" -> "Request"

* Don't allocate in expect_valid_rlp unless necessary (#8867)

* don't allocate via format! in case there's no error

* fix test?

* fixed ipc leak, closes #8774 (#8876)

* Add new ovh bootnodes and fix port for foundation bootnode 3.2 (#8886)

* Add new ovh bootnodes and fix port for foundation bootnode 3.2

* Remove old bootnodes.

* Remove duplicate 1118980bf48b0a3640bdba04e0fe78b1add18e1cd99bf22d53daac1fd9972ad650df52176e7c7d89d1114cfef2bc23a2959aa54998a46afcf7d91809f0855082

* Block 0 is valid in queries (#8891)

Early exit for block nr 0 leads to spurious error about pruning: `…your node is running with state pruning…`.

Fixes #7547, #8762

* Add ETC Cooperative-run load balanced parity node (#8892)

* Minor fix in chain supplier and light provider (#8906)

* fix chain supplier increment

* fix light provider block_headers

* Check whether we need resealing in miner and unwrap has_account in account_provider (#8853)

* Remove unused Result wrap in has_account

* Check whether we need to reseal for external transactions

* Fix reference to has_account interface

* typo: missing )

* Refactor duplicates to prepare_and_update_sealing

* Fix build

* Allow disabling local-by-default for transactions with new config entry (#8882)

* Add tx_queue_allow_unknown_local config option

- Previous commit messages:

dispatcher checks if we have the sender account

Add `tx_queue_allow_unknown_local` to MinerOptions

Add `tx_queue_allow_unknown_local` to config

fix order in MinerOptions to match Configuration

add cli flag for tx_queue_allow_unknown_local

Update refs to `tx_queue_allow_unknown_local`

Add tx_queue_allow_unknown_local to config test

revert changes to dispatcher

Move tx_queue_allow_unknown_local to `import_own_transaction`

Fix var name

if statement should return the values

derp de derp derp derp semicolons

Reset dispatch file to how it was before

fix compile issues + change from FLAG to ARG

add test and use `into`

import MinerOptions, clone the secret

Fix tests?

Compiler/linter issues fixed

Fix linter msg - case of constants

IT LIVES

refactor to omit yucky explicit return

update comments

Fix based on diff AccountProvider.has_account method

* Refactor flag name + don't change import_own_tx behaviour

fix arg name

Note: force commit to try and get gitlab tests working again 😠

* Add fn to TestMinerService

* Avoid race condition from trusted sources

- refactor the miner tests a bit to cut down on code reuse
- add `trusted` param to dispatch_transaction and import_claimed_local_transaction

Add param to `import_claimed_local_transaction`

Fix fn sig in tests
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic (deadlock) on Parity shutdown when using AuthorityRound engine
5 participants