Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Fix _cannot recursively call into Core_ issue #10144

Merged
merged 2 commits into from
Jan 11, 2019
Merged

Conversation

ngotchac
Copy link
Contributor

@ngotchac ngotchac commented Jan 7, 2019

It seems that on latest master, Parity would sometimes crash with
Thread 'IO Worker #2' panicked at 'cannot recursively call into "Core"', libcore/option.rs:1008.

This is because we use https://github.com/sbstp/rust-igd method search_gateway_from_timeout which calls get_control_url which runs a future in a new tokio Core, while we call this method already from a running tokio Core.

https://github.com/sbstp/rust-igd/blob/df53f419eec3684eec4bf0f45003cff1c4933682/src/search.rs#L101-L105

Moving to this fork and branch https://github.com/maufl/rust-igd/tree/update-hyper solves the issue ; maybe we should have our own fork?

Instead of using another fork, I found that the easiest way to fix this is actually to run igd::search_gateway_from_timeout in a separate thread, where we are sure tokio won't be already running.

To test that this work, one can apply this diff patch:

diff --git a/util/io/src/worker.rs b/util/io/src/worker.rs
index 458882a85..cab55f8b5 100644
--- a/util/io/src/worker.rs
+++ b/util/io/src/worker.rs
@@ -72,7 +72,9 @@ impl Worker {
 			move || {
 				LOCAL_STACK_SIZE.with(|val| val.set(STACK_SIZE));
 				let ini = (stealer, channel.clone(), wait, wait_mutex.clone(), deleting);
+				let first_call = ::std::sync::atomic::AtomicBool::new(true);
 				let future = future::loop_fn(ini, |(stealer, channel, wait, wait_mutex, deleting)| {
+					if !first_call.swap(false, AtomicOrdering::Relaxed)
 					{
 						let mut lock = wait_mutex.lock();
 						if deleting.load(AtomicOrdering::Acquire) {

The inherant issue is that future::loop_fn first calls the closure, so if the message to run init_public_interface (which calls igd::search_gateway_from_timeout) is already in the queue when setting up the workers, it would run just fine since tokio::runtime::current_thread::block_on_all hasn't been called yet ; this happens in most cases since it's the first message sent to the IoService. It could happen that it isn't, resulting in init_public_interface being called in a Tokio context, crashing parity-ethereum.

With this patch, the first call to the future's closure won't wait for a message, resulting in every message being called in a Tokio context. So if one applies this patch on master, it should crash the app at every single run.

@ngotchac ngotchac added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Jan 7, 2019
@5chdn 5chdn added this to the 2.3 milestone Jan 7, 2019
@ngotchac ngotchac added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. and removed A0-pleasereview 🤓 Pull request needs code review. labels Jan 7, 2019
@5chdn 5chdn modified the milestones: 2.3, 2.4 Jan 10, 2019
This was referenced Jan 10, 2019
@ngotchac ngotchac added A0-pleasereview 🤓 Pull request needs code review. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Jan 10, 2019
@ngotchac
Copy link
Contributor Author

The bug first appeared after #9979, so I'm not sure to which versions it applies. (@5chdn)

Err(_) => return None,
Ok(gateway_result) => gateway_result,
};
match gateway_result {
Err(ref err) => debug!("Gateway search error: {}", err),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

match search_gateway_child.join().ok()? {
   Err(e) => {}
   Ok(g) => {}
}

@niklasad1
Copy link
Collaborator

@ngotchac I have seen this when I enable a lot of logs but I don't understand how the logs apply to this? Or do you think it is something else?

@ngotchac
Copy link
Contributor Author

@niklasad1 I think it's just a race-condition, so enabling logs might just slow down some parts of the start-up and raise this issue.

Copy link
Collaborator

@niklasad1 niklasad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

Can you remove the first commit which is confusing IMHO because it was removed in the in the second commit without any explanation?

@niklasad1 niklasad1 added A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. and removed A0-pleasereview 🤓 Pull request needs code review. labels Jan 11, 2019
Copy link
Collaborator

@sorpaas sorpaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We only call it at startup so starting a thread shouldn't be bad.

@sorpaas sorpaas merged commit 3687df8 into master Jan 11, 2019
@sorpaas sorpaas deleted the ng-fix-tokio-recursion branch January 11, 2019 16:59
5chdn pushed a commit that referenced this pull request Jan 12, 2019
* Change igd to github:maufl/rust-igd

* Run `igd::search_gateway_from_timeout` from own thread
5chdn pushed a commit that referenced this pull request Jan 12, 2019
* Change igd to github:maufl/rust-igd

* Run `igd::search_gateway_from_timeout` from own thread
5chdn added a commit that referenced this pull request Jan 15, 2019
* version: bump stable to 2.2.7

* version: mark 2.2 track stable

* version: mark update critical on all networks

* version: commit cargo lock

* Ping nodes from discovery (#10167)

* snap: fix path in script (#10157)

* snap: fix path in script

* debug, revert me

* fix

* necromancer awk magic

* awk necromancy and path fixing

* working track selection

* Fix _cannot recursively call into `Core`_ issue (#10144)

* Change igd to github:maufl/rust-igd

* Run `igd::search_gateway_from_timeout` from own thread

* Handle the case for contract creation on an empty but exist account with storage items (#10065)

* Add is_base_storage_root_unchanged

* Fix compile, use a shortcut for check, and remove ignored tests

* Add a warn!

* Update ethereum/tests to v6.0.0-beta.2

* grumble: use {:#x} instead of 0x{:x}

Co-Authored-By: sorpaas <accounts@that.world>

* version: bump fork blocks for kovan and foundation (#10186)

* pull constantinople on ethereum network (#10189)

* ethcore: pull constantinople on ethereum network

* version: mark update as critical

* ethcore: remove constantinople alltogether from chain spec

* version: revert fork block for ethereum
5chdn added a commit that referenced this pull request Jan 15, 2019
* version: mark 2.3 track beta

* version: mark update critical on all networks

* Ping nodes from discovery (#10167)

* Fix _cannot recursively call into `Core`_ issue (#10144)

* Change igd to github:maufl/rust-igd

* Run `igd::search_gateway_from_timeout` from own thread

* Handle the case for contract creation on an empty but exist account with storage items (#10065)

* Add is_base_storage_root_unchanged

* Fix compile, use a shortcut for check, and remove ignored tests

* Add a warn!

* Update ethereum/tests to v6.0.0-beta.2

* grumble: use {:#x} instead of 0x{:x}

Co-Authored-By: sorpaas <accounts@that.world>

* version: bump fork blocks for kovan and foundation (#10186)

* pull constantinople on ethereum network (#10189)

* ethcore: pull constantinople on ethereum network

* version: mark update as critical

* ethcore: remove constantinople alltogether from chain spec

* version: revert fork block for ethereum
@5chdn 5chdn mentioned this pull request Jan 21, 2019
6 tasks
5chdn pushed a commit that referenced this pull request Jan 22, 2019
* Change igd to github:maufl/rust-igd

* Run `igd::search_gateway_from_timeout` from own thread
5chdn added a commit that referenced this pull request Jan 22, 2019
* version: bump beta to 2.3.1

* Fix _cannot recursively call into `Core`_ issue (#10144)

* Change igd to github:maufl/rust-igd

* Run `igd::search_gateway_from_timeout` from own thread

* Update for Android cross-compilation. (#10180)

* build-unix update

* .gitlab-ci update

* Update build-unix.sh

add android postprocessing

* path to android lib

libparity.so

* fix path to libparity

* add android lib to artifacts

* Run all `igd` methods in its own thread (#10195)

* Cancel Constantinople HF on POA Core (#10198)

* Add EIP-1283 disable transition (#10214)

* Enable St-Peters-Fork ("Constantinople Fix") (#10223)

* ethcore: disable eip-1283 on kovan block 10255201

* ethcore: disable eip-1283 on ropsten block 4939394

* ethcore: enable st-peters-fork on mainnet block 7280000

* ethcore: fix kovan chain spec

* version: update fork blocks

* ethcore: disable eip-1283 on sokol block 7026400
ordian added a commit that referenced this pull request Mar 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A6-mustntgrumble 💦 Pull request has areas for improvement. The author need not address them before merging. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants