Fix WASM executor without instance reuse; cleanups and refactoring #10313

koute · 2021-11-19T06:45:37Z

The current codepath for the WASM executor without instance reuse is broken. We're constantly reusing the same Store, which according to the wasmtime docs we're not supposed to do:

A Store is intended to be a short-lived object in a program. No form of GC is implemented at this time so once an instance is created within a Store it will not be deallocated until the Store itself is dropped. This makes Store unsuitable for creating an unbounded number of instances in it because Store will never release this memory.

This results in a memory leak, and after 10k calls into the runtime the executor will get permabricked and will panic on each call since it won't be able to spawn any new instances anymore. (Fortunately this is not a problem in practice since the codepath which reuses the instances is the default, and that one is unaffected.)

So this PR fixes the problem by bundling the Store along with each instance of the executor.
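The difference between the two patterns can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyStore`, `calls_with_shared_store` and `calls_with_fresh_stores` are hypothetical names and do not use the wasmtime API. The store simply keeps every instance alive until it is dropped, and refuses to create more once a limit is reached.

```rust
/// Hypothetical stand-in for a store (NOT a wasmtime type): it owns every
/// instance created in it and releases nothing until it is dropped.
struct ToyStore {
    instances: Vec<Vec<u8>>, // each entry models the memory of one instance
    max_instances: usize,    // models the limit after which instantiation fails
}

impl ToyStore {
    fn new(max_instances: usize) -> Self {
        ToyStore { instances: Vec::new(), max_instances }
    }

    /// Creates an instance inside this store; fails once the limit is reached.
    fn instantiate(&mut self) -> Result<usize, String> {
        if self.instances.len() >= self.max_instances {
            return Err("instance limit reached".to_string());
        }
        self.instances.push(vec![0u8; 16]);
        Ok(self.instances.len() - 1)
    }
}

/// Broken pattern: one long-lived store is reused for every call, so the
/// instances pile up and later calls start failing.
fn calls_with_shared_store(store: &mut ToyStore, calls: usize) -> usize {
    let mut succeeded = 0;
    for _ in 0..calls {
        if store.instantiate().is_ok() {
            succeeded += 1;
        }
    }
    succeeded
}

/// Fixed pattern: every call gets its own store, which is dropped (together
/// with its instance) at the end of the loop iteration.
fn calls_with_fresh_stores(max_instances: usize, calls: usize) -> usize {
    let mut succeeded = 0;
    for _ in 0..calls {
        let mut store = ToyStore::new(max_instances);
        if store.instantiate().is_ok() {
            succeeded += 1;
        }
    }
    succeeded
}
```

With a limit of 3, five calls through a shared store succeed only three times, while five calls that each use a fresh store all succeed.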

Also, since this code was originally written, wasmtime has grown a bunch of new APIs which we can use to clean up the code a little, so that's what I did. Multiple Rcs and RefCells were removed, which makes it easier to reason about the lifetimes now. We also had a blanket unsafe impl Send for WasmtimeInstance {}, which was removed too. Unfortunately we still need an unsafe impl Send for the sandbox store, but it's still an improvement since the impl now touches strictly less code. (It'd be nice to get rid of that one too, but unfortunately that would necessitate even bigger changes.)

crystalin · 2021-11-19T13:46:43Z

@koute should we expect some performance impact from changing this?

koute · 2021-11-19T13:51:46Z

> @koute should we expect some performance impact from changing this?

Nope. It might change the performance of the executor without instance reuse, however:

* as I've explained, that codepath is currently broken,
* and it's currently unused anyway as far as I know.

pepyakin

I welcome the clean up

pepyakin · 2021-11-19T15:54:04Z

client/executor/wasmtime/src/instance_wrapper.rs

+
+		// Scan all imports, find the matching host functions, and create stubs that adapt arguments
+		// and results.
+		let imports = crate::imports::resolve_imports(


Nit:

do you think it would be better to push imports to be a submodule of instance_wrapper?

koute · 2021-11-22

Hmmm.... to be honest I don't think it's worth it in this case.

We could either make it a proper submodule (as in, a dedicated directory with a dedicated file in it), but that seems silly for only a single 300-line file, or we could just inline it into instance_wrapper.rs, but it's kinda nice that imports is in its own dedicated file which only does one thing and exports only one thing (the Imports struct + a function to create it).

pepyakin · 2021-11-19T16:27:12Z

client/executor/wasmtime/src/host.rs

+		self.host_state_mut().sandbox_store.0 = Some(store);
+
+		let instance_idx_or_err_code =
+			match result.expect("instantiating the sandbox does not panic") {


The thing is, the sandbox module or runtimes can actually panic!

When the state-machine runs, it needs to have access to a storage backend. This storage backend can be trusted, i.e. the on-disk storage. We assume it never panics, and if it does, then that's an unrecoverable error.

However, when running a light client, a backend could be based on a witness: basically data, plus a merkle proof that the data actually comes from where it is claimed to come from. The caveat is that we do not eagerly check this proof, as far as I understand, and there are situations when the runtime may request a storage entry that is not actually present in the witness. In such a case, a panic is emitted. You can get more details and my complaints here under the heading "panics". I assumed we could remove this code, but now I am not sure: we will need at least some way of communicating such conditions through the sandbox layer (cc @athei), assuming we continue to support the sandbox API.

To tie it back to this specific case, I assume the following can happen:

1. On a light client we execute some call with a proof. The proof is corrupted and the backend will panic if X is accessed.
2. The runtime is spawned.
3. The runtime instantiates a new sandbox with some wasm code.
4. The start function calls back into the runtime and makes the runtime fetch X from storage, and thus it panics.
5. The panic unwinds to this point, where we panic again because of the unwrap, losing the context of the original panic.

So I am wondering whether it would be better to resume_unwind, or to not catch the panic at all?

cc @bkchr

koute · 2021-11-22

That's good to know! Okay, so since this is technically outside of the scope of this PR I'll just do a resume_unwind to keep the original behavior. (I don't know, this might not be necessary and maybe we could just ignore the panic altogether; however, I think it's just simpler to catch it and restore the sandbox store anyway, even if it might not be necessary now. One less thing you have to think about whether it'll break something or not.)
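A minimal sketch of that catch, restore, then resume_unwind pattern, using only the standard library. `HostState`, `with_sandbox_store` and the `Vec<u32>` standing in for the sandbox store are hypothetical names for illustration, not the actual code in host.rs.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for the host state; the `Vec<u32>` models the
/// sandbox store that is temporarily taken out while a sandbox call runs.
struct HostState {
    sandbox_store: Option<Vec<u32>>,
}

/// Takes the store out of the host state, runs `f` with it, puts the store
/// back, and only then re-raises any panic which `f` produced.
fn with_sandbox_store<R>(
    state: &mut HostState,
    f: impl FnOnce(&mut Vec<u32>) -> R,
) -> R {
    let mut store = state
        .sandbox_store
        .take()
        .expect("the sandbox store is only taken for the duration of this call");

    // Catch the panic so that the store can be restored before unwinding further.
    let result = panic::catch_unwind(AssertUnwindSafe(|| f(&mut store)));

    state.sandbox_store = Some(store);

    match result {
        Ok(value) => value,
        // Re-raise with the original payload, so the caller sees the original
        // panic instead of a generic message from `expect` or `unwrap`.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

The point of resume_unwind here is that the original payload (for example, the message about a missing storage entry) survives, while the store is still put back in place before the unwinding continues.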

koute · 2021-11-22T08:21:55Z

One more minor cleanup - I've moved the utility functions for reading/writing memory from instance_wrapper.rs to util.rs. (Those were previously instance methods of InstanceWrapper, but now they're freestanding functions, and they're not used in instance_wrapper.rs itself, so it's kinda awkward to leave them there.)
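For illustration, freestanding helpers of this kind might look roughly like the sketch below. It is an assumption-laden model: the linear memory is a plain byte slice, and the names `checked_range`, `read_memory_into` and `write_memory_from` and their signatures are hypothetical here, not copied from util.rs.

```rust
use std::ops::Range;

/// Returns the byte range `offset..offset + len` if it fits inside a memory
/// of `memory_len` bytes, guarding against integer overflow.
fn checked_range(offset: usize, len: usize, memory_len: usize) -> Option<Range<usize>> {
    let end = offset.checked_add(len)?;
    if end <= memory_len {
        Some(offset..end)
    } else {
        None
    }
}

/// Copies `dest.len()` bytes from `memory`, starting at `offset`, into `dest`.
fn read_memory_into(memory: &[u8], offset: usize, dest: &mut [u8]) -> Result<(), String> {
    let range = checked_range(offset, dest.len(), memory.len())
        .ok_or_else(|| "memory read is out of bounds".to_string())?;
    dest.copy_from_slice(&memory[range]);
    Ok(())
}

/// Copies all of `data` into `memory`, starting at `offset`.
fn write_memory_from(memory: &mut [u8], offset: usize, data: &[u8]) -> Result<(), String> {
    let range = checked_range(offset, data.len(), memory.len())
        .ok_or_else(|| "memory write is out of bounds".to_string())?;
    memory[range].copy_from_slice(data);
    Ok(())
}
```

Because the functions take the memory as an argument instead of reaching it through `self`, they do not need to live next to the instance wrapper type.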

koute · 2021-11-23T06:35:11Z

bot merge

…aritytech#10313)

* Fix WASM executor without instance reuse; cleanups and refactoring
* Align to review comments
* Move the functions for reading/writing memory to `util.rs`
* Only `#[ignore]` the test in debug builds
* More review comments and minor extra comments

Fix WASM executor without instance reuse; cleanups and refactoring

fa01782

koute requested review from gilescope and pepyakin November 19, 2021 06:45

pepyakin approved these changes Nov 19, 2021

koute added 2 commits November 22, 2021 17:18

Align to review comments

c1b974b

Move the functions for reading/writing memory to util.rs

99607a1

koute requested a review from bkchr November 22, 2021 08:22

bkchr approved these changes Nov 22, 2021

koute added 2 commits November 22, 2021 20:40

Only #[ignore] the test in debug builds

22dd192

More review comments and minor extra comments

e63c7a1

paritytech-processbot bot merged commit 4de8ee2 into paritytech:master Nov 23, 2021

github-actions bot mentioned this pull request Dec 2, 2021

Update substrate/polkadot/cumulus from v0.9.12 to v0.9.13 moonbeam-foundation/moonbeam#1050

Closed

koute mentioned this pull request Sep 29, 2022

Member Request polkadot-fellows/seeding#27

Merged
