Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

wasmtime: fix the memory zeroing for imported memories #5036

Closed
pepyakin opened this issue Feb 24, 2020 · 3 comments
Closed

wasmtime: fix the memory zeroing for imported memories #5036

pepyakin opened this issue Feb 24, 2020 · 3 comments
Milestone

Comments

@pepyakin
Copy link
Contributor

At the moment, if the memory instance is imported it is not cleared.

See #4903 for the test that demonstrates this.

@pepyakin pepyakin added this to the 2.1 milestone Feb 24, 2020
@gnunicorn gnunicorn modified the milestones: 2.1, Polkadot Mar 4, 2020
@pepyakin
Copy link
Contributor Author

A few clarifications.

So far, when a call into a wasm runtime happens, it is given a new fresh instance initialized to the initial state. The initial state is defined as following: the memory is zeroed and then data segments are copied into memory from the wasm binary.

Also, note that there are essentially two variants of handling memory: runtime wasm instance exports memory and runtime wasm instance imports memory.

In wasmi we do quick erasing by unmapping and remapping the virtual pages and then copying in the data segments.

In wasmtime it happens as follows. The wasm instance is recreated each time for before a call. However, if the runtime wasm instance imports memory, the memory instance is cached in the Imports structure and preserved between calls into runtime. The data segments are copied in when the instance is instance is initialized though, so only the heap part retains all the data.

A naive solution would be just to go over all this memory and zero it. However, this would be really slow. Another potential solution that might work out here, is to try to recreate the memory instance (ensuring that all host functions have the updated memory reference). This might work ok if it will end up munmap and mmap akin to what wasmi is doing.
The best solution would be to control the linear memory ourselves, but ATM wasmtime doesn't support appropriate APIs.

Now regarding the impact: this is not a show-stopper. It might be OK if the memory is not completely reset. While this is a potential source of non-determinism, it cannot be observed in well-behaved programs. This is because reading uninitialized memory is treated as UB in Rust/C/C++. OTOH, even if we get the tools to handle the memory ourselves it will be still be less performant than doing nothing.

So essentially there is a tradeoff: should we trade some determinism in some cases for the sake of more performance?

@pepyakin
Copy link
Contributor Author

pepyakin commented Apr 8, 2020

There is one problem with this: .bss or data initialized to zero.

When rustc compiles the following code to wasm32-unknown-unknown:

static mut A: *const () = ptr::null();
static mut B: usize = 5;

, wasm-ld would produce (at least at the moment of writing with current nightlies) the following data segments

(data (i32.const 4) "\00\00\00\00") ;; A
(data (i32.const 8) "\05\00\00\00") ;; B

Data segments are byte buffers that are copied to the linear memory when the module is instantantied at the specified offset, e.g. i32.const 4.

When a linear memory is created, it is guaranteed to be zeroed. However, a wasm instance that imports memory can not assume that in the general case , because there is no guarantee that a memory instance that was provided at instantiation time is zeroed. In fact, this is exactly what can happen if we reuse a memory instance and instantiate the same module with a memory tainted by a previous instantiation.

However, encoding data zero segments explicitly regresses the common scenario for WebAssembly: when an instance usually instantiated into prestine zeroed memory. A discussion around this topic happened in this issue some time ago (late 2017/early 2018) and the consensus was to not supply these zeroes, and the current behavior - providing explicit zeroes - deemed to be a bug. FWIW, the behavior of rustc/LLVM is still present to this day.

I think a good way to proceed is to implement clearing of the .bss area explicitly. To detect the .bss area we can take the end of the last data segment and zero up to __data_end or __heap_base.

@koute
Copy link
Contributor

koute commented Apr 15, 2022

Since for wasmtime we always clear the memory now (and we patch the WASM binary to never import its memory anyway) this can be closed.

@koute koute closed this as completed Apr 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants