Update to wasmi v0.32 and add lazy validation #1488

Merged on Dec 23, 2023 (10 commits)

@tomaka tomaka commented Dec 18, 2023

cc #864

This PR shouldn't be merged as-is. The compilation mode should not always be Lazy. Instead, we should add a new ExecHint.
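
A minimal sketch of what such a hint could look like. Only the name `ExecHint` comes from the comment above; the variant names, the `compilation_mode_for` helper, and the local `CompilationMode` enum are hypothetical stand-ins so that the example compiles on its own, and they are not taken from smoldot or wasmi.

```rust
/// Stand-in for the engine's compilation setting (hypothetical, defined
/// locally so the sketch compiles without any dependency).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompilationMode {
    /// Validate and translate every function when the module is created.
    Eager,
    /// Defer validation and translation of a function until its first call.
    Lazy,
}

/// Caller-provided hint describing how the module is going to be used.
/// The variants are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecHint {
    /// The module will run many times: pay the full compilation cost up front.
    CompileAheadOfTime,
    /// The module is trusted and only a few functions will run.
    LazyValidation,
}

/// Chooses the compilation mode from the hint instead of hard-coding `Lazy`.
pub fn compilation_mode_for(hint: ExecHint) -> CompilationMode {
    match hint {
        ExecHint::CompileAheadOfTime => CompilationMode::Eager,
        ExecHint::LazyValidation => CompilationMode::Lazy,
    }
}
```

With this shape, existing callers keep eager compilation unless they explicitly opt in to the lazy variant.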

However I've done this change in order to measure how much faster it is.
To do so, I've measured three times how long it takes to compile the Westend runtime.

"Compile the Westend runtime" includes decoding the zstd-compressed runtime, parsing some custom Wasm sections, creating a Module (with wasmi), and creating an instance (with wasmi) but not executing it. So the times below are not just wasmi, but should still mainly be wasmi.

With wasmi v0.31:

264.161ms
195.032ms
210.432ms

With wasmi master branch (commit 86d097623592dc499852ea4bb01cace0097d70a4):

702.528ms
661.07ms
684.553ms

With wasmi master branch (commit 86d097623592dc499852ea4bb01cace0097d70a4) with compilation mode set to Lazy (this PR)

83.617ms
84.843ms
85.62ms

cc @Robbepop

Robbepop commented Dec 18, 2023

It is a bit weird that Wasmi master branch is roughly 3 times slower than Wasmi v0.31.0 because in benchmark tests conducted so far the slowdown compared to v0.31.0 was roughly 30%, not 200%+.

Good though that lazy compilation actually improves the situation.

Robbepop commented Dec 18, 2023

@tomaka FYI: I just uploaded v0.32.0-beta.1.

Any explanation on your side for the severe slowdowns? Were proper optimizations and optimized Wasm used?
As a point of reference: for Spidermonkey.wasm (4.2MB after wasm-opt) we measured that lazy compilation takes Wasmi roughly 3ms and eager compilation roughly 45ms. Unless the Westend runtime is orders of magnitude larger than Spidermonkey.wasm I cannot really see how those 80ms for lazy compilation can be real.

tomaka commented Dec 18, 2023

What I measured is wasmi that has itself been compiled to .wasm and is being executed by NodeJS.
Plus there is some overhead (the same no matter the version of wasmi) because the runtime is compressed and needs to be decompressed.
It's normal that it's slow and you shouldn't read the numbers as absolute values.

Robbepop commented Dec 18, 2023

> What I measured is wasmi that has itself been compiled to .wasm and is being executed by NodeJS. Plus there is some overhead (the same no matter the version of wasmi) because the runtime is compressed and needs to be decompressed. It's normal that it's slow and you shouldn't read the numbers as absolute values.

I see that Wasmi v0.31.0 takes roughly 200ms for the entire process, and since you said that this mostly measures Wasmi I expected Wasmi v0.32.0-beta.N to take roughly 30% more, i.e. roughly 260ms. But in your tests it is WAY more, so I wonder why. Given 260ms expected eager compilation time, I'd expect lazy compilation to take roughly 15-25ms and not 80ms. Therefore I am confused.

tomaka commented Dec 18, 2023

> Proper optimizations and optimized Wasm used?

Ok I wasn't actually optimizing the build (I'm a bit confused because the compilation options go through a couple of layers).
Running the same test later in an hour or two.

Robbepop commented Dec 18, 2023

> > Proper optimizations and optimized Wasm used?
>
> Ok I wasn't actually optimizing the build (I'm a bit confused because the compilation options go through a couple of layers). Running the same test later in an hour or two.

No problem! Looking forward to the new numbers. :)

tomaka commented Dec 18, 2023

Okay here we go. I didn't actually wait an hour:

wasmi 0.31:

293.095ms
245.785ms
266.289ms

wasmi 0.32:

334.245ms
334.879ms
341.965ms

wasmi 0.32 with lazy compilation:

63.022ms
65.548ms
74.998ms

I've also uploaded the runtime (decompressed) in hexadecimal (because of GitHub uploads limitations), if you want to try it yourself:

runtime.wasm.txt

Robbepop commented Dec 18, 2023

Thanks for the quick update!

The new numbers (especially eager) look way more realistic now.
The 5x speedup with lazy compilation is a bit low but I think this might be due to the other things that are still happening which you mentioned in the first post. Especially parsing custom sections is probably pretty slow. With lazy compilation & validation we completely avoid parsing Wasm.
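
As a quick check of the "5x" figure, the sketch below averages the three optimized-build timings reported above and derives the ratios. The constant and function names are made up for this example; only the millisecond values come from the thread.

```rust
/// Timings in milliseconds, copied from the optimized-build measurements above.
const WASMI_0_31_MS: [f64; 3] = [293.095, 245.785, 266.289];
const WASMI_0_32_EAGER_MS: [f64; 3] = [334.245, 334.879, 341.965];
const WASMI_0_32_LAZY_MS: [f64; 3] = [63.022, 65.548, 74.998];

/// Arithmetic mean of a non-empty slice of samples.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// How many times slower eager 0.32 is than 0.31 (mean over mean).
fn eager_slowdown_vs_0_31() -> f64 {
    mean(&WASMI_0_32_EAGER_MS) / mean(&WASMI_0_31_MS)
}

/// How many times faster lazy 0.32 is than eager 0.32 (mean over mean).
fn lazy_speedup_vs_eager() -> f64 {
    mean(&WASMI_0_32_EAGER_MS) / mean(&WASMI_0_32_LAZY_MS)
}

/// How many times faster lazy 0.32 is than 0.31 (mean over mean).
fn lazy_speedup_vs_0_31() -> f64 {
    mean(&WASMI_0_31_MS) / mean(&WASMI_0_32_LAZY_MS)
}
```

The means come out at roughly 268ms (0.31), 337ms (0.32 eager) and 68ms (0.32 lazy): eager 0.32 is about 1.26x slower than 0.31, which is close to the roughly 30% seen in other benchmarks, and lazy is about 4.97x faster than eager 0.32.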

tomaka commented Dec 18, 2023

> Especially parsing custom sections is probably pretty slow.

This is off-topic, but I don't think so. Each section in a wasm file starts with its length, so we can easily iterate between the (not so many) sections until we find the ones we want. And then the content of the sections (what I'm parsing) are somewhere around 100 to 200 bytes.
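
A sketch of that kind of section walk, assuming the standard Wasm binary layout: an 8-byte header (magic plus version), then for each section a one-byte ID followed by a LEB128-encoded byte length and the payload. Custom sections have ID 0 and begin with a length-prefixed name. The function names are invented for this example and this is not smoldot's actual code.

```rust
/// Reads an unsigned LEB128 `u32` at `*pos` and advances `*pos` past it.
/// Returns `None` on truncated input or if the value does not fit in 32 bits.
fn read_leb128_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        // The fifth byte may only carry the top 4 bits and must end the number.
        if shift == 28 && byte > 0x0f {
            return None;
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Returns the content of the first custom section named `wanted`, if any.
/// Every other section is skipped using its declared length, without
/// looking at its content.
fn find_custom_section<'a>(wasm: &'a [u8], wanted: &str) -> Option<&'a [u8]> {
    // Header: 4-byte magic `\0asm` followed by a 4-byte version.
    if wasm.len() < 8 || !wasm.starts_with(b"\0asm") {
        return None;
    }
    let mut pos = 8;
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = read_leb128_u32(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let payload = wasm.get(pos..end)?;
        if id == 0 {
            // Custom section: length-prefixed name, then arbitrary bytes.
            let mut p = 0;
            let name_len = read_leb128_u32(payload, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            let name = payload.get(p..name_end)?;
            if name == wanted.as_bytes() {
                return Some(&payload[name_end..]);
            }
        }
        // Jump straight to the next section header.
        pos = end;
    }
    None
}
```

The loop does a constant amount of work per section regardless of how large the code section is, which is why locating a 100 to 200 byte custom section is cheap.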

@Robbepop

> > Especially parsing custom sections is probably pretty slow.
>
> This is off-topic, but I don't think so. Each section in a wasm file starts with its length, so we can easily iterate between the (not so many) sections until we find the ones we want. And then the content of the sections (what I'm parsing) are somewhere around 100 to 200 bytes.

Ah okay, 100-200 bytes indeed shouldn't make a huge difference.

@tomaka tomaka changed the title from "Try updating to wasmi v0.32" to "Update to wasmi v0.32 and add lazy validation" Dec 19, 2023
@tomaka tomaka marked this pull request as ready for review December 19, 2023 08:30

tomaka commented Dec 19, 2023

@Robbepop Interestingly, one specific test seems to fail with InvalidWasm("branching offset is out of bounds for wasmi bytecode"). That Wasm was successfully accepted by wasmi 0.31.

I have many other tests in the repo that compile and run Wasm blobs, and these pass successfully. That specific failing test uses an old Polkadot runtime, so it's not impossible that the runtime was miscompiled or something.

You can find the failing Wasm here: https://github.com/smol-dot/smoldot/blob/main/lib/src/executor/vm/test-polkadot-runtime-v9160.wasm

Robbepop commented Dec 19, 2023

> @Robbepop Interestingly, one specific test seems to fail with InvalidWasm("branching offset is out of bounds for wasmi bytecode"). That Wasm was successfully accepted by wasmi 0.31.
>
> I have many other tests in the repo that compile and run Wasm blobs, and these pass successfully. That specific failing test uses an old Polkadot runtime, so it's not impossible that the runtime was miscompiled or something.
>
> You can find the failing Wasm here: https://github.com/smol-dot/smoldot/blob/main/lib/src/executor/vm/test-polkadot-runtime-v9160.wasm

Ah that's very interesting. I know what is happening there. Due to some optimizations the new Wasmi uses 16-bit encoded branching offsets whereas the old Wasmi used 32-bit branching offsets. I would never have assumed that this was going to be problematic, because it means that there are jumps over more than 32k Wasmi instructions. Note that a single Wasmi instruction can stand for up to 4 Wasm instructions. This is probably the result of very aggressive inlining. It is a current limitation of the new Wasmi bytecode; however, if necessary, I know how to deal with it.
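
To illustrate the limitation, here is a small sketch of encoding a branch as a signed 16-bit offset between instruction indices. The type and function names are hypothetical and this is not Wasmi's actual implementation; it only shows why a jump across more than 32767 instructions cannot be represented.

```rust
/// Error returned when a jump distance does not fit in 16 bits.
/// (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub struct BranchOffsetOutOfBounds;

/// Encodes the jump from instruction index `src` to `dst` as a signed
/// 16-bit relative offset, which covers -32768..=32767 instructions.
pub fn encode_branch_offset(src: u32, dst: u32) -> Result<i16, BranchOffsetOutOfBounds> {
    // Widen to i64 first so the subtraction itself cannot overflow.
    let delta = i64::from(dst) - i64::from(src);
    i16::try_from(delta).map_err(|_| BranchOffsetOutOfBounds)
}
```

A forward jump of 32767 instructions still encodes, while one of 32768 is rejected, which is the kind of failure reported for the old Polkadot runtime.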

tomaka commented Dec 19, 2023

> if necessary

Since a valid wasm bytecode fails to compile, it's kind of necessary, no?

@Robbepop

Every Wasm runtime has certain limitations, and thus every Wasm runtime may encounter valid Wasm files that fail to compile. So it is more a question of where to draw the line.

tomaka commented Dec 19, 2023

That's very surprising to me, especially after the discussion about lazy validation. I understand that there are issues such as the maximum size of the stack, but I was convinced that validating WebAssembly would deterministically either pass or fail. It's like the WebAssembly committee/people care about problematic situations that are easy to solve but not the ones that are hard to solve.

The Wasm runtime used in the failing test is one that was used on the Polkadot chain. It's not, say, a weird experiment. It's legitimate Rust code that was compiled with the Rust compiler. While the newer Polkadot runtime seems to work, the chances that a future runtime no longer compiles with wasmi seem relatively high to me.

Robbepop commented Dec 19, 2023

> That's very surprising to me, especially after the discussion about lazy validation. I understand that there are issues such as the maximum size of the stack, but I was convinced that validating WebAssembly would deterministically either pass or fail. It's like the WebAssembly committee/people care about problematic situations that are easy to solve but not the ones that are hard to solve.
>
> The Wasm runtime used in the failing test is one that was used on the Polkadot chain. It's not, say, a weird experiment. It's legitimate Rust code that was compiled with the Rust compiler. While the newer Polkadot runtime seems to work, the chances that a future runtime no longer compiles with wasmi seem relatively high to me.

It is all about runtime limitations. For practical reasons those limitations should be chosen to allow for all practical use cases. Wasmtime, for example, has a limitation on function parameters, so even though a Wasm function with 2k parameters is valid it would fail to compile on Wasmtime. However, practical Wasm blobs don't make use of so many parameters anyway.

I was under the impression that 16-bit branch offsets should be enough for all practical use cases, but it seems I was wrong, so I agree that this needs to be fixed.

tomaka commented Dec 21, 2023

All tests are passing, apart from the ones that fail because git dependencies are forbidden.

@Robbepop

> All tests are passing, apart from the ones that fail because git dependencies are forbidden.

That's amazing! Thanks a lot for the testing and updates.

@Robbepop

@tomaka Just released v0.32.0-beta.2.

@tomaka tomaka added this pull request to the merge queue Dec 23, 2023
Merged via the queue into smol-dot:main with commit 3ff5453 Dec 23, 2023
23 checks passed

Robbepop commented Dec 23, 2023

@tomaka I see that you just merged the PR to use the new beta Wasmi. Is that a good idea? It isn't officially production-ready yet. If this is not a big deal for Smoldot, then it is obviously great for the new Wasmi version to get battle-tested.

tomaka commented Dec 23, 2023

All the runtimes that I've tried work, and given that changes in the runtime are usually pretty small, the chances that it breaks in the future are not very high.
It's not like I'm feeding untrusted wasm bytecode coming from the network into wasmi for example, in which case it would be more "dangerous".
The worst case scenario if something is broken is that UIs don't work.

@tomaka tomaka deleted the wasmi-0.32 branch December 23, 2023 11:31
@Robbepop

@tomaka Okay great, thanks for the elaboration. I am happy to see that Smoldot is a perfect testing ground for the new Wasmi version. :)

tomaka added a commit to tomaka/smoldot that referenced this pull request Dec 28, 2023
tomaka added a commit to tomaka/smoldot that referenced this pull request Dec 28, 2023
github-merge-queue bot pushed a commit that referenced this pull request Dec 28, 2023
tomaka added a commit to tomaka/smoldot that referenced this pull request Jan 12, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jan 16, 2024
…"" (#1577)

* Revert "Revert "Update to wasmi v0.32 and add lazy validation (#1488)" (#1527)"

This reverts commit 961c920.

* Update wasmi

* Fix compilation

* Update CHANGELOG

* Update wasmi

* Cargo.lock changes didn't commit

* Another update