wasmparser: Validate nested modules #30

alexcrichton · 2020-06-24T19:07:58Z

This commit implements recursive validation of nested modules in
wasmparser. Previously nested modules were simply skipped but now
they're recursed into and actually checked. This commit also comes with
a more complete implementation of handling alias directives.

Internally this required quite a bit of refactoring. The validator now
retains a stack of modules which are being validated and remembers
parent-relationships between them. Indices into this stack are then used
for recording type definitions. The type of a
function/instance/module/etc can be defined in the current module or any
previous module on the stack. New helper functions were added to help
resolve this new Def type to ensure it's handled correctly.
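The module stack and the Def type described above can be pictured roughly as follows. This is only a minimal sketch of the idea, not the code in this PR: the field names (`depth`, `item`), the `ModuleState` and `Validator` structs, and the `resolve_type` helper are all hypothetical, and types are reduced to a (params, results) arity pair for brevity.

```rust
/// Hypothetical stand-in for the PR's `Def` type: an item index tagged with
/// the position (on the validation stack) of the module that defines it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Def<T> {
    depth: usize,
    item: T,
}

/// Hypothetical per-module state.
#[derive(Default)]
struct ModuleState {
    /// Types declared directly in this module, as (params, results) arities.
    types: Vec<(usize, usize)>,
    /// Function index -> type definition, possibly owned by a parent module.
    funcs: Vec<Def<u32>>,
}

/// Hypothetical validator: outermost module at index 0, the module
/// currently being validated at the end.
#[derive(Default)]
struct Validator {
    stack: Vec<ModuleState>,
}

impl Validator {
    /// Start validating a (possibly nested) module; returns its stack index.
    fn push_module(&mut self) -> usize {
        self.stack.push(ModuleState::default());
        self.stack.len() - 1
    }

    /// Look up a type through a `Def`; `None` if either index is out of
    /// bounds (this is the kind of bounds check that costs some time).
    fn resolve_type(&self, def: Def<u32>) -> Option<(usize, usize)> {
        self.stack
            .get(def.depth)?
            .types
            .get(def.item as usize)
            .copied()
    }
}

/// Build a two-level example: an outer module with one type, and a nested
/// module whose only function refers to that outer type.
fn build_example() -> (Validator, Def<u32>) {
    let mut v = Validator::default();
    let outer = v.push_module();
    v.stack[outer].types.push((2, 1));
    let inner = v.push_module();
    let aliased = Def { depth: outer, item: 0 };
    v.stack[inner].funcs.push(aliased);
    (v, aliased)
}

fn main() {
    let (v, aliased) = build_example();
    // The nested module's function resolves to the parent's type.
    assert_eq!(v.resolve_type(aliased), Some((2, 1)));
    // The nested module declares no types of its own.
    assert_eq!(v.resolve_type(Def { depth: 1, item: 0 }), None);
    println!("func 0 of nested module has type {:?}", v.stack[1].funcs[0]);
}
```

Tagging each definition with the stack index of its owning module is what lets a nested module refer to a parent's type without copying it.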

alexcrichton · 2020-06-24T19:12:55Z

FWIW I've been wrestling a lot with the current design of the validator. I find it really difficult to work with the way it's set up, namely that there's one massive state enum and we just transition over it as we would a state machine. This is quite unwieldy, and I think it makes things difficult that would otherwise be pretty simple.

One major drawback of this implementation is that there's really no avenue for parallel module validation. That's a major design consideration with the section orderings, however, and is something that I think we should enable.

I've also found, however, that there are a number of other drawbacks with the current validator and/or parser, such as not being able to accept incremental input (e.g. downloaded over the network). Additionally there's a lot of state being learned as part of nested modules which is going to have to be figured out again by consumers after validation. Ideally this would also be structured so consumers could persist information such as "what's the type of this function?" in a way that they don't have to figure out all the parent module relationships again.

What I'm thinking, however, is basically a rewrite of the validator. While this isn't necessarily a huge undertaking, it's large enough that I don't want to entangle it here or anything like that. I'm hoping that as this comes up in wasmtime we can talk about if and how validation/parsing should be rewritten.

yurydelendik

Looks good. I would like to run it via benchmark test though.

alexcrichton · 2020-06-29T14:36:45Z

Ok this ran on CI, but I'm not sure how to read the output myself..

alexcrichton · 2020-06-29T14:36:57Z

Does that look as expected to you @yurydelendik ?

yurydelendik · 2020-06-29T16:00:11Z

> Does that look as expected to you

The results are good, but it looks like it did not check out "main" (see error: pathspec 'main' did not match any file(s) known to git). So it is a false positive. I'll run locally.

yurydelendik · 2020-06-29T16:10:33Z

I see, so not much effect from this PR

group                           after                                  before
-----                           -----                                  ------
it works benchmark              1.00      0.4±0.03ns        ? B/sec    0.99      0.4±0.01ns        ? B/sec
validate benchmark              1.00      1.4±0.03ns        ? B/sec    1.00      1.4±0.07ns        ? B/sec
validator no fails benchmark    1.00      0.6±0.02ns        ? B/sec    0.76      0.4±0.02ns        ? B/sec

P.S. we need to fix ./compare-with-main.sh so it gets a valid "main"

alexcrichton · 2020-06-29T16:16:01Z

What do these numbers mean? Is that something executing in 0.4ns?

alexcrichton · 2020-06-29T16:36:41Z

Yeah 1ns runtimes typically mean tests aren't running!


alexcrichton · 2020-06-29T18:00:32Z

Ok now it is saying

group                           after                                  before
-----                           -----                                  ------
it works benchmark              1.03   215.4±13.99µs        ? B/sec    1.00    208.8±6.97µs        ? B/sec
validate benchmark              1.19   570.5±10.68µs        ? B/sec    1.00   478.6±32.82µs        ? B/sec
validator no fails benchmark    1.28   610.2±21.13µs        ? B/sec    1.00   475.4±30.67µs        ? B/sec

I fixed another issue after this measurement was taken, though, which ensures that it actually checks out all submodules so we can include wabt/spec tests in the measurement.

I don't doubt that this iteration is slower (bounds checking) but the benchmarking strategy is also somewhat flawed because the main branch has a different set of tests. I don't expect the differences in tests here to account for a 30% slowdown, however.
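For reference, the ratio column in the table above is just the "after" mean divided by the "before" mean. A small sketch of that arithmetic, using the means copied from the table (the `slowdown` helper is only an illustrative name):

```rust
/// Ratio of the "after" mean to the "before" mean; values above 1.0 mean
/// the "after" build is slower.
fn slowdown(after_us: f64, before_us: f64) -> f64 {
    after_us / before_us
}

fn main() {
    // (benchmark name, after mean in µs, before mean in µs), from the table.
    let rows = [
        ("it works benchmark", 215.4, 208.8),
        ("validate benchmark", 570.5, 478.6),
        ("validator no fails benchmark", 610.2, 475.4),
    ];
    for &(name, after, before) in rows.iter() {
        println!("{}: {:.2}x", name, slowdown(after, before));
    }
}
```

This reproduces the 1.03, 1.19 and 1.28 figures, so the worst case is a roughly 28% slowdown, before accounting for the measurement noise shown in the ± columns.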

FWIW I'm not sure that we can do all that much better without unsafe compared to the previous iteration. I personally feel that if we really want to eke out the max performance we should refactor validation/parsing along the lines of what I was thinking above.

yurydelendik · 2020-06-29T18:05:47Z

> we should refactor validation/parsing along the lines of what I was thinking above.

I was planning to fade Parser and ValidatingParser away. Hopefully you are not thinking about refactoring this API. Shall we just deprecate them, and use readers + something instead?

alexcrichton · 2020-06-29T18:12:53Z

I don't have any changes in the works, but I've been thinking recently about how things might be refactored. I'm planning on waiting until bytecodealliance/rfcs#1 is settled though and going through that process for making a proposal (and going through a more formal proposal before doing code changes).

To clarify, though, @yurydelendik do you feel it's ok to merge this as-is or would you prefer I look into the perf difference further?

yurydelendik · 2020-06-29T18:29:49Z

> To clarify, though, @yurydelendik do you feel it's ok to merge this as-is or would you prefer I look into the perf difference further?

It is good to go. I'm mostly monitoring for a huge "it works benchmark" regression.

This commit exposes the `Module::resolve` method which allows accessing the results of name resolution. The thinking here is that formats like wasm interface types will want to access the function indices of the core module but by name as well, so we can expose the results of name resolution so the interface types section can access the same results.


alexcrichton force-pushed the validate branch from c3039e9 to 3b38359 Compare June 24, 2020 19:14

yurydelendik approved these changes Jun 26, 2020


alexcrichton force-pushed the validate branch from 3b38359 to 0003718 Compare June 29, 2020 14:16

alexcrichton mentioned this pull request Jun 29, 2020

Fix benchmark running on CI #35

Merged

alexcrichton force-pushed the validate branch from a29dc73 to f75a496 Compare June 29, 2020 16:48

Fix top-level validate function

f38240c

This was previously trying to validate all functions in the context of the top-level module, when instead it needed to validate each function within the right module.

alexcrichton merged commit 3d87df0 into bytecodealliance:main Jun 29, 2020

alexcrichton deleted the validate branch June 29, 2020 18:34

alexcrichton mentioned this pull request Aug 4, 2020

Implement the module linking proposal in Wasmtime bytecodealliance/wasmtime#2094

Closed

11 tasks

dhil pushed a commit to dhil/wasm-tools that referenced this pull request Oct 11, 2022

Merge pull request bytecodealliance#30 from dhil/remove-valtype-bot

cfcdbf8

dhil added a commit to dhil/wasm-tools that referenced this pull request Nov 10, 2023

Merge pull request bytecodealliance#30 from dhil/wasmfx-merge

114bfb9

Merge with release wasm-tools-1.0.51
