Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasmpaser: Validate nested modules #30

Merged
merged 2 commits into from
Jun 29, 2020

Conversation

alexcrichton
Copy link
Member

This commit implements recursive validation of nested modules in
wasmparser. Previously nested modules were simply skipped but now
they're recursed into and actually checked. This commit also comes with
a more complete implementation of handling alias directives.

Internally this required quite a bit of refactoring. The validator now
retains a stack of modules which are being validated and remembers
parent-relationships between them. Indices into this stack are then used
for recording type definitions. The type of a
function/instance/module/etc can be defined the current module or any
previous module on the stack. New helper functiosn were added to help
resolve this new Def type to ensure it's handled correctly.

@alexcrichton
Copy link
Member Author

FWIW I've been wrestling a lot with the current design of the validator. I find it really difficult to work with the way it's set up, namely how there's one massive state enum and we just transition over that as we would a state machine. This is quite unwieldy and difficult to work with what would otherwise be pretty simple to work with I think.

One major drawback of this implementation is that there's really no avenue for parallel module validation. That's a major design consideration with the section orderings, however, and is something that I think we should enable.

I've also found, however, that there's a number of other drawbacks with the current validator and/or parser such as they can't accept incremental input (e.g. downloaded over the network). Additionally there's a lot of state being learned as part of nested modules which is going to have to be figured out again by consumers after validation. Ideally this would also be structured so consumers could persist information such as "what's the type of this function?" in a way that they don't have to figure out all the parent module relationships again.

What I'm thinking, however, is basically a rewrite of the validator. While this isn't necessarily a huge undertaking it's large enough that I don't want to entangle it here or anything like that. I'm hoping that as this come up in wasmtime we can talk about if and how validation/parsing should be rewritten.

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I would like to run it via benchmark test though.

@alexcrichton
Copy link
Member Author

Ok this ran on CI, but I'm not sure how to read the output myself..

@alexcrichton
Copy link
Member Author

Does that look as expected to you @yurydelendik ?

@yurydelendik
Copy link
Contributor

Does that look as expected to you

The results are good, but it looks like it did not checkout "main" (see error: pathspec 'main' did not match any file(s) known to git). So it is false positive. I'll run locally.

@yurydelendik
Copy link
Contributor

I see, so not much effect from this PR

group                           after                                  before
-----                           -----                                  ------
it works benchmark              1.00      0.4±0.03ns        ? B/sec    0.99      0.4±0.01ns        ? B/sec
validate benchmark              1.00      1.4±0.03ns        ? B/sec    1.00      1.4±0.07ns        ? B/sec
validator no fails benchmark    1.00      0.6±0.02ns        ? B/sec    0.76      0.4±0.02ns        ? B/sec

P.S. we need to fix ./compare-with-main.sh so it is getting valid "main"

@alexcrichton
Copy link
Member Author

What do these numbers mean? Is that something executing in 0.4ns?

@alexcrichton
Copy link
Member Author

Yeah 1ns runtimes typically means tests aren't running!

This commit implements recursive validation of nested modules in
wasmparser. Previously nested modules were simply skipped but now
they're recursed into and actually checked. This commit also comes with
a more complete implementation of handling `alias` directives.

Internally this required quite a bit of refactoring. The validator now
retains a stack of modules which are being validated and remembers
parent-relationships between them. Indices into this stack are then used
for recording type definitions. The type of a
function/instance/module/etc can be defined the current module or any
previous module on the stack. New helper functiosn were added to help
resolve this new `Def` type to ensure it's handled correctly.
This was previously trying to validate all functions in the context of
the top-level module, when instead it needed to validate each function
within the right module.
@alexcrichton
Copy link
Member Author

Ok now it is saying

group                           after                                  before
-----                           -----                                  ------
it works benchmark              1.03   215.4±13.99µs        ? B/sec    1.00    208.8±6.97µs        ? B/sec
validate benchmark              1.19   570.5±10.68µs        ? B/sec    1.00   478.6±32.82µs        ? B/sec
validator no fails benchmark    1.28   610.2±21.13µs        ? B/sec    1.00   475.4±30.67µs        ? B/sec

I fixed another issue after this measurement was taken, though, which ensures that it actually checks out all submodules so we can include wabt/spec tests in the measurement.

I don't doubt that this iteration is slower (bounds checking) but the benchmarking strategy is also somewhat flawed because the main branch has a different set of tests. I don't expect the differences in tests here to account for a 30% slowdown, however.

FWIW I'm not sure that we can do all that much better without unsafe compared to the previous iteration. I personally feel that if we really want to eek out the max performance we should refactor validation/parsing along the lines of what I was thinking above.

@yurydelendik
Copy link
Contributor

we should refactor validation/parsing along the lines of what I was thinking above.

I was planning to fade Parser and ValidatingParser away. Hopefully you are not thinking about refactoring this API. Shall we just deprecate them, and use readers + something instead?

@alexcrichton
Copy link
Member Author

I don't have any changes in the works, but I've been thinking recently about how things might be refactored. I'm planning on waiting until bytecodealliance/rfcs#1 is settled though and going through that process for making a proposal (and going through a more formal proposal before doing code changes).

To clarify, though, @yurydelendik do you feel it's ok to merge this as-is or would you prefer I look into the perf difference further?

@yurydelendik
Copy link
Contributor

To clarify, though, @yurydelendik do you feel it's ok to merge this as-is or would you prefer I look into the perf difference further?

It is good to go. I'm mostly monitoring huge "it works benchmark" regression.

@alexcrichton alexcrichton merged commit 3d87df0 into bytecodealliance:main Jun 29, 2020
@alexcrichton alexcrichton deleted the validate branch June 29, 2020 18:34
fitzgen pushed a commit to fitzgen/wasm-tools that referenced this pull request Oct 21, 2020
This commit exposes the `Module::resolve` method which allows accessing
the results of name resolution. The thinking here is that formats like
wasm interface types will want to access the function indices of the
core module but by name as well, so we can expose the results of name
resolution so the interface types section can access the same results.
dhil pushed a commit to dhil/wasm-tools that referenced this pull request Oct 11, 2022
dhil added a commit to dhil/wasm-tools that referenced this pull request Nov 10, 2023
Merge with release wasm-tools-1.0.51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants