-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify Data and Element segments #902
Conversation
Addresses #897. Related to WebAssembly/spec#399.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
||
Note: validation rules prevent a Module from having a Data section without having a Memory section or import, as well as prevent a Module from having an Element section without having a Table. | ||
|
||
* The `offset` [initializer expression](Modules.md#initializer-expression) of every [Data](Modules.md#data-section) and [Element](Modules.md#elements-section) segment is evaluated, any of the segments do not fit in their respective Memory or Table, throw a [`RangeError`](https://tc39.github.io/ecma262/#sec-native-error-types-used-in-this-standard-rangeerror). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"segment is evaluated, if any of the segments"
I agree that the common case is that linking succeeds, but I can imagine a scenario where your application has a plugin architecture which allows importing arbitrary modules. It would be nice to know that you can do this without worrying about link failure putting your application in an inconsistent state. |
@binji I agree, but isn't what's in the PR good enough for that? I'd imagine dlopen() would malloc() the right amount of memory, try linking, and free() it on failure, without worrying about the contents of the free()d block; similarly for table entries, though those should probably be cleared to null explicitly to make debugging easier. Are you saying it would be better to snapshot memory before attempting any link and reverting to the pre-link state upon failure? |
Agreed with @pipcet : if you dynamically load random stuff and it fails halfway then you have to know that the memory you gave it to initialize is now invalid, regardless of whether it wrote to that memory or not. Otherwise you leak. You're already trusting that rando-load to stick to the memory you gave it, so "corrupting" is strong wording! /meme the writes are coming from inside the module |
Sorry, saying "dynamic-linking" is confusing, I guess. I agree that as soon as user code is running, it doesn't make much sense for any kind of rollback of tables or memories. But if we are still instantiating a module, I think it's reasonable to make sure that the memory and tables are not modified if the new module cannot be instantiated. |
Why? I can't see a useful thing that could be done with the memory / table in these cases since they were handed out to a sub-module which failed. They need to be leaked, or reclaimed. In both cases their content is irrelevant. Maybe I'm thinking about this wrong though! |
Well, I'm not sure this is a useful case, but a simple application might save its state by dumping the (entire, in the worst case) memory contents into a wasm module, unexec-style. If we want to support that, and if we want to further support recovery from failed reloads, we probably would have to guarantee the all-or-nothing semantics @binji proposes. |
That seems highly wasteful, and I don't think we should encourage it. We're talking about a minor performance thing (do two passes or one) and I don't see why we'd do the slower thing. In the grand scheme of things the cost is minor, but why pay it if there's no worthwhile usecase for the two-pass approach? |
Yeah, I'm not concerned about the memory being trashed if you instantiate the new module, which runs its start function, which traps in some way so the memory state is not saved. At that point, I'd guess you'd have to just say that the application failed to load. I was just thinking that if we could try to preserve some shared state, and it wasn't too expensive (two passes over the segments probably wouldn't be), it may be useful. The use case of an optional plugin, popped into my mind, and it seemed reasonable. But it sounds like you think that this isn't that useful because there are a number of other ways in which a plugin module could fail, and we can't prevent indeterminate state in those cases? I'm not sure what you mean by "leaked or reclaimed". If you hand them out to a sub-module, but they haven't been modified yet, then why can't you just pretend as if that failed instantiation never happened? |
Correct.
I meant: you hand out memory, whoever you give it to fails and maybe trashes that memory. You either shrug and leave that memory trashed forever (leaked), or you zero it and give it back to your allocator (reclaimed). |
OK, makes sense. Now that I think of it, if a user really wants to catch this error ahead of time, they can parse the module they're about to load to determine whether all the segments would fit. |
Right, if you can't trust the module you're loading then you have to do that. The module could clobber memory it's not supposed to if the offset isn't based on the global you provide it, or if the segment is too big. I'm not saying this will never happen. Rather: doing the two-pass approach only half-solves the problem of semi-trusted modules. We're not doing anyone a service, and we're ever so slightly slowing down wasm instantiation. Why bother? |
(I'm waiting for the @horse_wasm quote: "wasm: why bother?"). |
I'm torn. What's specified in this PR is slightly simpler (and both V8 and the spec interpreter actually implement it that way right now -- which WebAssembly/spec#399 intended to fix). Yet, what the existing text requires seems cleaner to me: it consistently checks all types and bounds of imports and internal defs before doing anything observable. All-or-nothing is a fallacy. Declarative(-ish) definitions leaving data structures in an inconsistent state is bad IMO, even when you cannot protect against other possible failures. Asking apps to defend themselves by complex reflection is both asking a lot and encouraging bad practice. And it cannot address start function failures either, so is no more a complete solution. |
I'm not asking apps to defend themselves by complex reflection. If they have to defend themselves then they've already lost. Don't load untrusted / untested code. In my experience the failure path isn't tested, so any "defense" is likely buggy. |
It makes more sense to me to check everything up front, before performing any operation with observable effects, even though that is strictly more work in the engine. That way, applications don't need to handle inconsistent states at all. |
I also think it's better to do all checking up front before performing any side effects. |
I also prefer the hygiene of up-front checking. |
Is it actually true that there's a non-negligible performance penalty for up-front checking? I don't think it is, since you have to validate the module anyway, and I don't think there's a reason other than performance not to do the checks. |
@pipcet Yes, there's ordinarily very few segments (often just 1) so the difference between checking segments in a loop up-front vs. in the loop that copies segments to memory is negligible. |
Looks like this won't happen. Closing. |
Addresses #897. Related to WebAssembly/spec#399.