Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify Data and Element segments #902

Closed
wants to merge 2 commits into from
Closed

Conversation

jfbastien
Copy link
Member

Addresses #897. Related to WebAssembly/spec#399.

Copy link
Contributor

@kmiller68 kmiller68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm


Note: validation rules prevent a Module from having a Data section without having a Memory section or import, as well as prevent a Module from having an Element section without having a Table.

* The `offset` [initializer expression](Modules.md#initializer-expression) of every [Data](Modules.md#data-section) and [Element](Modules.md#elements-section) segment is evaluated, any of the segments do not fit in their respective Memory or Table, throw a [`RangeError`](https://tc39.github.io/ecma262/#sec-native-error-types-used-in-this-standard-rangeerror).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"segment is evaluated, if any of the segments"

@binji
Copy link
Member

binji commented Dec 12, 2016

I agree that the common case is that linking succeeds, but I can imagine a scenario where your application has a plugin architecture which allows importing arbitrary modules. It would be nice to know that you can do this without worrying about link failure putting your application in an inconsistent state.

@pipcet
Copy link

pipcet commented Dec 12, 2016

@binji I agree, but isn't what's in the PR good enough for that? I'd imagine dlopen() would malloc() the right amount of memory, try linking, and free() it on failure, without worrying about the contents of the free()d block; similarly for table entries, though those should probably be cleared to null explicitly to make debugging easier.

Are you saying it would be better to snapshot memory before attempting any link and reverting to the pre-link state upon failure?

@jfbastien
Copy link
Member Author

Agreed with @pipcet : if you dynamically load random stuff and it fails halfway then you have to know that the memory you gave it to initialize is now invalid, regardless of whether it wrote to that memory or not. Otherwise you leak. You're already trusting that rando-load to stick to the memory you gave it, so "corrupting" is strong wording!

/meme the writes are coming from inside the module

@binji
Copy link
Member

binji commented Dec 12, 2016

Sorry, saying "dynamic-linking" is confusing, I guess. I agree that as soon as user code is running, it doesn't make much sense for any kind of rollback of tables or memories. But if we are still instantiating a module, I think it's reasonable to make sure that the memory and tables are not modified if the new module cannot be instantiated.

@jfbastien
Copy link
Member Author

Why? I can't see a useful thing that could be done with the memory / table in these cases since they were handed out to a sub-module which failed. They need to be leaked, or reclaimed. In both cases their content is irrelevant.

Maybe I'm thinking about this wrong though!

@pipcet
Copy link

pipcet commented Dec 12, 2016

Well, I'm not sure this is a useful case, but a simple application might save its state by dumping the (entire, in the worst case) memory contents into a wasm module, unexec-style. If we want to support that, and if we want to further support recovery from failed reloads, we probably would have to guarantee the all-or-nothing semantics @binji proposes.

@jfbastien
Copy link
Member Author

That seems highly wasteful, and I don't think we should encourage it. We're talking about a minor performance thing (do two passes or one) and I don't see why we'd do the slower thing. In the grand scheme of things the cost is minor, but why pay it if there's no worthwhile usecase for the two-pass approach?

@binji
Copy link
Member

binji commented Dec 12, 2016

Yeah, I'm not concerned about the memory being trashed if you instantiate the new module, which runs its start function, which traps in some way so the memory state is not saved. At that point, I'd guess you'd have to just say that the application failed to load.

I was just thinking that if we could try to preserve some shared state, and it wasn't too expensive (two passes over the segments probably wouldn't be), it may be useful. The use case of an optional plugin, popped into my mind, and it seemed reasonable. But it sounds like you think that this isn't that useful because there are a number of other ways in which a plugin module could fail, and we can't prevent indeterminate state in those cases?

I'm not sure what you mean by "leaked or reclaimed". If you hand them out to a sub-module, but they haven't been modified yet, then why can't you just pretend as if that failed instantiation never happened?

@jfbastien
Copy link
Member Author

I was just thinking that if we could try to preserve some shared state, and it wasn't too expensive (two passes over the segments probably wouldn't be), it may be useful. The use case of an optional plugin, popped into my mind, and it seemed reasonable. But it sounds like you think that this isn't that useful because there are a number of other ways in which a plugin module could fail, and we can't prevent indeterminate state in those cases?

Correct.

I'm not sure what you mean by "leaked or reclaimed". If you hand them out to a sub-module, but they haven't been modified yet, then why can't you just pretend as if that failed instantiation never happened?

I meant: you hand out memory, whoever you give it to fails and maybe trashes that memory. You either shrug and leave that memory trashed forever (leaked), or you zero it and give it back to your allocator (reclaimed).

@binji
Copy link
Member

binji commented Dec 13, 2016

OK, makes sense. Now that I think of it, if a user really wants to catch this error ahead of time, they can parse the module they're about to load to determine whether all the segments would fit.

@jfbastien
Copy link
Member Author

OK, makes sense. Now that I think of it, if a user really wants to catch this error ahead of time, they can parse the module they're about to load to determine whether all the segments would fit.

Right, if you can't trust the module you're loading then you have to do that. The module could clobber memory it's not supposed to if the offset isn't based on the global you provide it, or if the segment is too big.

I'm not saying this will never happen. Rather: doing the two-pass approach only half-solves the problem of semi-trusted modules. We're not doing anyone a service, and we're ever so slightly slowing down wasm instantiation.

Why bother?

@jfbastien
Copy link
Member Author

(I'm waiting for the @horse_wasm quote: "wasm: why bother?").

@rossberg
Copy link
Member

I'm torn. What's specified in this PR is slightly simpler (and both V8 and the spec interpreter actually implement it that way right now -- which WebAssembly/spec#399 intended to fix). Yet, what the existing text requires seems cleaner to me: it consistently checks all types and bounds of imports and internal defs before doing anything observable.

All-or-nothing is a fallacy. Declarative(-ish) definitions leaving data structures in an inconsistent state is bad IMO, even when you cannot protect against other possible failures. Asking apps to defend themselves by complex reflection is both asking a lot and encouraging bad practice. And it cannot address start function failures either, so is no more a complete solution.

@jfbastien
Copy link
Member Author

I'm not asking apps to defend themselves by complex reflection. If they have to defend themselves then they've already lost. Don't load untrusted / untested code. In my experience the failure path isn't tested, so any "defense" is likely buggy.

@titzer
Copy link

titzer commented Dec 14, 2016

It makes more sense to me to check everything up front, before performing any operation with observable effects, even though that is strictly more work in the engine. That way, applications don't need to handle inconsistent states at all.

@lukewagner
Copy link
Member

I also think it's better to do all checking up front before performing any side effects.

@rossberg
Copy link
Member

I also prefer the hygiene of up-front checking.

@pipcet
Copy link

pipcet commented Dec 17, 2016

Is it actually true that there's a non-negligible performance penalty for up-front checking? I don't think it is, since you have to validate the module anyway, and I don't think there's a reason other than performance not to do the checks.

@lukewagner
Copy link
Member

@pipcet Yes, there's ordinarily very few segments (often just 1) so the difference between checking segments in a loop up-front vs. in the loop that copies segments to memory is negligible.

@sunfishcode sunfishcode added this to the MVP milestone Jan 31, 2017
@jfbastien
Copy link
Member Author

Looks like this won't happen. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants