Clarify Data and Element segments #902

jfbastien · 2016-12-12T18:08:26Z

Addresses #897. Related to WebAssembly/spec#399.

kmiller68

lgtm

kmiller68 · 2016-12-12T19:10:14Z

JS.md

+
+Note: validation rules prevent a Module from having a Data section without having a Memory section or import, as well as prevent a Module from having an Element section without having a Table.
+
+* The `offset` [initializer expression](Modules.md#initializer-expression) of every [Data](Modules.md#data-section) and [Element](Modules.md#elements-section) segment is evaluated, any of the segments do not fit in their respective Memory or Table, throw a [`RangeError`](https://tc39.github.io/ecma262/#sec-native-error-types-used-in-this-standard-rangeerror).


"segment is evaluated, if any of the segments"

binji · 2016-12-12T19:19:34Z

I agree that the common case is that linking succeeds, but I can imagine a scenario where your application has a plugin architecture which allows importing arbitrary modules. It would be nice to know that you can do this without worrying about link failure putting your application in an inconsistent state.

pipcet · 2016-12-12T19:40:20Z

@binji I agree, but isn't what's in the PR good enough for that? I'd imagine dlopen() would malloc() the right amount of memory, try linking, and free() it on failure, without worrying about the contents of the free()d block; similarly for table entries, though those should probably be cleared to null explicitly to make debugging easier.

Are you saying it would be better to snapshot memory before attempting any link and revert to the pre-link state upon failure?

jfbastien · 2016-12-12T20:06:51Z

Agreed with @pipcet : if you dynamically load random stuff and it fails halfway then you have to know that the memory you gave it to initialize is now invalid, regardless of whether it wrote to that memory or not. Otherwise you leak. You're already trusting that rando-load to stick to the memory you gave it, so "corrupting" is strong wording!

/meme the writes are coming from inside the module

binji · 2016-12-12T20:38:27Z

Sorry, saying "dynamic-linking" is confusing, I guess. I agree that as soon as user code is running, it doesn't make much sense for any kind of rollback of tables or memories. But if we are still instantiating a module, I think it's reasonable to make sure that the memory and tables are not modified if the new module cannot be instantiated.

jfbastien · 2016-12-12T21:06:58Z

Why? I can't see a useful thing that could be done with the memory / table in these cases since they were handed out to a sub-module which failed. They need to be leaked, or reclaimed. In both cases their content is irrelevant.

Maybe I'm thinking about this wrong though!

pipcet · 2016-12-12T21:36:17Z

Well, I'm not sure this is a useful case, but a simple application might save its state by dumping the (entire, in the worst case) memory contents into a wasm module, unexec-style. If we want to support that, and if we want to further support recovery from failed reloads, we probably would have to guarantee the all-or-nothing semantics @binji proposes.

jfbastien · 2016-12-12T21:46:44Z

That seems highly wasteful, and I don't think we should encourage it. We're talking about a minor performance thing (do two passes or one) and I don't see why we'd do the slower thing. In the grand scheme of things the cost is minor, but why pay it if there's no worthwhile usecase for the two-pass approach?

binji · 2016-12-12T22:51:48Z

Yeah, I'm not concerned about the memory being trashed if you instantiate the new module, which runs its start function, which traps in some way so the memory state is not saved. At that point, I'd guess you'd have to just say that the application failed to load.

I was just thinking that if we could try to preserve some shared state, and it wasn't too expensive (two passes over the segments probably wouldn't be), it may be useful. The use case of an optional plugin popped into my mind, and it seemed reasonable. But it sounds like you think that this isn't that useful because there are a number of other ways in which a plugin module could fail, and we can't prevent indeterminate state in those cases?

I'm not sure what you mean by "leaked or reclaimed". If you hand them out to a sub-module, but they haven't been modified yet, then why can't you just pretend as if that failed instantiation never happened?

jfbastien · 2016-12-13T00:35:00Z

> I was just thinking that if we could try to preserve some shared state, and it wasn't too expensive (two passes over the segments probably wouldn't be), it may be useful. The use case of an optional plugin popped into my mind, and it seemed reasonable. But it sounds like you think that this isn't that useful because there are a number of other ways in which a plugin module could fail, and we can't prevent indeterminate state in those cases?

Correct.

> I'm not sure what you mean by "leaked or reclaimed". If you hand them out to a sub-module, but they haven't been modified yet, then why can't you just pretend as if that failed instantiation never happened?

I meant: you hand out memory, whoever you give it to fails and maybe trashes that memory. You either shrug and leave that memory trashed forever (leaked), or you zero it and give it back to your allocator (reclaimed).

binji · 2016-12-13T01:45:25Z

OK, makes sense. Now that I think of it, if a user really wants to catch this error ahead of time, they can parse the module they're about to load to determine whether all the segments would fit.
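A minimal sketch of such an ahead-of-time check, assuming the segments have already been decoded from the module into `(offset, length)` pairs with their offsets resolved to constants. The decoding step itself is not shown, and `all_segments_fit` is a hypothetical helper name, not part of any WebAssembly API. The only hard fact used is the 64 KiB linear-memory page size.

```python
PAGE_SIZE = 64 * 1024  # WebAssembly linear-memory page size in bytes


def all_segments_fit(segments, memory_pages):
    """Return True if every (offset, length) pair lies inside the memory.

    `segments` is assumed to be pre-decoded; offsets that depend on an
    imported global would have to be resolved by the caller first.
    """
    limit = memory_pages * PAGE_SIZE
    return all(
        offset >= 0 and offset + length <= limit
        for offset, length in segments
    )


# One page of memory: a segment ending exactly at the limit still fits.
fits_exactly = all_segments_fit([(0, 16), (65520, 16)], memory_pages=1)
overflows = all_segments_fit([(65530, 16)], memory_pages=1)
```

The same shape of check would apply to Element segments against a Table, with the table length in place of the byte limit.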

jfbastien · 2016-12-13T01:48:26Z

> OK, makes sense. Now that I think of it, if a user really wants to catch this error ahead of time, they can parse the module they're about to load to determine whether all the segments would fit.

Right, if you can't trust the module you're loading then you have to do that. The module could clobber memory it's not supposed to if the offset isn't based on the global you provide it, or if the segment is too big.

I'm not saying this will never happen. Rather: doing the two-pass approach only half-solves the problem of semi-trusted modules. We're not doing anyone a service, and we're ever so slightly slowing down wasm instantiation.

Why bother?

jfbastien · 2016-12-13T01:48:58Z

(I'm waiting for the @horse_wasm quote: "wasm: why bother?").

rossberg · 2016-12-13T12:41:33Z

I'm torn. What's specified in this PR is slightly simpler (and both V8 and the spec interpreter actually implement it that way right now -- which WebAssembly/spec#399 intended to fix). Yet, what the existing text requires seems cleaner to me: it consistently checks all types and bounds of imports and internal defs before doing anything observable.

All-or-nothing is a fallacy. Declarative(-ish) definitions leaving data structures in an inconsistent state is bad IMO, even when you cannot protect against other possible failures. Asking apps to defend themselves by complex reflection is both asking a lot and encouraging bad practice. And it cannot address start function failures either, so is no more a complete solution.

jfbastien · 2016-12-13T17:03:08Z

I'm not asking apps to defend themselves by complex reflection. If they have to defend themselves then they've already lost. Don't load untrusted / untested code. In my experience the failure path isn't tested, so any "defense" is likely buggy.

titzer · 2016-12-14T10:06:01Z

It makes more sense to me to check everything up front, before performing any operation with observable effects, even though that is strictly more work in the engine. That way, applications don't need to handle inconsistent states at all.

lukewagner · 2016-12-17T00:29:11Z

I also think it's better to do all checking up front before performing any side effects.

rossberg · 2016-12-17T14:08:48Z

I also prefer the hygiene of up-front checking.

pipcet · 2016-12-17T17:32:57Z

Is it actually true that there's a non-negligible performance penalty for up-front checking? I don't think it is, since you have to validate the module anyway, and I don't think there's a reason other than performance not to do the checks.

lukewagner · 2016-12-20T22:54:55Z

@pipcet Yes, there are ordinarily very few segments (often just 1), so the difference between checking segments in a loop up-front vs. in the loop that copies segments to memory is negligible.
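To make the two strategies discussed in this thread concrete, here is a rough model, not taken from any engine. Memory is a `bytearray`, each segment is an `(offset, data)` pair, and the local `RangeError` class is only a stand-in for the JavaScript `RangeError` named in the PR text. All function names are hypothetical.

```python
class RangeError(Exception):
    """Stand-in for the JavaScript RangeError named in the PR text."""


def fits(memory, offset, data):
    return offset >= 0 and offset + len(data) <= len(memory)


def init_single_pass(memory, segments):
    # Check each segment right before copying it. Earlier segments may
    # already have been written when a later one fails.
    for offset, data in segments:
        if not fits(memory, offset, data):
            raise RangeError(f"segment at offset {offset} does not fit")
        memory[offset:offset + len(data)] = data


def init_checked_up_front(memory, segments):
    # First loop: bounds-check every segment without writing anything.
    for offset, data in segments:
        if not fits(memory, offset, data):
            raise RangeError(f"segment at offset {offset} does not fit")
    # Second loop: copy, now that no segment can fail.
    for offset, data in segments:
        memory[offset:offset + len(data)] = data


# The second segment overruns an 8-byte memory.
segments = [(0, b"ab"), (6, b"cdef")]
single_pass_memory = bytearray(8)
up_front_memory = bytearray(8)
for init, memory in (
    (init_single_pass, single_pass_memory),
    (init_checked_up_front, up_front_memory),
):
    try:
        init(memory, segments)
    except RangeError:
        pass  # instantiation failed; inspect what was left behind
```

In this model the single-pass memory keeps the bytes of the first segment after the failure, while the up-front variant leaves its memory untouched. The extra cost is one more loop over the segment list, which is the loop described above as negligible.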

jfbastien · 2017-06-12T20:28:03Z

Looks like this won't happen. Closing.

Clarify Data and Element segments

22901d9

Addresses #897. Related to WebAssembly/spec#399.

jfbastien mentioned this pull request Dec 12, 2016

Check all segment bounds beforehand WebAssembly/spec#399

Merged

kmiller68 reviewed Dec 12, 2016

Typo

95a3bee

sunfishcode added this to the MVP milestone Jan 31, 2017

jfbastien closed this Jun 12, 2017

conrad-watt mentioned this pull request Jun 5, 2018

Bounds/permission checks during instantiation WebAssembly/threads#94

Open
