Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data segment overlaps #897

Closed
jfbastien opened this issue Dec 9, 2016 · 25 comments
Closed

Data segment overlaps #897

jfbastien opened this issue Dec 9, 2016 · 25 comments
Milestone

Comments

@jfbastien
Copy link
Member

Data segments aren't well defined.

  • If they overlap, does it fail?
  • If so, what's the exception?
  • When is it thrown? Some of these issues can be known at Module time (if it was an i32.const), and others at Instance time (if it was a global).
  • If they're out of bounds, does it fail?
@binji
Copy link
Member

binji commented Dec 9, 2016

This is specified in JS.md, but probably should be mentioned elsewhere too.

Segments may overlap and, if they do, the final value is the last value written in order.

If, after evaluating the offset initializer expression of every Data and Element Segment, any of the segments do not fit in their respective Memory or Table, throw a RangeError.

@jfbastien
Copy link
Member Author

What if the initializer is a constant, and it's known at Module time that it's out of bounds? The failure still occurs at Instance time?

@jfbastien
Copy link
Member Author

Oh also, what if there's a Data section but no memory (neither imported, nor in a section)?

@jfbastien
Copy link
Member Author

I'm also not sure we specify that data segments are initialized before start is called.

@jfbastien
Copy link
Member Author

Oh, and if linking fails, is an imported memory changed by data segments? Or do data segments get populated after link succeeds, but before start is called?

@jfbastien
Copy link
Member Author

Oh and if initializing segments fails halfway, I assume it's visible up to when it failed? i.e. if the first segment is valid but the second isn't, then the first is set, second isn't?

And what if a segment is only partly valid? i.e. it starts in range but ends out of range. Do we see a partial write, or none of that segment's write? I'd rather not do partial, but that's what wasm does when executing.

hubot pushed a commit to WebKit/WebKit-http that referenced this issue Dec 10, 2016
https://bugs.webkit.org/show_bug.cgi?id=165696

Reviewed by Keith Miller.

As specified in https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#data-section
Note that some of the interesting corner cases are ill-defined by the spec: WebAssembly/design#897

JSTests:

* wasm/Builder.js: create a data section from JavaScript
* wasm/Builder_WebAssemblyBinary.js: assemble the data section into the proper binary encoding
(const.emitters.Data):
* wasm/js-api/test_Data.js: Added.
(DataSection):
(DataSectionOffTheEnd):
(DataSectionPartlyOffTheEnd):
(DataSectionEmptyOffTheEnd):
(DataSectionSeenByStart):
* wasm/self-test/test_BuilderJSON.js: make sure the JSON structure is fine (this sanity checks before going to binary)

Source/JavaScriptCore:

* wasm/WasmFormat.h: segments are what represent sections of memory to initialize (similar to ELF's non-zero intializer data / rodata)
(JSC::Wasm::Segment::make):
(JSC::Wasm::Segment::destroy):
(JSC::Wasm::Segment::byte):
(JSC::Wasm::Segment::makePtr):
* wasm/WasmModuleParser.cpp: parse the data section, and prevent a few overflows if a user passes in UINT_MAX (the loops would overflow)
(JSC::Wasm::ModuleParser::parseType):
(JSC::Wasm::ModuleParser::parseImport):
(JSC::Wasm::ModuleParser::parseFunction):
(JSC::Wasm::ModuleParser::parseExport):
(JSC::Wasm::ModuleParser::parseCode):
(JSC::Wasm::ModuleParser::parseData):
* wasm/js/WebAssemblyModuleRecord.cpp:
(JSC::WebAssemblyModuleRecord::evaluate): the only sensible time to initialize the data section is after linking, but before calling start, I test for this but the spec isn't clear it's correct yet


git-svn-id: http://svn.webkit.org/repository/webkit/trunk@209651 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@jfbastien
Copy link
Member Author

Another one: what if there's an empty data section, and no memory (neither imported, nor memory section)? Is that an error at Module creation time?

@jfbastien
Copy link
Member Author

This is fun. What if the memory exists but is zero-sized, and we initialize with an empty data segment?

@rossberg
Copy link
Member

@jfbastien, have you checked the spec interpreter and test suite?

  • If they overlap, does it fail?
  • If so, what's the exception?
  • When is it thrown? Some of these issues can be known at Module time (if it was an i32.const), and others at Instance time (if it was a global).

What @binji says, no failure (a new module could be linked in at any later time, so that would require engines to track what parts of tables have been initialised).

We are missing more tests for the element stuff. Contributions welcome.

  • If they're out of bounds, does it fail?

Yes, that's a linking error. JS.md currently says it causes a RangeError, but see #889 for possible revision.

What if the initializer is a constant, and it's known at Module time that it's out of bounds? The failure still occurs at Instance time?

Yes, it's not worth making a special case and analysis for that.

Oh also, what if there's a Data section but no memory (neither imported, nor in a section)?

Validation error, see the the tests here:

https://github.com/WebAssembly/spec/blob/master/interpreter/test/memory.wast#L26

I'm also not sure we specify that data segments are initialized before start is called.
Oh, and if linking fails, is an imported memory changed by data segments? Or do data segments get populated after link succeeds, but before start is called?

Both JS.md and the interpreter are explicit that link-time type-checking is performed before initialisation. There is a simple test here that checks that the memory remains unchanged:

https://github.com/WebAssembly/spec/blob/master/interpreter/test/linking.wast#L232

Obviously, we could use more tests. Contributions welcome.

Oh and if initializing segments fails halfway, I assume it's visible up to when it failed? i.e. if the first segment is valid but the second isn't, then the first is set, second isn't?
And what if a segment is only partly valid? i.e. it starts in range but ends out of range. Do we see a partial write, or none of that segment's write? I'd rather not do partial, but that's what wasm does when executing.

JS.md specifies that bounds checks for all data and elem segments occur before any of them take effect:

If, after evaluating the offset initializer expression of every Data and Element Segment, any of the > segments do not fit in their respective Memory or Table, throw a RangeError.

Apply all Data and Element segments to their respective Memory or Table in the order in which they appear in the module.

The interpreter actually disagrees and evaluates them linearly, without prior checks. I'll change that to match JS.md.

Another one: what if there's an empty data section, and no memory (neither imported, nor memory section)? Is that an error at Module creation time?

Still a validation error, see the tests linked to above.

This is fun. What if the memory exists but is zero-sized, and we initialize with an empty data segment?

That's okay and implied by the wording, see also the test here:

https://github.com/WebAssembly/spec/blob/master/interpreter/test/memory.wast#L6

@rossberg
Copy link
Member

Btw, the more interesting corner case is the one tested at

https://github.com/WebAssembly/spec/blob/master/interpreter/test/memory.wast#L67

It checks that a zero-size segment is okay even when it is completely out of bounds. The rationale there was that the check simply is supposed to rule out invalid memory writes, and this doesn't perform any. But arguably it would be cleaner to have a stricter check requiring that the end address is within bounds (which implies that the start address is). That's slightly simpler to test, too, so I'd suggest changing that.

@rossberg
Copy link
Member

See WebAssembly/spec#399 for the bounds checks before mutation issue. Includes the corner case change pointed out in the previous comment.

@jfbastien
Copy link
Member Author

It checks that a zero-size segment is okay even when it is completely out of bounds. The rationale there was that the check simply is supposed to rule out invalid memory writes, and this doesn't perform any. But arguably it would be cleaner to have a stricter check requiring that the end address is within bounds (which implies that the start address is). That's slightly simpler to test, too, so I'd suggest changing that.

I would make it a validation error to have zero-sized segments.

@rossberg
Copy link
Member

That would seem like an unnecessary special case to me.

@jfbastien
Copy link
Member Author

That would seem like an unnecessary special case to me.

What's then "end" address of a zero-sized segment then?

@rossberg
Copy link
Member

@jfbastien, what's wrong with end = start + length there?

@saambarati
Copy link

Why are zero sized segments simply not validation errors? Seems like validation errors should catch bugs. A zero sized segment sounds like a bug to me.

@saambarati
Copy link

Also, we should come up with an answer to an elem_type with a num_elem of zero.

IMO, this should also be a validation error.
https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#element-section

@rossberg
Copy link
Member

Also, we should come up with an answer to an elem_type with a num_elem of zero.

@saambarati, can you clarify what you mean by that? How is the elem_type dependent on the table size?

@saambarati
Copy link

@rossberg-chromium I used the wrong name. I meant elem_segment not elem_type

@lukewagner
Copy link
Member

Agreed with @rossberg-chromium's answers above which also match the SM impl with the already-noted exception of the weird zero-length segment rules. I'd rather not take special action to disallow zero-length anything since I can see this annoying more than helping. Rather, I think the "obvious" range check if (seg.offset > length || length - seg.offset < seg.length) fail() should reject out-of-bounds zero-length segments.

@rossberg
Copy link
Member

@lukewagner, I don't think the first half of the condition is even needed, it should be implied by the second (which I would write as seg.offset + seg.length > length).

All assuming proper arithmetics, of course, not wrap-around.

@lukewagner
Copy link
Member

All assuming proper arithmetics, of course, not wrap-around.

Hehe, yes, I lifted my condition from C++ code :)

@rossberg
Copy link
Member

Seems like the only change we have consensus on is disallowing segments whose start address is out of bounds (which is a change in the zero-sized case -- the "weird" case mentioned above). Any disagreement with that change in particular?

@rossberg
Copy link
Member

Okay, there seems to be no disagreement on that point, so I took the liberty to land WebAssembly/spec#399, which makes the spec match the design doc while making the aforementioned small change.

@sunfishcode sunfishcode added this to the MVP milestone Jan 31, 2017
@rossberg
Copy link
Member

rossberg commented Feb 6, 2017

This seems to be resolved.

@rossberg rossberg closed this as completed Feb 6, 2017
binji added a commit to WebAssembly/bulk-memory-operations that referenced this issue Jan 23, 2019
`memory.init`, `memory.copy`, `memory.fill`, `table.init`, and `table.copy` should all fail if their ranges are out-of-bounds, even if the count is zero, similar to for active segments. See the discussions here: WebAssembly/design#897, #11.
binji added a commit to WebAssembly/bulk-memory-operations that referenced this issue Jan 24, 2019
`memory.init`, `memory.copy`, `memory.fill`, `table.init`, and `table.copy` should all fail if their ranges are out-of-bounds, even if the count is zero, similar to for active segments. See the discussions here: WebAssembly/design#897, #11.
ryanhaddad pushed a commit to WebKit/WebKit that referenced this issue Dec 22, 2020
https://bugs.webkit.org/show_bug.cgi?id=165696

Reviewed by Keith Miller.

As specified in https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#data-section
Note that some of the interesting corner cases are ill-defined by the spec: WebAssembly/design#897

JSTests:

* wasm/Builder.js: create a data section from JavaScript
* wasm/Builder_WebAssemblyBinary.js: assemble the data section into the proper binary encoding
(const.emitters.Data):
* wasm/js-api/test_Data.js: Added.
(DataSection):
(DataSectionOffTheEnd):
(DataSectionPartlyOffTheEnd):
(DataSectionEmptyOffTheEnd):
(DataSectionSeenByStart):
* wasm/self-test/test_BuilderJSON.js: make sure the JSON structure is fine (this sanity checks before going to binary)

Source/JavaScriptCore:

* wasm/WasmFormat.h: segments are what represent sections of memory to initialize (similar to ELF's non-zero intializer data / rodata)
(JSC::Wasm::Segment::make):
(JSC::Wasm::Segment::destroy):
(JSC::Wasm::Segment::byte):
(JSC::Wasm::Segment::makePtr):
* wasm/WasmModuleParser.cpp: parse the data section, and prevent a few overflows if a user passes in UINT_MAX (the loops would overflow)
(JSC::Wasm::ModuleParser::parseType):
(JSC::Wasm::ModuleParser::parseImport):
(JSC::Wasm::ModuleParser::parseFunction):
(JSC::Wasm::ModuleParser::parseExport):
(JSC::Wasm::ModuleParser::parseCode):
(JSC::Wasm::ModuleParser::parseData):
* wasm/js/WebAssemblyModuleRecord.cpp:
(JSC::WebAssemblyModuleRecord::evaluate): the only sensible time to initialize the data section is after linking, but before calling start, I test for this but the spec isn't clear it's correct yet


Canonical link: https://commits.webkit.org/183307@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@209651 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants