-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should it be valid to have empty Data/Element sections #910
Comments
Pretty much all sections. Code without Code? etc. |
I was surprised that right now, it's illegal to have a zero-table table section if there is a table import, or a zero-memories memory section if there is a memory import. If there is no import, though, zero tables are okay. I think it would be cleaner to allow those, and zero-element sized sections generally. I don't think that would cost consumers anything, and producers could cut a corner. (By "sized sections", I mean sections beginning with a count of entries and then containing the entries. Right now, that's all sections but the start section; maybe the start section should have a count (1 in the MVP) as well, for consistency?) |
@pipcet My gut feeling is to go in the opposite direction. It's likely a bug to have zero sized sections of anything that are meaningful only if they have a size. So my vote would be to make zero sized sections an error. |
@saambarati I'm not sure I can think of a common bug that would result in a valid, but zero-sized, section when a valid non-zero-sized section is the intended output. Instead, it seems to me more likely that the case in which there is code but no data or data but no code isn't usually well-tested, and might well result in a zero-sized section. I think a simple producer should only be expected to fill in a fixed wasm skeleton with its data, rather than conditionalizing part of the skeleton based on whether there is any data. That's what I do for intermediate files, which are produced as binary output by GNU ld (so, in particular, they have a linker map allowing me to trace back file offsets) but should also be valid wasm. (I realize I'm a bit of a special case :-) ). |
I don't see any good reason for introducing all these special cases for zero-sized things. It generally is unnecessary hassle to disallow them, for both producers and consumers. Catching internal problems in producers is not the purpose of validation. @pipcet, where do you read that zero-sized table sections are illegal in the presence of an import? The spec interpreter makes no such restriction. If that's stated somewhere then it's simply an oversight and should be corrected (probably a left-over from the pre-multi-table design?). |
@rossberg-chromium I think it's inevitably something that needs to be thought about. Would an elem_segment of zero count but an offset that would be beyond the table's size cause a RangeError during instantiation? If not, the text that talks about RangeErrors needs to be very specific about how it does its checks. If it is an error, again, this text needs to be very specific about how it does its checks. |
You seem to be assuming it hasn't been thought about yet. :) There are tests for it in the test suite. But I agree for the specific case you mention, that's the corner case I pointed out in my earlier comment: #897 (comment) |
@rossberg-chromium Module.md says:
(and equivalently for tables). SpiderMonkey implements this faithfully and complains about a zero-sized table section in the presence of an import. Even if that restriction is to remain, the "thus" doesn't make sense. You're right that what's probably intended is "... or table definition". (Note that each sentence occurs twice for a total of four places that need changing). |
@pipcet, thanks, yes, this looks very much like an oversight. It wasn't adapted when we moved from single table sections to multi-table sections. I'll create a PR. |
Since we're doing something about zero-sized things already, I would rather do that something up-front, at validation time. |
@jfbastien, we never need to do anything specific for zero-sized things, it always falls out naturally from the general rules. That includes the specific case mentioned above. However, in that case the behaviour depends on what general validation criterion we choose for segment bounds. There are basically two choices for that one:
The former probably fits better with a priori bounds checking of segments (as currently specified in JS.md), while the latter fits better with the change you proposed. The zero-size case follows from there. |
@rossberg-chromium If we were to go with choice 1, what would be a safe address to use for a dynamic loader assigning a new module's zero-sized data segment's base address? I'm assuming NULL pointers might one day be made invalid, so it's not 0. It's not "the end of memory" either, unless you're willing to guarantee that memory won't ever be allowed to shrink. The best I can come up with is that the dynamic loader would use its own data base address, which goes wrong if the dynamic loader itself doesn't have a data segment. So, IMHO, choice 2 sounds much safer, even though it's arguably a special case for zero-sized things. |
@pipcet, hm, not sure I understand your scenario. How is the zero case different (more difficult even?) from the non-zero case? What strategy do you envision for the general case that doesn't work for zero? It's unclear to me how null pointers relate, but note that there actually is no such thing in Wasm. That's a detail of compiling certain languages to it, nothing built in, so nothing we can make illegal either. (And strictly speaking, not even C requires representing null pointers with the value 0.) |
@rossberg-chromium In the general case, the dynamic loader would presumably just use So (The NULL pointer issue is tangential, but making NULL invalid (while keeping it at LMA 0) is definitely on my post-MVP wishlist. I'm not proposing to handle LMA 0 differently from other addresses, of course, just that there be an option to unmap certain address ranges that C could use to unmap LMA 0, while other languages would presumably have no need of that.) In summary, I think there are conceivable (though possibly ill-conceived) post-MVP extensions that would break things with choice 1 but work just fine with choice 2, so choice 2 gives us more options for future decisions. |
@pipcet, I think this is mixing up several levels of abstractions.
First, segments are not allocated. They only initialise memory in a
prexisting memory. Are you somehow presuming a loader that would use this
information to allocate the needed memory? Then clearly it cannot make a
single allocation per segment, zero-sized or not, since there can only be
one memory. In general, such an approach seems to be conflating allocation
with initialisation.
Second, how is malloc related? There is no malloc in Wasm, nor in
JavaScript. Embedders generally need to represent linear memory as its own
abstraction. Are you envisioning an environment where linear memories are
provided as raw OS chunks? I don't think that can work, because you need to
at least carry along the size limits. So even in a stand-alone engine there
needs to be a layer on top of malloc.
Finally, your conclusion seems very counter-intuitive to me. Choice 2 is
strictly more permissive. The save bet usually is erring on the side of the
more conservative choice -- we can always relax 1 to 2 later, but not vice
versa.
|
@rossberg-chromium Okay, I'm confused now. A And while But let's stay with C for a second: here's the pseudo-code for my dynamic loader, written in C and compiled to wasm, with
With choice 1, that code will fail sometimes, depending on the malloc implementation. This code would work:
...maybe. If the C implementation assigns NULL pointers to a valid address. What would definitely work is
But that's special-casing size 0. Post-MVP extensions might make preexisting code invalid (for example, code that accesses linear memory address 0 would be invalidated if wasm and the language backend change to unmap address 0). The conservative option does not mean encouraging applications to use code that would then clash with potential post-MVP extensions, but that's precisely what choice 1 does: it would encourage applications to assign 0 or end-of-memory to the immutable global used as base address by a
|
@pipcet, I think you are misunderstanding the concept of data sections. There is no allocation involved. They essentially just provide convenience for defining a sequence of stores to locations in the linear memory, which the program must have defined or imported beforehand in some way. Your pseudo code cannot possibly work with general Wasm modules. The base address for a data segment generally is chosen by the module, and a loader would have no way of adjusting it, nor of communicating its choice to the module. The only situation in which a loader could control the address would be when (a) it is defined by a plain global, and (b) that global happens to be an import. So while it would presumably be possible to implement a linking framework along the lines you sketch, it would have to make various assumptions and impose various restrictions on the modules it can handle. For example:
These conventions and restrictions would define a kind of Dynamically-linked C ABI. But that would be on top of Wasm. And at that point, also saying something about zero-sized segments, if you must, is no problem (though I still doubt it's necessary even there). Moreover, the framework has full control over the malloc semantics it provides or how it applies it. To clarify the other point you brought up: there is absolutely no intention to make breaking changes to Wasm later. On the Web that is a complete no-go. The only way that could possibly happen is by introducing a new version of Wasm alongside the old one. |
@rossberg-chromium I think what you seem to be describing as some kind of special case is in fact what I am doing, and, as far as I can think, the only reasonable way of doing things. Perhaps I should have been clearer that I am actually building and using dynamically-linked wasm modules (though I fully expect, and am looking forward to, that everything will break as the MVP spec changes some more), though I've side-stepped the issue we're discussing by assigning fixed addresses. This isn't a hypothetical question of "I'm going to write a dynamic loader some day", but more one of "I'm now in a position to put in an ugly
That is in fact what I am doing, yes. I don't see any other reasonable way to implement dynamic loading.
That is very much like the ABI I'm actually using, with some slight changes (for example, the "master" module can hard-code addresses, but the library modules cannot; dynamic memory acquisitions don't have to happen through the master module, though it's better if they do). It's so much analogous to any other common ABI out there that I thought it unnecessary to specify, for which I apologize.
I do agree, FWIW, that we can stay with choice 2 for now (though I haven't seen an argument in favor of it—it's either nil-nil or nil-one against), then hastily change to choice 1 if and when we allow the unmapping of addresses/shrinking of memory (which I hope you don't consider a breaking change, as it wouldn't affect wasm code that doesn't make use of the new unmap-memory facility in the least). I don't think it's a good idea, though. |
@pipcet, I agree it is a reasonable way of emulating OS-level ABIs, but as
you saw, Wasm isn't restricted to that. Other clients might make completely
different use of linear memories, and that's fine, we intentionally don't
prescribe a memory management strategy.
I do agree, FWIW, that we can stay with choice 2 for now (though I
haven't seen an argument in favor of it—it's either nil-nil or nil-one
against), then hastily change to choice 1
Hm, I'm confused now. I was arguing for 1, because relaxing from 1 to 2 is
a non-breaking change that can safely be done at any later point. If we
pick 2 we cannot go to 1 later.
|
Sorry, I meant to say "stay with choice 1" and "hastily change to choice 2". It also wasn't clear to me you're actually in favor of choice 1. Minor nit: technically, of course, moving from choice 1 to choice 2 is a breaking change, because the application can determine whether or not an address is valid by the trick of trying to load a zero-sized data segment just below it. However, it's sufficient to specify no such guarantee is made, I think. |
@pipcet, well, that's a general problem: as soon as you enable any form of reflection or meta programming, there strictly speaking is no longer such a thing as a non-breaking change, because you can always programmatically test for the absence of a feature. So clearly, we need to allow some slack in the definition and say that there is no guarantee that invalid programs stay invalid in the future. EcmaScript does the same, and it's not been a problem. |
@pipcet I would strongly recommend code be written to not use the first wasm page of memory as then it will be more likely that the linear memory can be mapped to absolute zero in the process address space which can give better performance in some cases. I am trying to have a 'first page usable' flag added, it's still pending. |
@Wllang Indeed, I'm not sure whether equating wasm linear memory addresses and process virtual memory addresses really would help performance significantly. Do you have numbers for that? I still have a mild preference for choice 2, but if the specification implies that that change might be made at a later date (without breaking spec-compliant applications), that's nearly as good, and as @rossberg-chromium rightly points out we need to have such language in the spec anyway to clarify that future feature additions do not count as breaking changes to us. |
@pipcet For some test results and analysis of the benefits on placing the linear memory at absolute zero in the process address space see #334 System kernels often restrict use of the low pages so to take advantage of this practically requires the wasm code to avoid using the first page. Obviously it only works for one linear memory per process, but I can see it being used on a mobile device playing a monolithic game or other app where the performance and/or power savings might be compelling. Even if the major web browsers do not adopt this it creates an opportunity for the community to have targeted web browsers for running monolithic wasm apps on mobile devices. I also expect the gains in performance and compiled code storage would be compelling if wasm in used in embedded devices. Practically NULL is going to be represented by zero in typical C code translated to wasm, and code that needs to deal with zero pointers will need an extra flag etc. |
This should already be resolved along with #897 (via convergence on WebAssembly/spec#399). |
Right now, I think it's valid to have zero elem_type in an Element section. Should we allow this?
Ditto for Data.
https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#element-section
The text was updated successfully, but these errors were encountered: