Future of Binaryen in a stack machine world? #663
Can you give an example of stack-machine code that can not be expressed (with only encoding loss) in a structured AST? There might not be any such examples, given that dropping stack elements is currently constrained to block boundaries. An explicit But this can not express multiple uses of the function results, and does not work for functions returning multiple values in general. A general solution is needed, and I appeal to people to solve the general case first then optimize it for high frequency cases. The example could also be presented as the following:
And when people code references to these local constants that do not fit the stack usage pattern they would simply be encoded with The other option is the 'Stop consuming WebAssembly' option but I would look at it from a different perspective and view it as conceding that the wasm binary code is just a compressed encoding of the language that developers will actually work with and that the language that binaryen consumed would be the wasm text format and debugging would step through that code not the stack code which would be a throwaway intermediate encoding, just as debuggers do not step through compressed encodings. This would focus on the SSA decoded format, which is really what this is all about, and it might be better for analysis and transforms, but distant from the stack machine code. This would be a last-resort strategy, working around the product of the wasm group, but why should the group be in such a position, it is our group. |
In the past we've kicked around the idea of defining a flavor of wasm (an IR, if you like) which is not exactly wasm but more natural for many compilers to produce (aka a producer-oriented/subwasm/flat wasm). What you are describing as Binaryen's new non-isomorphic IR sounds a lot like that in some respects. As you have observed, Binaryen would then be a one-way tool (at least the compiler part would be; obviously from the user perspective Binaryen has always been a loosely-related collection of tools; e.g. we could still have I think you are basically correct about the consequences of moving Binaryen to this style of IR. Presumably it would evolve a bit (since it doesn't have to precisely track the current flavor of wasm we could rethink pain points that exist or as we find them), and libbinaryen (C API, optimizers, etc) would track this evolution. Practically speaking, we may choose over some near-term period to keep on trucking and paper over the differences while we work out what we want the end product to be. |
@JSStats: what you're suggesting sounds more like for the WebAssembly design than for Binaryen? This issue is just for Binaryen responding to a WebAssembly change. @dschuff: I agree that trucking on for a while sounds reasonable, but it's not quite that easy. I've been working towards that in the (increasingly misnamed ;) since it's all about stack now) |
That makes sense; If we do want to just go toward a new IR, we do have sexpr-wasm-prototype which can take the role of binary encoding, decoding, and standalone interpretation. Then the IR can be designed for ease of production from whatever class of compilers we care about, and for wasm-specific optimizations. |
Yeah, sexpr-wasm-prototype is perfect for that, it has a proper stack-based interpreter in fact. And I think I saw @binji already has a web port so it could be used as a clientside polyfill (if not I'd be happy to help with that). I think in the forked-IR path, if we go that route, binaryen might still end up with some wasm import facilities, but they might not be lossless. For example a "wasm2byn" could handle patterns that can't work in an AST by introducing new locals (sort of like the project whose name escapes me that translates wasm into LLVM IR in the sense that it's also not meant to be lossless). And of course we can emit wasm, so optimizing wasm to wasm in binaryen might still be useful, although that's an open question. |
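To make the "introduce new locals" idea concrete, here is a minimal sketch of what such a lossy import could look like. Nothing here is Binaryen code: the function name `fold_with_locals`, the signature table `SIG`, the temporaries `$t0`, `$t1` and the callee names `$a`, `$b`, `$v` are all made up for illustration, and only straight-line code is handled. When a void instruction runs while values are still pending on the stack, the pending values are spilled into fresh locals so the void instruction can become an ordinary statement.

```python
# Hypothetical signatures: instruction -> (values popped, values pushed).
SIG = {
    "call $a": (0, 1),   # returns an i32
    "call $b": (0, 1),   # returns an i32
    "call $v": (0, 0),   # void call
    "i32.add": (2, 1),
}


def fold_with_locals(code, sig):
    """Fold straight-line stack code into statements, spilling to locals."""
    stack = []   # pending value-producing trees
    stmts = []   # finished statements, in execution order
    fresh = 0
    for instr in code:
        pops, pushes = sig[instr]
        args = [stack.pop() for _ in range(pops)][::-1]
        node = (instr, *args)
        if pushes:
            stack.append(node)
            continue
        # A void instruction: everything still pending was evaluated before
        # it, so spill those values (in order) to keep the original ordering.
        for i, pending in enumerate(stack):
            if not pending[0].startswith("get_local"):
                name = f"$t{fresh}"
                fresh += 1
                stmts.append((f"set_local {name}", pending))
                stack[i] = (f"get_local {name}",)
        stmts.append(node)
    return stmts + stack


code = ["call $a", "call $b", "call $v", "i32.add"]
result = fold_with_locals(code, SIG)
for stmt in result:
    print(stmt)
```

The result is a valid tree-shaped program with the same evaluation order, but emitting it again would produce different stack code (extra set_local/get_local pairs), which is the sense in which such an import would not be lossless.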
I am leaning more in the forked-IR direction. I considered in more detail the option of rewriting binaryen to use a single stack-based IR, but I just don't think it makes sense. The original design principles still relevant are
But doing those with a stack machine IR seems very hard. On the one hand it's true that some optimizations are easy to write (e.g. removing dead code) while others become possible (stack-machine specific things). But on the other, consider a simple pass that reduces
|
@kripken Do you know what the decision-making process is for WebAssembly? I'm increasingly getting the feeling that major decisions are made in private discussions behind closed doors, or other unknown places, with little important discussion happening in the design repo. Did the change to a stack machine catch you by surprise as well? Based on what I know so far (which is no doubt less than what you know), "Keep on trucking, for now" looks like the best option (for now), because
Are "virtual" nodes really uncomfortable in terms of code complexity, or it is mainly uncomfortable psychologically? If it's the latter, I think you'll get used to it after awhile. |
If "forked-IR" means "an AST-based IR that is very similar to, and easily convertible to, the Wasm stack machine, such that arbitrary Wasm stack-machine code can be converted easily to the IR," it sounds like a good plan to me. It's even better if Wasm-to-IR-to-Wasm and/or IR-to-Wasm-to-IR are (guaranteed to be) lossless roundtrips. |
@qwertie: Yes, I was taken by surprise by the stack machine change. And from what I've seen online and off I was the only browser-vendor person that opposed it. And you can see work going on for it (e.g. in the spec repo). So for better or for worse, I think it's safe to assume it's happening, even if nothing is truly final until it lands, is specced, and ships.
Yes, a "first" can solve the current issues, with some awkwardness, but as discussed above multi-values are almost certain to happen, and "pick" and others seem very possible. It's clear the awkwardness for an AST will increase.
I agree and those are some of the reasons I opposed WebAssembly moving to a stack machine. (My main reason is the negative implications for the text format.)
I agree, and I think we might still support a form of that, but it wouldn't be lossless. For example, a
I think there is a fundamental issue here. The optimizer currently has some useful principles like
Now, there are exceptions to those principles. For example, a The problem with "first" is even trickier in that all three of those principles are no longer true: adding such nodes can reduce output code size and work, but the opposite is true in other cases. So we can't just not do the inverse, and it isn't trivial what to do when optimizing for size, and so all optimization passes that remove or create "first"s must be coordinated. Now, of course this isn't an insurmountable problem. But it (and multi-values, and pick, etc.) add complexity and awkwardness to our optimization IR, and not just in a few places. An AST is just not a good 1-to-1 IR for such stack machine code.
No, sorry, the name might be confusing. Forking here means that we diverge from WebAssembly by not adopting the stack machine elements that are awkward in our AST. A better name might be "only output wasm", as it implies wasm is only our output format, while the forked IR is our input format, internal IR, binary and text format, etc. But the direction of Binaryen IR => WebAssembly would remain a very easy and fast operation, of course (since we would be pretty much a subset of WebAssembly). |
OIC - "~" means "varies proportionally with". Aka x α y... It seems to me you can model most of this with a per-node weight - say, most operations have weight 1, multiplications have weight 2-ish, divisions have weight 10-ish... and you can define three weights, one for "optimize for speed", one for "optimize for size" and one for "balanced". If an algorithm needs positive weights, that's okay: it might be better to give I see that multi-values are likely to happen as currently envisioned because they seem easy for the back-end to support. But local variables do everything we would use FWIW I support your opposition to the stack machine, or at least I think there should be a closely-related AST variant of Wasm for the text format and for binaryen. I recognize the (small) value of the stack machine in the final back-end, but still wonder if there's some way to achieve a similar benefit in the AST paradigm (I'm cursing myself for not having found the time to learn Wasm better, otherwise I'd love to investigate this.) It sounds like you weren't really consulted about this decision, but do you know what the decision-making process is? |
@kripken A sketch of a single pass stack-machine to text format algorithm is at WebAssembly/design#753 This does not use a |
Yeah, sorry, I should have been more clear.
Yes, but the issue is that "first" would need to have a negative weight in the context of code size. That's already troubling, but in addition, even that isn't true as the weight depends on the surrounding code, it shouldn't be negative in other cases, so it doesn't even have an intrinsically definable weight.
It might be a while, but I hope it'll happen at least for the text format: with
I don't know that there is anything formal written up. For the informal stuff I'm not sure since I'm not very good at politics ;) |
@kripken I'm not seeing it. What's an example where |
Imagine that we want to do a call, set it to a local, a store, and then use the call result (and the store and the call can't be reordered):
vs.
Both of those have 5 AST nodes, but the former has one less node in the stack machine output since it doesn't need the |
A negative weight isn't needed here: if |
@kripken If you introduce |
@qwertie: sure, but it can't be zero in other cases, and a specific non-zero value might work in some but not others. The underlying issue is that in a stack machine, it's easy to see the cost of this operator (or rather, what it does, since it doesn't exist there). In an AST an analysis is needed to calculate the cost. |
Agree with @JSStats that |
At this stage I'd like to propose that we go in the forked-IR, aka "only output wasm" direction:
(Assuming we agree on this proposal, I am of course signing up to do all the above work; but if anyone wants to help that's great.) Possible future steps, worth mentioning now since they might influence the discussion:
Thoughts? |
@kripken A fundamental problem with this plan is that it still does not address multiple values. Development needs to move forward, develop a counter proposal that addresses the issues, rather than just resisting development and shuffling the chairs. People care about wasm because it is a deployment platform. I think the general point you are making about the disconnect between the language that producers want to easily emit and the twisted stack machine constraints is a key point. The same was seen with the x87, a numerical stack machine, where compilers just wanted to connect inputs and output but had to work around frustrating stack machine constraints, and to optimize for these constraints, and to be able to annotate where values were for debugging. The If wasm is just a target, and not optimized for code size, then the expressionless register based code is much simpler for producers. |
@JSStats: I agree we need a plan for multi-values, and if we don't have a good one then we can't move forward here. One option is to just not add them to our AST. This assumes we can still generate good code without it. I don't know if that's true. Another is to add it to our AST with something like
i.e. |
@kripken The lexical constants, the The other option is to depend on local variables, and this takes wasm down the expressionless encoding path. The encoding efficiency was not as good, and the live range of definitions is not as well defined, but it would be a simple compiler target. A language with more general multiple values support was explored in WebAssembly/wabt#66 and the comments detail the design and operations. |
On 8 August 2016 at 23:14, Alon Zakai notifications@github.com wrote:
The superficial history is that most of it "happened" in ongoing That said, I can definitely see the downsides. It is less structured, after |
@kripken I'm a bit confused as to why we need to formalize and have a binary format (and e.g. a full-blown interpreter) for this IR. I can buy having a text format (which is useful for writing tests). But the following things make me nervous: It sounds like you are conflating 2 different things with this proposal: A compiler IR meant for optimizations, and machine format/VM that you want compilers to target. Do we want to try to make this IR stable? (my opinion: no) Do we want to write a linker for this format? (my opinion: no). Do we want to define ABIs in this format? (probably not). A wise soul once cautioned the LLVM community about doing this and even though LLVM IR might be an extreme example compared to wasm, the underlying lesson is that the goals of an optimization IR are different from a platform (and here especially I mean "platform" in a broader sense than just a virtual machine, e.g. ecosystem, interchange format, etc). For example, we know that it's likely that an optimizing compiler might want to use a CFG and not do its own control-flow restructuring, so Binaryen's library includes a relooper API. This is fine if we convert it to a structured IR before optimizing. But we wouldn't want to require that, so what do we do if we force everything through a binary format that matches the IR? Making a binary format that supports both seems like a bad idea because even aside from the complexity, it's incompatible with the stated goal of keeping the IR close to wasm. Another way to say it is, the goal of keeping the IR as close to wasm as possible may be opposed to the goal of making it useful for optimizations, and making it more like a "platform" (binary format, etc.) is opposed to the goal of making it flexible for different use cases. Hopefully that wasn't too ramble-y. |
I definitely don't want to define a new VM or machine format or platform here. The only goal I have in mind is a library that makes compiling to wasm easy, fast, and emits good code. Period. And I believe an interpreter serves that goal very well:
And we already have a fully-functional interpreter now, so it makes sense to keep it. Regarding a binary format: it's useful for the same reasons LLVM has a binary format, it's convenient and fast (in particular it will improve our compile times, unless we write all-in-one tools instead of chaining independent ones). And as with an interpreter, we already have one. Linking: I don't feel strongly about this one. On the one hand it could be useful (like Stability: I agree, not a goal. (While in theory it could help compilers targeting us, the burden on us would be far too great.) CFG support: You're right that adding this to our IR formats diverges us more from wasm. But:
I'm not happy about diverging from wasm, and would still love to find a way to avoid it. But it does have upsides, like being able to make small but useful additions to our IR formats such as CFG support. |
Another reason for Binaryen to have its own file format (and not keep using wasm) is that wasm's stack machine pivot causes some issues with unreachable code, namely that adding or removing a branch that alters code reachability can turn wasm code from valid to invalid and vice versa. Details are still being discussed last I heard, but that seems like an annoying property in an object file format (especially when bringing up a new compiler). Overall, I think it's fair to say that wasm is focused on what browsers want to consume, while an object file format would have somewhat different objectives in my opinion:
I had originally thought that the toolchain using raw wasm files for object files was an excellent opportunity for wasm - avoiding an extra format simplifies the ecosystem substantially. And that led to the original Binaryen design. But I think that's just not viable any more, wasm's design is evolving in ways that make sense for browsers but have downsides for toolchains. A single format can't fit all I suppose. |
I am building a compiler for a new language, and came to the conclusion that the most important backend I could build would be a wasm backend, given that it has the potential to become the standard way to ship processor-agnostic binaries for server, desktop and mobile, both inside and outside the browser. I just now came back to get an update on the progress of wasm, for the first time in 6 months, and discovered the switch from AST to stack-based architecture. I consider this to be a huge mistake (and I think the binaryen folks on this thread are being too diplomatic about how to cope with the upstream change!). There are several reasons why switching to a stack-based architecture is a bad idea: (1) As pointed out in other comments, this is a major step backwards for the transparency of code and ease of learning on the web: stack-based code is nearly impossible to read. (2) Stack-based architectures are massively inefficient at runtime, which wastes battery on mobile, causes UI latency and jank (which incurs a cognitive burden in users), kills baby polar bears etc. etc. https://www.usenix.org/legacy/events/vee05/full_papers/p153-yunhe.pdf As far as I can tell, the reason for switching to a stack-based architecture was initially the trigger of allowing void expressions anywhere, but ended up coming down to the increased ease of building the runtime (since you don't have to build as much compiler backend infrastructure if you're already at a lower level than an AST), as evidenced in the following comment by @rossberg-chromium from earlier in this issue:
It is likely that even more overhead would be incurred between compilers forced to compile from a stack-based IR and an AST-based IR than even the difference between stack-based and register-based IR, since the higher level of an AST-based IR makes so many optimizations simpler. For an example of the types of hoops you have to jump through to safely perform even simple dead-code elimination in a stack-based IR, see this paper: http://set.ee/publications/bytecode07.pdf So many simple optimizations require careful thought, soundness proofs, and new abstractions (such as swapping virtual stack snapshots due to differences in control flow) when reasoning about a stack-based VM, and most peephole optimizations are simply undoing some inefficiencies imposed on the code by lowering to a stack-based form (e.g. replacing DUP SWAP with DUP), because the representation is so low-level: http://www.rigwit.co.uk/thesis/chap-5.pdf Some other reasons why stack-based representations can be more inefficient include the following (there are many -- the quoted points are taken from this Wikipedia page):
(3) AOT compilation from a stack-based IR is much more complicated than AOT compilation from an AST. This means that some implementers of wasm will simply choose to build an interpreter, rather than an AOT compiler, leading to great inefficiencies. (On the other hand, if an interpreter is what is wanted, building an interpreter for an AST-based IR would not be much more complicated than building an interpreter for a stack-based IR -- so compilation suffers with a stack-based design, but interpretation does not suffer either way.) Basically, stack based IRs / bytecode systems are a terrible idea if you care about either interpreter efficiency, difficulty of writing optimizing AOT compilers, or about code transparency, readability or learnability. It is sad that four guys made this decision, perhaps without considering all aspects of these issues, which will affect both billions of devices and billions of users. This is a major step back for the web. Is there any chance the decision could be revisited? (I see this is a binaryen bug, but at least people are talking about the issues here -- is there a better venue for this discussion?) |
@lukehutch Personally I agree with you - I think it was a mistake for wasm to make this change. However, the specific arguments about perf are mostly incorrect - it's true that executing a stack machine can be very inefficient compared to a register machine etc., but the goal in wasm isn't to actually execute the code that way - VMs would translate it to something efficient first (optimized machine code in most cases). With that out of the way, as I said, I definitely agree that the stack machine change was a bad idea for the rest of your reasons - code transparency, readability, learnability. And more generally, openness in the wide sense. Sadly I don't think there is much of a chance to revisit this. The small group of people driving the wasm design are all in agreement on this matter (as you quoted), and everyone is working hard towards shipping. But, if you want to try, then opening an issue on the wasm design repo would be the right place. |
@kripken Correct, wasm will typically be AOT-compiled down to native code. But most of the points I raised are also true of AOT compiled code, not just interpreted code. From the Wikipedia page on stack machines: "The object code translators for the HP 3000 and Tandem T/16 are another example.(15)(16) They translated stack code sequences into equivalent sequences of RISC code. Minor 'local' optimizations removed much of the overhead of a stack architecture. Spare registers were used to factor out repeated address calculations. The translated code still retained plenty of emulation overhead from the mismatch between original and target machines. Despite that burden, the cycle efficiency of the translated code matched the cycle efficiency of the original stack code. And when the source code was recompiled directly to the register machine via optimizing compilers, the efficiency doubled. This shows that the stack architecture and its non-optimizing compilers were wasting over half of the power of the underlying hardware." Refs: (15) HP3000 Emulation on HP Precision Architecture Computers, Arndt Bergh, Keith Keilman, Daniel Magenheimer, and James Miller, Hewlett-Packard Journal, Dec 1987, p87-89. (PDF) (16) Migrating a CISC Computer Family onto RISC via Object Code Translation, K. Andrews and D. Sand, Proceedings of ASPLOS-V, October, 1992 Both of these papers are old, but the point is still true today that optimizing stack-based IR code is exceptionally difficult. Stack-based IRs make the code less efficient, and most of the difficult work of building an optimizing compiler is just using local optimizations to undo the inefficiency of the stack machine. Once you undo that inefficiency, you're left with code that is borderline obfuscated, difficult to optimize further, and much less efficient than code that has not passed through a stack IR stage. Thanks for the pointer to the wasm design repo, I'll post there. |
Yeah, it's true that
But in wasm we are talking about optimizing compilers, and specifically ones using SSA IRs. The wasm stack machine code is translated into the VM's SSA IR, at which point it doesn't matter if it came from a stack machine representation or register machine representation or AST or anything else - it's a bunch of operations on virtual registers in basic blocks in a CFG. Then it can be optimized exactly like the best native compilers do today. We also have perf numbers on wasm running in current VMs - it's very close to native speed. asm.js was already close, and wasm improves on that. So throughput is just not an issue here. |
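As a rough illustration of that translation step, the sketch below turns straight-line stack code into SSA-style assignments on virtual registers in a single pass. It is not how any particular VM is implemented: the function `stack_to_ssa`, the `ARITY` table and the register names `v0`, `v1`, ... are assumptions made for this example, and control flow, locals and types are left out.

```python
# Assumed subset of instructions: name -> (values popped, values pushed).
ARITY = {
    "i32.const": (0, 1),
    "i32.add": (2, 1),
    "i32.mul": (2, 1),
    "drop": (1, 0),
}


def stack_to_ssa(code):
    """Translate straight-line stack code to SSA-like text in one pass."""
    stack = []      # names of virtual registers currently "on the stack"
    out = []        # emitted SSA-style lines
    counter = 0
    for instr in code:
        op, *immediates = instr.split()
        pops, pushes = ARITY[op]
        # Operands come off the stack in reverse order.
        args = [stack.pop() for _ in range(pops)][::-1]
        operands = ", ".join(immediates + args)
        if pushes:
            name = f"v{counter}"
            counter += 1
            stack.append(name)
            out.append(f"{name} = {op}({operands})")
        else:
            out.append(f"{op}({operands})")
    return out, stack


ssa, remaining = stack_to_ssa(
    ["i32.const 2", "i32.const 3", "i32.add", "i32.const 4", "i32.mul"]
)
print("\n".join(ssa))
```

After this step the operand stack no longer exists: every value has a name and is defined once, so later passes see the same kind of input they would get from any other front end.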
@lukehutch It's not so bad, sleep on it and take another look. There is not really a 'stack machine' rather wasm is designed to decode to validated SSA form quickly, and with a few extensions code could be encoded in SSA form, so the 'stack machine' could be viewed as just an efficient encoding. The stack encoding is frustrating to work with, so just don't, it's not intended to be an IR, it's an encoding, just decode it to SSA form and work with that, and wasm is designed to make this easy to work with untrusted code by being able to validate and decode it to SSA in a single pass - that is good isn't it? Even the local variables are just place holders for holding definitions while decoding and are expected to be gone at the end of the decoding stage. The design is not even complete, we don't have high performance decode/compile implementations yet and I suspect that some more changes will be necessary to support this. Wasm is optimized to compile down to machine code AOT, before it runs anyway, not incrementally compiled as needed and not optimized for an interpreter which is still possible. |
@Wllang then why not encode directly in SSA form? Adding the stack-based form only obfuscates the problem. That doesn't make any sense, there is no value added by the stack-based representation other than the intellectual curiosity of how to optimally solve the new problems that a stack machine introduces when you're trying to convert to a more useful form. |
The problem is that SSA form is not compact - it's much larger than an AST or stack-based representation. |
@lukehutch I am trying to do just that, made some recent progress, but I don't have a conclusion on the encoding efficiency just yet. It is possible that @kripken is right, that the local variables will prove to be an efficient encoding and SSA 'much larger', but I am a long way from conceding just yet and if I do I'll be able to explain it. There are certainly many functions that encode well in SSA form, but I also see data flow patterns that are a challenge and I need to work through them. |
@lukehutch Here's a quick example of where I'm at with it, the
|
@Wllang thanks for the example. What is this exactly? Is this some SSA code converted from the lowlevel stack representation, rendered into s-expression form, or is this a proposed alternative to the current stack machine model? |
@lukehutch It's binaryen output decoded to SSA retaining the control structure. There are already provisions in wasm to encode much of this and it's a minor variation of the encoding:
Here's the above translated roughly to C, the C compiler cleans it up. Obviously a linear memory scheme needs to be chosen too, but for trusted code aligned to the target memory it would be as simple as follows. Have wasm code compiled in this way demonstrated running on IoT devices. Add pointer masking and good type derivation to minimise it and the code has a new layer of security and relatively good performance even without memory management. Add C type annotations and it can interoperate with C libraries cleanly. It's not valid C code due to all the casting, needs some compiler flags, perhaps there is a better approach, need to revisit.
|
@Wllang Thanks for explaining. The thing I don't understand is: why encode in stack form at all, if all the consumers of wasm are going to have to undo the stack representation (as you have) in order to do anything useful? (@kripken I don't think code size is the full answer here -- there must be a non-SSA form that is also non-stack-based, and much easier to work with than the current stack-based system.) |
@lukehutch The efficiency comes from the common case of definitions that have a single use and are consumed in a stack like order, or in other words code that can be represented as an expression and encoded pre-order or post-order. Serial encodings are just not a good representation for some 'useful things' to do with the code, pre-order or post-order. |
@lukehutch, the most compact representation for expressions is a pre- or post-order serialisation. The latter turns out to be slightly more efficient to process on the consumer side. A stack machine essentially is just the generalisation of such a post-order encoding. As such, it is pretty much optimal. In particular, it keeps all "register indices" entirely implicit. |
Of course, what's optimal for expressions may not be optimal in general. However, single-use values are very common, so the stack machine encoding is quite effective. |
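A small sketch of the post-order idea, for readers who want to see it run. The tuple-based expression format, the `OPS` table and the function names are invented for this example and do not correspond to any wasm tooling. The serialized form contains only opcodes and constants; no instruction names its operands, because the stack order supplies them.

```python
import operator

# Hypothetical binary operators for the example.
OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}


def post_order(expr):
    """Serialize an expression tree: operands first, then the operator."""
    if isinstance(expr, int):
        return [("const", expr)]
    op, lhs, rhs = expr
    return post_order(lhs) + post_order(rhs) + [(op,)]


def run(code):
    """Evaluate post-order code with an explicit operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "const":
            stack.append(instr[1])
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[instr[0]](lhs, rhs))
    return stack.pop()


expr = ("mul", ("add", 2, 3), 4)      # (2 + 3) * 4
code = post_order(expr)
print(code, run(code))
```

This only covers values that are used exactly once. A value with several uses does not fit a tree, which is where locals (or stack operations) come in.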
@Wllang @rossberg-chromium are you saying that for this particular stack machine, there is a guaranteed bijective mapping between the stack representation and the AST, and the stack representation is only used as an efficient serialization mechanism for the AST? If the stack representation is only used to serialize the AST, and it is neither intended to be interpreted nor to be displayed in stack-based form in browser debugging tools, then it makes sense that this is a non-issue. (It is not true for stack machines in general that the stack representation can be unambiguously translated into an AST, depending on what operations are supported, and what the semantics of those operations are. For example, all deserialization bets are off if the stack machine supports a goto instruction that is able to jump to any instruction, or if it supports operations that can push a varying number of parameters onto the stack, depending on an input value.) |
@lukehutch I gave up long ago on seeking an encoding with a one-to-one mapping to a useful presentation language, and it took me too long to realize that it's not necessary as you can define a canonical decoding that should be sufficient, sorry to people on that one. So consider the wasm binary as having many possible encodings for the same canonical language, and variations on that canonical language can be encoded in an unknown section. Perhaps you'll come to the same realisation at some point, but it seems healthy to explore and challenge. Practically the stack code is also interpreted and is even the default view-source format it seems. I would like to see some web browser hooks to allow custom presentation formats for view-source so the community can explore a range of text formats, and perhaps it will be possible some day. The wasm code has structured control flow, which relates back to the validation. There are no operations that push a dynamic number of values and the stack depth is always uniquely defined (ignoring some differences in unreachable code). |
@lukehutch, there is an injective mapping from Wasm expressions into the
Wasm stack machine. The inverse may require the introduction of auxiliary
operators akin to C's comma operator in cases that are not in the image of
that embedding. But nothing more. In particular, general goto does not
exist in Wasm; unlike other code formats, control is still structured in
well-nested form.
For debugging, browsers for now will show the plain stack machine format.
That actually turns out to be convenient in some ways, because you can step
through it linearly. More bells and whistles with smart "folding" of
suitable instruction sequences into expression-like output are expected to
follow later.
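The inverse direction can be sketched as follows, under simplifying assumptions: straight-line code only, a made-up signature table `SIG`, hypothetical callees `$a`, `$b`, `$v`, and an auxiliary node called "first" standing in for the reverse-comma operator discussed earlier in the thread. A void instruction that runs while a value is pending gets attached to that value through the auxiliary node; everything else folds into a plain expression tree.

```python
# Hypothetical signatures: instruction -> (values popped, values pushed).
SIG = {
    "call $a": (0, 1),
    "call $b": (0, 1),
    "call $v": (0, 0),   # void call
    "i32.add": (2, 1),
}


def fold(code, sig):
    """Fold straight-line stack code into trees, adding "first" if needed."""
    stack = []   # value-producing trees
    stmts = []   # void trees that needed no auxiliary node
    for instr in code:
        pops, pushes = sig[instr]
        args = [stack.pop() for _ in range(pops)][::-1]
        node = (instr, *args)
        if pushes:
            stack.append(node)
        elif stack:
            # ("first", x, y): evaluate x, then y, and yield the value of x.
            stack.append(("first", stack.pop(), node))
        else:
            stmts.append(node)
    return stmts + stack


trees = fold(["call $a", "call $b", "call $v", "i32.add"], SIG)
print(trees)
```

Code that is already in the image of the embedding (no void instruction between a value and its use) folds without any auxiliary node.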
|
@rossberg-chromium, that's the current state now, but if as expected wasm ends up adding stacky operations like pick, multiple return values, etc., then the situation for the inverse will worsen, won't it? Or has something changed? |
@kripken, true, then you need auxiliary let's in general -- one reason why I'm not particularly fond of pick or other stack hacking ops, and would rather prefer a destructuring |
If the stack code is interpreted directly as a stack machine only when single-stepping in the debugger, that's fine (and, presumably, the debugger would also be free to single-step through an AST-ified representation of the stack format if it wanted to give the user the option of stepping at a higher level). However, if the fundamental stack-based execution model of wasm is preserved even in the AOT-compiled code, meaning that there is an actual stack machine (beyond the call stack) used to store all intermediate values even in compiled code, then there are enormous efficiency implications. See the links I provided in my initial comment for quantitative evidence and qualitative explanations for how badly the stack machine model can impact performance (and therefore battery life, etc.). So I guess this is at the core of my concern: what will AOT compilers do with wasm stack machine code, and how will it be run outside of the debugger? |
As mentioned above, nothing to worry about: the stack machine representation is converted into standard compiler SSA form and then optimized into machine code, just as clang or gcc would do. That is how SpiderMonkey and V8 work right now, so you can benchmark this if you want, no need to speculate. For example, here's emscripten's corrections benchmark:
That shows asm.js being 8% slower than native clang, and wasm is slightly faster, at 5% slower. |
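
To illustrate why the operand stack need not survive compilation, here is a minimal Python sketch of turning straight-line stack code into SSA-style assignments. It is a toy under stated assumptions: the opcodes and the `stack_to_ssa` helper are hypothetical, control flow is ignored, and it is not a description of how SpiderMonkey or V8 are actually implemented. The stack is simulated at compile time with value names instead of values, so every push becomes a fresh name.

```python
def stack_to_ssa(code):
    """Symbolically execute stack code; each pushed value gets a fresh name."""
    stack, out = [], []
    counter = 0
    for instr in code:
        op = instr[0]
        if op == "const":
            operands = [str(instr[1])]
        elif op in ("add", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            operands = [lhs, rhs]
        else:
            raise ValueError(f"unknown op: {op}")
        name = f"v{counter}"
        counter += 1
        out.append(f"{name} = {op} {' '.join(operands)}")
        stack.append(name)  # the "stack" holds names, never runtime values
    return out, stack


# (2 + 3) * 4 in postfix order
lines, remaining = stack_to_ssa(
    [("const", 2), ("const", 3), ("add",), ("const", 4), ("mul",)]
)
for line in lines:
    print(line)
```

After this pass, the names can be handed to an ordinary register allocator; no runtime stack of intermediate values is implied by the output.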
@kripken thanks for confirming this. What is the reason for the remaining 5% overhead? Does it come down to reduced opportunities for optimization in wasm SSA form code? Or overhead of sandboxing? |
Could be those, but it could also be noise - gcc and clang have differences between them too, often much larger. Sometimes one register allocator happens to work better on a particular function, etc. |
The WebAssembly project recently decided to switch from being an AST to being a stack machine. This issue is to discuss the implications of that for the Binaryen project, which is AST-based.
History: Binaryen's initial design and goals were simple: WebAssembly was an AST. By building a compiler that uses that AST as its internal IR, we could stay very close to our output format, which allows
asm2wasm and mir2wasm projects are examples of this.
But the foundations for all that are now in question with WebAssembly's pivot to a stack machine.
The first practical consequence is that things in the WebAssembly MVP will not be expressible as an AST, e.g.
This stack machine code does two calls to get i32s, then a void call. The void call places nothing on the stack, and the add adds the two i32s. In an AST that order of operations is not directly expressible, although a "first" or "reverse-comma" operator can work: <x, y>, where x is evaluated, then y, and then the value of x is returned (the opposite of the comma operator in C, JS, etc.). So we can write
This technically works, but is awkward. In particular, the "first" operator vanishes when actually emitting WebAssembly. This puts us in the uncomfortable position of optimizations needing to take into account that more AST nodes might mean less code size. And the obvious simple IR for such optimizations is just the linear list of stack machine instructions.
That's just the beginning: A major justification for the stack machine approach is multi-values, which are not in the MVP but will be added later. As with "first", technically one could invent some AST construct, but it's awkward. Further possible future features already showing up in discussions include pick (copy a value from up the stack) and other stack machine tricks. It's clear that WebAssembly is moving in a direction that Binaryen's initial design is just not a good fit for.
The bottom line is that Binaryen was designed based on what WebAssembly was at the time. It seemed like an elegant approach to use the WebAssembly AST as a compiler IR. That tied Binaryen's internals to the WebAssembly design, which was a risk, but it paid off - until now, as WebAssembly has pivoted. So we need to decide what to do.
Options include:
More details on the stop-consuming-WebAssembly option:
wasm-shell would run those tests, and should be renamed to binaryen-shell (or byn-shell?).
wasm-opt tool would be renamed binaryen-opt (or byn-opt?) as it would operate on Binaryen IR, not WebAssembly. Binaryen would stop being a wasm-to-wasm optimizer, which was a goal we thought could be useful for other projects - we would be dropping that goal.
wasm-as, wasm-dis tools would likewise be renamed as they would operate on a Binaryen binary format. In other words, as with the s-expression format, this would be a fork of the binary format. Note also that this means that Binaryen would no longer be a tool people can use to disassemble and assemble wasm files for toolchain purposes.
asm2wasm and mir2wasm, as Binaryen's IR and C API are not changing. There should be no downside whatsoever for those compilers, and in particular not for the entire emscripten-asm2wasm path for compiling C++ to WebAssembly, which already works now.
s2wasm is tricky since its input is basically WebAssembly. We would need to modify or replace it, in tandem with the wasm backend. Several options here, with varying amounts of work.
wasm.js or binaryen.js, which could execute WebAssembly in JS for polyfilling purposes. You might say that wasm.js fulfilled its purpose of helping the toolchain side before JS engines got proper wasm support, and at this stage, we don't need it as much. So that is probably not a big deal. For binaryen.js, it could no longer work as a (slow) client-side polyfill for WebAssembly, but other interpreters could be compiled instead.
As you can see from the amount of text, the stop-consuming-WebAssembly option is the one I've thought most about. Not because I like it necessarily, but because all the other options have downsides that worry me a lot more. But I have no definite opinion on any of this yet, hoping to hear what other people think.
Thoughts?