
Support int8 and int16? #85

Closed
jfbastien opened this issue May 27, 2015 · 17 comments

Comments

@jfbastien
Member

Should Web Assembly explicitly support int8 and int16?

We've discussed this:

  • In issue #82 (WebAssembly/spec#82), w.r.t. load/store and implicit truncation / extension.
  • In issue #81 (WebAssembly/spec#81), because having int64 makes int8/int16 look more regular / less out of place.

A few thoughts:

  • I think supporting these types makes it easier to have a dumb and fast compiler.
  • It is more instructions to know about in the assembler, but less work to figure out patterns in the instruction selector.
  • It adds complexity to a register allocator if we try to treat types differently.
  • Opens the door for more advanced optimizations (value / bit tracking, or even vectorization).

We can also add these types to a later version of Web Assembly if just int32 / int64 prove insufficient.
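
For concreteness, a minimal C sketch of the kind of small-integer source code a wasm producer has to handle one way or another, either with native int8/int16 operations or by legalizing to 32-bit math plus explicit narrowing:

```c
#include <stdint.h>

/* Typical small-integer source code. A producer either needs wasm-level
 * int8/int16 arithmetic for this, or it must legalize everything to 32-bit
 * operations plus explicit truncation/extension at the boundaries. */
int16_t scale(int8_t sample, int16_t gain) {
    /* In C, both operands are promoted to int before the multiply, so the
     * arithmetic itself is already 32-bit; only the narrowing back to
     * int16_t needs extra care. */
    return (int16_t)(sample * gain);
}
```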

@kg
Contributor

kg commented May 27, 2015

I think we should support all the numeric types, but it's fine to promote them internally to machine int sized registers. It will enable packing in cases where you want to pack, and more importantly if we introduce typed objects or equivalent, at that point those types will enter our domain anyway.

This also eliminates the need to pattern-match a bunch of different things.

Java's decision to not have unsigned types was a huge mistake. We shouldn't repeat anything of the sort by leaving out key, frequently-used primitives (like bytes).

@kripken
Member

kripken commented May 27, 2015

I agree, makes sense to add these. It does make the polyfill to JS a little more complex, but it makes wasm more of a "normal" platform.

@pizlonator
Contributor

On May 27, 2015, at 11:28 AM, K. Gadd notifications@github.com wrote:

I think we should support all the numeric types, but it's fine to promote them internally to machine int sized registers. It will enable packing in cases where you want to pack, and more importantly if we introduce typed objects or equivalent, at that point those types will enter our domain anyway.

This also eliminates the need to pattern-match a bunch of different things.

Java's decision to not have unsigned types was a huge mistake.

I disagree. It may have been a mistake at the source level but it doesn’t make that much of a difference at the bytecode level. Also, there is a big difference between leaving out signedness in types and leaving out unsigned math operations (mostly, that means having udiv/urem variants alongside div/rem).
We shouldn't repeat anything of the sort by leaving out key, frequently-used primitives (like bytes).

Even if the omission of unsigned was a bad thing, I don’t see how that’s related to the omission of bytes. That’s a different issue. Do you have a specific reason for wanting a byte type?

-Filip
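
A small C illustration of that distinction: signedness can live in the operations rather than the types, because the same 32-bit pattern divides differently under div/rem and udiv/urem (two's-complement representation assumed):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t bits = 0xFFFFFFFEu;      /* one 32-bit pattern...           */
    int32_t  s    = (int32_t)bits;    /* ...read as signed, this is -2   */

    /* Signed division truncates toward zero: -2 / 3 == 0, -2 % 3 == -2. */
    printf("div: %d, rem: %d\n", s / 3, s % 3);

    /* Unsigned division sees 4294967294: quotient 1431655764, remainder 2. */
    printf("udiv: %u, urem: %u\n", bits / 3u, bits % 3u);
    return 0;
}
```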

@pizlonator
Contributor

On May 27, 2015, at 2:07 PM, Alon Zakai notifications@github.com wrote:

I agree, makes sense to add these. It does make the polyfill to JS a little more complex, but it makes wasm more of a "normal" platform.

Normal platforms often exclude, or punish, sub-32-bit arithmetic. I’ve been hearing from the LLVM crew that it would be a good idea to avoid emitting 16-bit math on Intel because it’s actually worse than legalizing up to 32-bit. I don’t remember seeing 8-bit/16-bit int math instructions on ARM (but I could be wrong).

-Filip

@pizlonator
Contributor

On May 27, 2015, at 11:22 AM, JF Bastien notifications@github.com wrote:

Should Web Assembly explicitly support int8 and int16?

We've discussed this:

In issue #82 WebAssembly/spec#82 w.r.t. load/store and implicit truncation / extension.
In issue #81 WebAssembly/spec#81 because having int64 makes int8/int16 look more regular / less out of place.
A few thoughts:

I think supporting these types makes it easier to have a dumb and fast compiler.
Do you mean, it is easier to target wasm, or easier to compile wasm to machine code?

I think you can compile wasm just fine if it only includes 32-bit/64-bit int math. In fact, it would be easier. Then it would be up to the wasm generator to legalize small-int math up to 32-bit math, and to do so without any obvious badness.

It is more instructions to know about in the assembler, but less work to figure out patterns in the instruction selector.
I mostly agree with both points. But having support for some of the basic patterns - like an 8-bit store of a 32-bit result computed from 8-bit loads - is not much harder than implementing mandatory wasm-level 8-bit math operations. I’m not sure making these explicit really saves effort, though it might be more efficient in the limit, on those platforms where 8-bit/16-bit math is natively fast. But I’m not even convinced that this is true. :-) I vaguely recall that the last time I wrote a backend for a JVM, it wasn’t hard to get byte math to be fast enough.
It adds complexity to a register allocator if we try to treat types differently.
Opens the door for more advanced optimizations (value / bit tracking, or even vectorization).
Can you elaborate? I don’t know what you mean by value/bit tracking. As for vectorization, it seems that having a vector of int8’s doesn’t require a separate int8 type, much like being able to load/store bytes doesn’t require such a type.
We can also add these types to a later version of Web Assembly if just int32 / int64 prove insufficient.

Sure. I’m going to try to take a hard line against int8/int16 - but if there is a good argument for these, then maybe we can get it right from the start and include them.

-Filip

@MikeHolman
Member

I agree with @pizlonator. I think it would be easier to exclude int8/int16, and I don't really see the benefit. From the wasm->machine code side of things, it would certainly be harder for us if we had them. Our backend already has good support for int32s, and adding int8/int16 would certainly require additional work to reach the same code quality. Of course this is a JS-engine-centric view, so I'm willing to concede that isn't a great reason against them in a general sense, but it certainly isn't a reason for including them.

I'd like to see something a little more compelling before we commit to adding int8/int16.

@titzer

titzer commented May 28, 2015

On Wed, May 27, 2015 at 8:28 PM, K. Gadd notifications@github.com wrote:

I think we should support all the numeric types, but it's fine to promote them internally to machine int sized registers. It will enable packing in cases where you want to pack, and more importantly if we introduce typed objects or equivalent, at that point those types will enter our domain anyway.

This also eliminates the need to pattern-match a bunch of different things.

Java's decision to not have unsigned types was a huge mistake. We shouldn't repeat anything of the sort by leaving out key, frequently-used primitives (like bytes).

Bytes are not left out; they're memory types. When you load from memory they get sign-extended or zero-extended, and when you store to memory the value stored is truncated.

And unsigned arithmetic and comparisons are supported under the current proposal; they just work with 32-bit integers.
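
A rough C-level sketch of that model: byte-sized values live in memory, get widened on load, are operated on at 32-bit width, and are narrowed again on store.

```c
#include <stdint.h>

void brighten(uint8_t *pixels, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t v = pixels[i];          /* zero-extending byte load          */
        v = v * 2 + 1;                   /* ordinary 32-bit arithmetic        */
        if (v > 255) v = 255;            /* saturate so the narrowing is safe */
        pixels[i] = (uint8_t)v;          /* truncating byte store             */
    }
}
```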


@titzer

titzer commented May 28, 2015

On Thu, May 28, 2015 at 7:33 AM, Michael Holman notifications@github.com wrote:

I agree with @pizlonator. I think it would be easier to exclude int8/int16, and I don't really see the benefit. From the wasm->machine code side of things, it would certainly be harder for us if we had them. Our backend already has good support for int32s, and adding int8/int16 would certainly require additional work to reach the same code quality. Of course this is a JS-engine-centric view, so I'm willing to concede that isn't a great reason against them in a general sense, but it certainly isn't a reason for including them.

I'd like to see something a little more compelling before we commit to adding int8/int16.

This is basically my opinion as well.

For background, I did a lot of work for Virgil to support arbitrary fixed-bit-width integers up to size 64. Part of that was finding the right source-language rules for promotion, and part of that was an efficient implementation strategy. It turns out that it's not so hard to do subword-sized math on a wider machine; you just have to introduce explicit truncations (i.e. sign-extend or zero-extend the result) after most fullword operations. A few operations (arithmetic shift right, shift right on non-negative inputs, and bitwise and, or, and xor) don't need truncations of the output if the inputs are already truncated. Comparisons don't need truncations on the inputs if all values are properly sign- or zero-extended. And there are several analyses and optimizations that compilers can perform on top of that to remove redundant truncations, such as those where the upper bits wouldn't ever be observed by the program.

I view 8- and 16-bit math as special cases of the above, and WebAssembly as a machine that has 32-bit-wide math with soon-to-be-added 64-bit-wide math. Then, in the rare cases that a program wants sub-word math, I think the program should be responsible for the extra truncations to work within the wider-width WebAssembly machine. That does move some of the burden to the producer for the benefit of a simpler engine, but I think it's warranted. Not all architectures that wasm will run on have sub-word math, resulting in all that sign-extend and zero-extend ugliness in the engine. Any analysis to remove redundant truncations based on data flow belongs in the source compiler (i.e. the WebAssembly producer), IMO.
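
A sketch of that producer-side legalization in C, with int8 values carried sign-extended in 32-bit variables (two's-complement narrowing assumed):

```c
#include <stdint.h>

/* Re-establish the "sign-extended int8 in an int32" invariant. */
static int32_t sext8(int32_t x) { return (int32_t)(int8_t)x; }

int32_t legalized_int8_ops(int32_t a, int32_t b) {  /* a, b already sign-extended */
    int32_t sum  = sext8(a + b);   /* add can overflow 8 bits: explicit truncation */
    int32_t prod = sext8(a * b);   /* multiply likewise                            */
    int32_t m    = a & b;          /* and/or/xor preserve the invariant: no fixup  */
    int32_t shr  = a >> 2;         /* arithmetic shift right: no fixup needed      */
    int32_t lt   = (a < b);        /* comparisons are correct on extended inputs   */
    return sum ^ prod ^ m ^ shr ^ lt;   /* combine the results just to use them    */
}
```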



@kg
Contributor

kg commented May 28, 2015

Wait, are we talking about N-bit math or N-bit data types here? The original issue text did not suggest we were talking about introducing things like a 16-bit signed add or a 16-bit unsigned add.

Even if the omission of unsigned was a bad thing, I don’t see how that’s related to the omission of bytes. That’s a different issue. Do you have a specific reason for wanting a byte type?

I am profoundly confused by this question, sorry. Bytes are an extremely common unsigned 8-bit integer type that's used in all sorts of applications. You can store your bytes into int32s, of course, but the actual memory operations are 8-bit and you're truncating to 8 bits. Is it implied/assumed that webasm already solves all these scenarios in a different way? I don't remember seeing this addressed in the draft specs here when I last read them, but maybe I overlooked something.

Bytes are not left out; they're memory types. When you load from memory they get sign-extended or zero-extended and when you store to memory

I'm not clear on what this means or how it interacts with the question of whether things like an int16 type exist. As I understand it here, the point of say, an int16 type is that it makes things like truncation, sign extension and load/store clearer, and in the future when we inevitably support known object layouts (typed objects) we will have those types enter our type system anyway.

They also matter when interfacing with the outside world - the type of a[i] for a = new Uint8Array(8) is almost certainly Uint8, even if we decide to silently promote it to a machine int (32 or 64 bit). Once typed objects are in they will eventually become the preferred way for normal JS code to interact with structures inside an asm.js-style heap or webasm heap or shared buffer.

What is a 'memory type'? Do we have a separate set of types and type hierarchy for 'memory types' that is different from the types of values that exist in other contexts? Do we have a set of rules that govern how memory types interact with non-memory types? I am having a really hard time comprehending how this would work.

Promoting to a larger size for arithmetic seems like the obvious choice here - isn't that typically what happens in native runtime environments these days to begin with?

For real-world applications you're going to see a bunch of non-32-bit-sized loads/stores, truncations, etc. Maybe fewer in certain types of applications. If we have to pattern-match to do the 'right thing' with those operations, and end up introducing the types later on anyway, that'll be really regrettable.

@kg
Contributor

kg commented May 28, 2015

Instead of editing in, I'm just going to follow up to say that I grepped for 'memory types' and found a relevant section that I hadn't seen before. I still find this partitioning confusing but it's at least clearly described.

Given the presence of the full set of types in the 'memory types' list I'm actually profoundly confused about what this issue is actually addressing. Are we talking about adding arithmetic variants that operate natively on each type, adding them to the supported set of local types but doing arithmetic with implicit promotion (i.e. C#, C++, etc), or something else?

@titzer

titzer commented May 28, 2015

On Thu, May 28, 2015 at 2:43 PM, K. Gadd notifications@github.com wrote:

Instead of editing in, I'm just going to follow up to say that I grepped for 'memory types' and found a relevant section that I hadn't seen before. I still find this partitioning confusing but it's at least clearly described.

Given the presence of the full set of types in the 'memory types' list I'm actually profoundly confused about what this issue is actually addressing. Are we talking about adding arithmetic variants that operate natively on each type, adding them to the supported set of local types but doing arithmetic with implicit promotion (i.e. C#, C++, etc), or something else?

Yes, I thought we were discussing whether adding 8-bit and 16-bit arithmetic operations would be warranted, and I am arguing that they aren't necessary if loading from memory sign- or zero-extends and storing to memory truncates. The division between memory and local types allows having fewer local types and therefore fewer arithmetic operations.

-B



@kg
Contributor

kg commented May 28, 2015

If every context for values other than locals supports the full set of types, I don't see any major problem with having all arithmetic operate on 32-bit (or 64-bit) operands. I'm not clear on how unsigned would fit into the picture but I remember that already being addressed for 32-bit operands. There are existing environments that don't define arithmetic for types smaller than 32 bits; the one I'm most familiar with promotes all arithmetic to 32-bit int (or a suitable larger type) and expects you to manually truncate or cast when writing a result back into a smaller type.

We should consider whether we want to provide simple truncation intrinsics for each type, since truncation will happen during arithmetic instead of just at the point where a value is stored into the heap. However, that may be relatively uncommon compared to the truncation at heap stores, which we've folded into the stores themselves.

If we do allow structured types in locals eventually that will compromise a lot of this, but we wouldn't be allowing arithmetic on those (they'd be function calls) so that wouldn't have the downsides we want to avoid here, I think.

How does the 'memory type' / local type split affect applications like the typical emscripten output that emulate a stack by writing into a reserved section of the heap? Are we fine with how all of that works out here? In that case we basically have pseudo-locals that are of memory types, but they constantly get promoted and truncated when being moved onto/off of the stack. That seems like it could get really gross.
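
To make that last concern concrete, a C sketch of the spilled-local case, with a hypothetical linear_memory array standing in for the emscripten-style heap stack:

```c
#include <stdint.h>

static uint8_t linear_memory[1024];   /* hypothetical stand-in for the heap */

int32_t spilled_byte_local(uint32_t sp) {
    /* A source-level int8_t local spilled to a one-byte stack slot: every
     * write truncates and every read sign-extends, even though all the
     * arithmetic in between happens at 32-bit width. */
    linear_memory[sp] = 0;                                  /* truncating store     */
    for (int i = 0; i < 10; i++) {
        int32_t c = (int8_t)linear_memory[sp];              /* sign-extending load  */
        linear_memory[sp] = (uint8_t)(c + 3);               /* 32-bit add, truncate */
    }
    return (int8_t)linear_memory[sp];                       /* final value: 30      */
}
```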

@lukewagner
Member

Agreed with @MikeHolman, @pizlonator and @titzer on not wanting int8/int16 as local types w/ all their own arithmetic; just the memory types (ideally w/ built-in sign-/zero-extension/truncation, in my view) seem sufficient.

@kripken
Member

kripken commented May 28, 2015

I wonder if there is a sense in LLVM that it should not have i8, i16 types? It seems like if they don't make sense for wasm, they don't for LLVM either, so the comparison confuses me. Unless LLVM still cares about some 16-bit platforms, while we don't?

@jfbastien
Member Author

On Thu, May 28, 2015 at 11:02 AM, Alon Zakai notifications@github.com wrote:

I wonder if there is a sense in LLVM that it should not have i8, i16 types? It seems like if they don't make sense for wasm, they don't for LLVM either, so the comparison confuses me. Unless LLVM still cares about some 16-bit platforms, while we don't?

LLVM supports integers of any fixed size (i1, i2, i3, i4, i1337, ...). Those are used in a bunch of different places, but the most common is through SROA. I'm not sure it's always very useful.

@sunfishcode
Member

Exotic types like i3 and i1337 have uses for high-to-mid-level optimization (e.g. LLVM's optimizer), because they preserve a little more value range information. However, they must be lowered for codegen (e.g. WebAssembly) on any common hardware.

Types like i8 and i16 are less exotic; however, many platforms don't have i8 or i16 arithmetic at all, and even on x86, which does have them, i16 arithmetic tends to be slow and i8 arithmetic is only occasionally faster than i32 arithmetic. But even then, C's promotion rules mean that all arithmetic is done at "int" width or wider by default, so codegen only ever sees i8 arithmetic when the optimizer has proven it safe to form. And in those cases, lowering i8 values to i32 usually leaves behind enough hints that a reasonably clever code generator can see that only 8 bits are needed and optimize to 8-bit arithmetic anyway.

I think the main arguments for WebAssembly to have i8 and i16 would be that we wouldn't need sign/zero-extending loads and truncating stores, and that there'd be a little closer symmetry between SIMD types and scalar types (since SIMD will have 8-bit and 16-bit integer element types). However, since these types aren't actually that useful in scalar code, it seems like a greater simplification to just omit them.
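
For reference, the C promotion rule being invoked: even when the source types are 8-bit, the arithmetic the front end emits is already int-width, and only the final conversion narrows.

```c
#include <stdint.h>

int8_t add8(int8_t a, int8_t b) {
    /* Per C's integer promotions, a and b are widened to int before the
     * addition, so the add itself is (at least) 32-bit; only the conversion
     * back to int8_t narrows. Codegen therefore sees an i8 add only when an
     * optimizer has proven the narrowing harmless and formed it deliberately. */
    return (int8_t)(a + b);
}
```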

@sunfishcode
Member

Closing, based on the rationale in the previous comment. If anyone wants to revisit this, we can reopen or open a new issue.
