Add opcode name syntax for addressing #374

lukewagner · 2015-09-25T17:48:29Z

In preparation for adding linear memory addressing to ml-proto, we need an opcode name syntax. This PR is mostly symmetric with what was done for alignment in #342, except with a +n suffix.

AndrewScheidecker · 2015-09-25T18:05:56Z

I think this is pretty confusing in combination with the / used for alignment. On its own, the i32.load/2 doesn't evoke arithmetic, but i32.load+4/2 certainly does in a misleading way.

I'd also prefer that this and alignment are optional operands rather than part of the opcode for the reasons mentioned here. e.g. (i32.load (align 4) (offset 16) base)

jfbastien · 2015-09-25T18:58:01Z

Agreed with @AndrewScheidecker. I had a similar conversation with @kg, and we agreed that something like LLVM's bitcode syntax was way easier to read. We can keep s-expressions while being readable.

sunfishcode · 2015-09-25T19:03:32Z

LLVM's bitcode doesn't have a syntax for extending loads, truncating stores, or constant offsets. Could you give an example?

lukewagner · 2015-09-25T19:05:42Z

I'm not really interested in bikeshedding our syntax for our temporary s-expression syntax. I think the current strategy strikes a nice balance in allowing me to quickly write tests that aren't super verbose and I'm not eager to go rewrite everything.

ghost · 2015-09-29T22:11:16Z

+1 to arguments such as an offset being separate from the opcode name, even for this 'throw-away' spec. If there were only a very few values that an option can take then it would not seem to be worth 'bike shedding' but it seems extreme to have separate opcodes for each offset which is effectively what is proposed. Adding the offset to the opcode name fails the test of 'being practical to dispatch on each unique opcode name'.

A related issue is that if the offset is an argument then does it need to be a 'literal' node - is there a concept of atomic numbers as arguments?

If adding atomic numbers as arguments then is there any need for 'literal' nodes that represent numbers, see #380 ?

kg · 2015-09-30T03:51:12Z

We're apparently never going to remove the /n for alignment and we're dead set on the weird _nn encoding of load/store sizes, but that aside, +o for offset is problematic in many ways:

We've doubled the number of unique load and store opcode names
It's unclear where the + goes in the opcode name (before or after the /?)
Is the offset decimal or hex? Appears to be decimal here, but I've seen large offsets in code (struct manipulation, for example) that will get unwieldy in decimal.
How do we express a negative offset? +-n? -n? If the latter, we've introduced another set of load/store opcode names, so * 3.
- If we ever add another class of loads and stores we will suffer for this even more. Nontemporal stores are one possible candidate that comes to mind.
- Once we add more load/store sizes (for simd vectors, or 128 bit integers, or...) again, more bloat and complexity.

This is not sustainable. I think we need to simplify. Have a 'blessed path' pair of load/store opcodes that are simple, i.e.
(i32.load addr)
(i32.load_offset addr offs)
where offs is an integral constant. perhaps if it pleases you:
(i32.load (offset offs) addr)
that last suggestion scales nicely:
(i64.load (offset offs) (align 4) (zextend 4) addr)
for a zero-extended i32->i64 load from addr[offs], guaranteed to be 4-byte aligned.

Everything we're shoving into random corners of the opcode name (or assigning new opcode names to) is just a compile-time parameter that specifies the exact nature of the load/store. It's possible that we will want to assign every combination its own globally unique opcode, but that is a binary encoding consideration. At every other stage in the pipeline, doing this adds pointless complexity. Loads and stores are a matrix of possibilities at this point because of the number of parameters, so we should just accept that and encode the matrix sensibly.

jfbastien · 2015-09-30T05:07:14Z

Agreed with @kg that we're entering APL land with the textual representation of opcode.

rossberg · 2015-09-30T12:22:10Z

On 30 September 2015 at 05:51, Katelyn Gadd notifications@github.com
wrote:

We're apparently never going to remove the /n for alignment and we're
dead set on the weird _nn encoding of load/store sizes, but that aside, +o
for offset is problematic in many ways:

We've doubled the number of unique load and store opcodes

It's unclear where the + goes in the opcode name (before or after
the /?)

Is the offset decimal or hex? Appears to be decimal here, but I've
seen large offsets in code (struct manipulation, for example) that will get
unwieldy in decimal.

How do we express a negative offset? +-n? -n? If the latter, we've
introduced another set of load/store opcodes, so * 3.

If we ever add another class of loads and stores we will suffer
for this even more. Nontemporal stores are one possible candidate that
comes to mind.

Once we add more load/store sizes (for simd vectors, or 128 bit
integers, or...) again, more bloat and complexity.

This is not sustainable. I think we need to simplify. Have a 'blessed
path' pair of load/store opcodes that are simple, i.e.
(i32.load addr)
(i32.load_offset addr offs)
where offs is an integral constant. perhaps if it pleases you:
(i32.load (offset offs) addr)
that last suggestion scales nicely:
(i64.load (offset offs) (align 4) (zextend 4) addr)
for a zero-extended i32->i64 load from addr[offs], guaranteed to be 4-byte
aligned.

Everything we're shoving into random corners of the opcode name (or
assigning new opcode names to) is just a compile-time parameter that
specifies the exact nature of the load/store. It's possible that we will
want to assign every combination its own globally unique opcode, but that
is a binary encoding consideration. At every other stage in the
pipeline, doing this adds pointless complexity. Loads and stores are a
matrix of possibilities at this point because of the number of parameters,
so we should just accept that and encode the matrix sensibly.

While I agree that the opcode names are getting uncomfortably complicated,
I am even less fond of the above suggestion. Besides being awfully verbose,
it removes the syntactic distinction between what's (part of) an opcode and
what's an operand expression. Just looking at the S-expression structure,
it is no longer apparent what is what, and what syntactic category any part
of an expression is in.

It's a good idea to have a more visible and uniform separation between the
various opcode parameters. But this separation should be syntactically
distinct from the way actual operands are separated. Part of the reason for
inventing the ad-hoc / separator was to make that so. I am not wedded to
that choice in particular, but something along these lines seems highly
preferable.

titzer · 2015-09-30T12:25:49Z

On Wed, Sep 30, 2015 at 2:22 PM, rossberg-chromium <notifications@github.com

wrote:

On 30 September 2015 at 05:51, Katelyn Gadd notifications@github.com
wrote:

We're apparently never going to remove the /n for alignment and we're
dead set on the weird _nn encoding of load/store sizes, but that aside,
+o
for offset is problematic in many ways:

We've doubled the number of unique load and store opcodes

It's unclear where the + goes in the opcode name (before or after
the /?)

Is the offset decimal or hex? Appears to be decimal here, but I've
seen large offsets in code (struct manipulation, for example) that will
get
unwieldy in decimal.

How do we express a negative offset? +-n? -n? If the latter, we've
introduced another set of load/store opcodes, so * 3.

If we ever add another class of loads and stores we will suffer
for this even more. Nontemporal stores are one possible candidate that
comes to mind.

Once we add more load/store sizes (for simd vectors, or 128 bit
integers, or...) again, more bloat and complexity.

This is not sustainable. I think we need to simplify. Have a 'blessed
path' pair of load/store opcodes that are simple, i.e.
(i32.load addr)
(i32.load_offset addr offs)
where offs is an integral constant. perhaps if it pleases you:
(i32.load (offset offs) addr)
that last suggestion scales nicely:
(i64.load (offset offs) (align 4) (zextend 4) addr)
for a zero-extended i32->i64 load from addr[offs], guaranteed to be
4-byte
aligned.

Everything we're shoving into random corners of the opcode name (or
assigning new opcode names to) is just a compile-time parameter that
specifies the exact nature of the load/store. It's possible that we will
want to assign every combination its own globally unique opcode, but
that
is a binary encoding consideration. At every other stage in the
pipeline, doing this adds pointless complexity. Loads and stores are a
matrix of possibilities at this point because of the number of
parameters,
so we should just accept that and encode the matrix sensibly.

While I agree that the opcode names are getting uncomfortably complicated,
I am even less fond of the above suggestion. Besides being awfully verbose,
it removes the syntactic distinction between what's (part of) an opcode and
what's an operand expression. Just looking at the S-expression structure,
it is no longer apparent what is what, and what syntactic category any part
of an expression is in.

It's a good idea to have a more visible and uniform separation between the
various opcode parameters. But this separation should be syntactically
distinct from the way actual operands are separated. Part of the reason for
inventing the ad-hoc / separator was to make that so. I am not wedded to
that choice in particular, but something along these lines seems highly
preferable.

In the past I've been fond of the convention that static arguments to
opcodes are enclosed in braces [...] while all dynamic arguments are in
parentheses (...). We use that convention internally in our compilers and
at one point this was also part of the design repository. I recall it was
removed in favor of other lexical tricks but I didn't follow exactly why.

—
Reply to this email directly or view it on GitHub
#374 (comment).

lukewagner · 2015-09-30T14:54:59Z

I agree with @rossberg-chromium that it's useful to have immediates syntactically distinct from general subexpressions. @titzer braces look good to me too, but it seems like they are reserved puncutation in the proper s-expression language which we are ostensibly generating. It looks like = is pseudo-alphabetic (and thus allowed in plain tokens), suggesting:

(i32.load (i32.const 0))
(i32.load align=2 (i32.const 0))
(i32.load offset=4 align=2 (i32.const 0))
(i32.load16_s offset=2 align=1 (i32.const 0))

which looks pretty nice to me and is rather more symmetric with how the immediate of get_local is specified (as a separate token).

sunfishcode · 2015-09-30T16:38:32Z

lgtm

kg · 2015-09-30T22:30:16Z

(i32.load (i32.const 0))
(i32.load align=2 (i32.const 0))
(i32.load offset=4 align=2 (i32.const 0))
(i32.load16_s offset=2 align=1 (i32.const 0))

Looks great and addresses my concerns re offset/align. I'm okay with leaving the memory size and sign suffixes there as long as we address those two. Thanks.

jfbastien · 2015-09-30T22:58:49Z

I like @lukewagner's suggestion on =.

Small questions:

Do we allow any ordering to them (offset=2 align=1 equivalent to align=1 offset=2)? I think that would be nice.
I'm assuming we don't allow duplicates (align=1 align=1)?
Also assuming we don't allow unknown "fields" (hithere=1)?

kg · 2015-09-30T23:08:01Z

I think they should be order-independent, and we shouldn't allow duplicates. Allowing unknown fields seems like it would break our feature-detection principles and would potentially result in an old runtime executing new code with incorrect behavior.

sunfishcode · 2015-09-30T23:30:18Z

Preference for requiring the fields to be in a fixed order. If we're going to make the syntax this verbose, we should do what we can to make it easy for humans to find what they're looking for in it, so reducing needless variation helps -- not a lot, but on the other hand the burden on producers is minimal.

lukewagner · 2015-10-01T00:34:31Z

In the s-expr format, it's definitely simpler to have fixed order, no duplicates. For the real TextFormat.md, I expect we'll have a wholly different syntax that of course isn't s-expr-based (e.g., for the offset, maybe we'll want some effective-address-looking syntax that resembles what people are used to in assembly languages). In fact, with the above changes agreed upon, this PR (for the design repo) changes to simply removing the existing para about the alignment syntax because alignment is no longer part of the opcode name and thus bears no mention in AstSemantics.md.

jfbastien · 2015-10-01T01:11:35Z

New version lgtm.

rossberg · 2015-10-01T05:31:18Z

Syntax LGTM, but:

On 1 October 2015 at 02:34, Luke Wagner notifications@github.com wrote:

In fact, with the above changes agreed upon, this PR (for the design
repo) changes to simply removing the existing para about the alignment
syntax because alignment is no longer part of the opcode name and thus
bears no mention in AstSemantics.md.

Hm, not sure I follow, why does a syntactic change from being part of the
opcode name to being an opcode parameter remove the need to document it in
AstSemantics?

lukewagner · 2015-10-01T13:28:28Z

@rossberg-chromium Because AstSemantics.md doesn't speak to text/binary encoding and our final text representation of load/store immediates will likely look much different than s-expr syntax. Perhaps it should have never been mentioned in AstSemantics.md in the first place, but I was thinking at the time that it would be kinda neat if the name (which included immediates) was actually the same between s-expr and the real text format.

rossberg · 2015-10-01T13:36:24Z

But regardless of syntax, shouldn't the AstSemantics somehow explain that an optional alignment parameter goes with these opcodes?

lukewagner · 2015-10-01T14:16:50Z

@rossberg-chromium It does; this PR (which has been updated) just removes the specific syntactic detail of how alignment is appended to the opcode name.

rossberg · 2015-10-01T14:54:39Z

Ah, okay, I have been missing that. Sorry for the noise.

Add opcode name syntax for addressing

lukewagner · 2015-10-01T16:10:57Z

Cool, merging based on general lgtm. I can write the ml-proto PR.

lukewagner mentioned this pull request Sep 29, 2015

Add constant offsets to spec WebAssembly/spec#89

Merged

Remove mention of alignment syntax

4c24b66

lukewagner force-pushed the address-syntax branch from 877558a to 4c24b66 Compare October 1, 2015 00:39

lukewagner added a commit that referenced this pull request Oct 1, 2015

Merge pull request #374 from WebAssembly/address-syntax

944b9b6

Add opcode name syntax for addressing

lukewagner merged commit 944b9b6 into master Oct 1, 2015

lukewagner deleted the address-syntax branch October 1, 2015 16:10

lukewagner mentioned this pull request Oct 9, 2015

Separate desugaring from parser & add label sugar WebAssembly/spec#117

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add opcode name syntax for addressing #374

Add opcode name syntax for addressing #374

lukewagner commented Sep 25, 2015

AndrewScheidecker commented Sep 25, 2015

jfbastien commented Sep 25, 2015

sunfishcode commented Sep 25, 2015

lukewagner commented Sep 25, 2015

ghost commented Sep 29, 2015

kg commented Sep 30, 2015

jfbastien commented Sep 30, 2015

rossberg commented Sep 30, 2015

titzer commented Sep 30, 2015

lukewagner commented Sep 30, 2015

sunfishcode commented Sep 30, 2015

kg commented Sep 30, 2015

jfbastien commented Sep 30, 2015

kg commented Sep 30, 2015

sunfishcode commented Sep 30, 2015

lukewagner commented Oct 1, 2015

jfbastien commented Oct 1, 2015

rossberg commented Oct 1, 2015

lukewagner commented Oct 1, 2015

rossberg commented Oct 1, 2015 via email

lukewagner commented Oct 1, 2015

rossberg commented Oct 1, 2015 via email

lukewagner commented Oct 1, 2015

Add opcode name syntax for addressing #374

Add opcode name syntax for addressing #374

Conversation

lukewagner commented Sep 25, 2015

AndrewScheidecker commented Sep 25, 2015

jfbastien commented Sep 25, 2015

sunfishcode commented Sep 25, 2015

lukewagner commented Sep 25, 2015

ghost commented Sep 29, 2015

kg commented Sep 30, 2015

jfbastien commented Sep 30, 2015

rossberg commented Sep 30, 2015

titzer commented Sep 30, 2015

lukewagner commented Sep 30, 2015

sunfishcode commented Sep 30, 2015

kg commented Sep 30, 2015

jfbastien commented Sep 30, 2015

kg commented Sep 30, 2015

sunfishcode commented Sep 30, 2015

lukewagner commented Oct 1, 2015

jfbastien commented Oct 1, 2015

rossberg commented Oct 1, 2015

lukewagner commented Oct 1, 2015

rossberg commented Oct 1, 2015 via email

lukewagner commented Oct 1, 2015

rossberg commented Oct 1, 2015 via email

lukewagner commented Oct 1, 2015