Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opcode name syntax for addressing #374

Merged
merged 1 commit into from
Oct 1, 2015
Merged

Add opcode name syntax for addressing #374

merged 1 commit into from
Oct 1, 2015

Conversation

lukewagner
Copy link
Member

In preparation for adding linear memory addressing to ml-proto, we need an opcode name syntax. This PR is mostly symmetric with what was done for alignment in #342, except with a +n suffix.

@AndrewScheidecker
Copy link

I think this is pretty confusing in combination with the / used for alignment. On its own, the i32.load/2 doesn't evoke arithmetic, but i32.load+4/2 certainly does in a misleading way.

I'd also prefer that this and alignment are optional operands rather than part of the opcode for the reasons mentioned here. e.g. (i32.load (align 4) (offset 16) base)

@jfbastien
Copy link
Member

Agreed with @AndrewScheidecker. I had a similar conversation with @kg, and we agreed that something like LLVM's bitcode syntax was way easier to read. We can keep s-expressions while being readable.

@sunfishcode
Copy link
Member

LLVM's bitcode doesn't have a syntax for extending loads, truncating stores, or constant offsets. Could you give an example?

@lukewagner
Copy link
Member Author

I'm not really interested in bikeshedding our syntax for our temporary s-expression syntax. I think the current strategy strikes a nice balance in allowing me to quickly write tests that aren't super verbose and I'm not eager to go rewrite everything.

@ghost
Copy link

ghost commented Sep 29, 2015

+1 to arguments such as an offset being separate from the opcode name, even for this 'throw-away' spec. If there were only a very few values that an option can take then it would not seem to be worth 'bike shedding' but it seems extreme to have separate opcodes for each offset which is effectively what is proposed. Adding the offset to the opcode name fails the test of 'being practical to dispatch on each unique opcode name'.

A related issue is that if the offset is an argument then does it need to be a 'literal' node - is there a concept of atomic numbers as arguments?

If adding atomic numbers as arguments then is there any need for 'literal' nodes that represent numbers, see #380 ?

@kg
Copy link
Contributor

kg commented Sep 30, 2015

We're apparently never going to remove the /n for alignment and we're dead set on the weird _nn encoding of load/store sizes, but that aside, +o for offset is problematic in many ways:

  • We've doubled the number of unique load and store opcode names
  • It's unclear where the + goes in the opcode name (before or after the /?)
  • Is the offset decimal or hex? Appears to be decimal here, but I've seen large offsets in code (struct manipulation, for example) that will get unwieldy in decimal.
  • How do we express a negative offset? +-n? -n? If the latter, we've introduced another set of load/store opcode names, so * 3.
    • If we ever add another class of loads and stores we will suffer for this even more. Nontemporal stores are one possible candidate that comes to mind.
    • Once we add more load/store sizes (for simd vectors, or 128 bit integers, or...) again, more bloat and complexity.

This is not sustainable. I think we need to simplify. Have a 'blessed path' pair of load/store opcodes that are simple, i.e.
(i32.load addr)
(i32.load_offset addr offs)
where offs is an integral constant. perhaps if it pleases you:
(i32.load (offset offs) addr)
that last suggestion scales nicely:
(i64.load (offset offs) (align 4) (zextend 4) addr)
for a zero-extended i32->i64 load from addr[offs], guaranteed to be 4-byte aligned.

Everything we're shoving into random corners of the opcode name (or assigning new opcode names to) is just a compile-time parameter that specifies the exact nature of the load/store. It's possible that we will want to assign every combination its own globally unique opcode, but that is a binary encoding consideration. At every other stage in the pipeline, doing this adds pointless complexity. Loads and stores are a matrix of possibilities at this point because of the number of parameters, so we should just accept that and encode the matrix sensibly.

@jfbastien
Copy link
Member

Agreed with @kg that we're entering APL land with the textual representation of opcode.

@rossberg
Copy link
Member

On 30 September 2015 at 05:51, Katelyn Gadd notifications@github.com
wrote:

We're apparently never going to remove the /n for alignment and we're
dead set on the weird _nn encoding of load/store sizes, but that aside, +o
for offset is problematic in many ways:

  • We've doubled the number of unique load and store opcodes
  • It's unclear where the + goes in the opcode name (before or after
    the /?)
  • Is the offset decimal or hex? Appears to be decimal here, but I've
    seen large offsets in code (struct manipulation, for example) that will get
    unwieldy in decimal.
  • How do we express a negative offset? +-n? -n? If the latter, we've
    introduced another set of load/store opcodes, so * 3.
    • If we ever add another class of loads and stores we will suffer
      for this even more. Nontemporal stores are one possible candidate that
      comes to mind.
    • Once we add more load/store sizes (for simd vectors, or 128 bit
      integers, or...) again, more bloat and complexity.

This is not sustainable. I think we need to simplify. Have a 'blessed
path' pair of load/store opcodes that are simple, i.e.
(i32.load addr)
(i32.load_offset addr offs)
where offs is an integral constant. perhaps if it pleases you:
(i32.load (offset offs) addr)
that last suggestion scales nicely:
(i64.load (offset offs) (align 4) (zextend 4) addr)
for a zero-extended i32->i64 load from addr[offs], guaranteed to be 4-byte
aligned.

Everything we're shoving into random corners of the opcode name (or
assigning new opcode names to) is just a compile-time parameter that
specifies the exact nature of the load/store. It's possible that we will
want to assign every combination its own globally unique opcode, but that
is a binary encoding consideration
. At every other stage in the
pipeline, doing this adds pointless complexity. Loads and stores are a
matrix of possibilities at this point because of the number of parameters,
so we should just accept that and encode the matrix sensibly.

While I agree that the opcode names are getting uncomfortably complicated,
I am even less fond of the above suggestion. Besides being awfully verbose,
it removes the syntactic distinction between what's (part of) an opcode and
what's an operand expression. Just looking at the S-expression structure,
it is no longer apparent what is what, and what syntactic category any part
of an expression is in.

It's a good idea to have a more visible and uniform separation between the
various opcode parameters. But this separation should be syntactically
distinct from the way actual operands are separated. Part of the reason for
inventing the ad-hoc / separator was to make that so. I am not wedded to
that choice in particular, but something along these lines seems highly
preferable.

@titzer
Copy link

titzer commented Sep 30, 2015

On Wed, Sep 30, 2015 at 2:22 PM, rossberg-chromium <notifications@github.com

wrote:

On 30 September 2015 at 05:51, Katelyn Gadd notifications@github.com
wrote:

We're apparently never going to remove the /n for alignment and we're
dead set on the weird _nn encoding of load/store sizes, but that aside,
+o
for offset is problematic in many ways:

  • We've doubled the number of unique load and store opcodes
  • It's unclear where the + goes in the opcode name (before or after
    the /?)
  • Is the offset decimal or hex? Appears to be decimal here, but I've
    seen large offsets in code (struct manipulation, for example) that will
    get
    unwieldy in decimal.
  • How do we express a negative offset? +-n? -n? If the latter, we've
    introduced another set of load/store opcodes, so * 3.
  • If we ever add another class of loads and stores we will suffer
    for this even more. Nontemporal stores are one possible candidate that
    comes to mind.
  • Once we add more load/store sizes (for simd vectors, or 128 bit
    integers, or...) again, more bloat and complexity.

This is not sustainable. I think we need to simplify. Have a 'blessed
path' pair of load/store opcodes that are simple, i.e.
(i32.load addr)
(i32.load_offset addr offs)
where offs is an integral constant. perhaps if it pleases you:
(i32.load (offset offs) addr)
that last suggestion scales nicely:
(i64.load (offset offs) (align 4) (zextend 4) addr)
for a zero-extended i32->i64 load from addr[offs], guaranteed to be
4-byte
aligned.

Everything we're shoving into random corners of the opcode name (or
assigning new opcode names to) is just a compile-time parameter that
specifies the exact nature of the load/store. It's possible that we will
want to assign every combination its own globally unique opcode, but
that
is a binary encoding consideration
. At every other stage in the
pipeline, doing this adds pointless complexity. Loads and stores are a
matrix of possibilities at this point because of the number of
parameters,
so we should just accept that and encode the matrix sensibly.

While I agree that the opcode names are getting uncomfortably complicated,
I am even less fond of the above suggestion. Besides being awfully verbose,
it removes the syntactic distinction between what's (part of) an opcode and
what's an operand expression. Just looking at the S-expression structure,
it is no longer apparent what is what, and what syntactic category any part
of an expression is in.

It's a good idea to have a more visible and uniform separation between the
various opcode parameters. But this separation should be syntactically
distinct from the way actual operands are separated. Part of the reason for
inventing the ad-hoc / separator was to make that so. I am not wedded to
that choice in particular, but something along these lines seems highly
preferable.

In the past I've been fond of the convention that static arguments to
opcodes are enclosed in braces [...] while all dynamic arguments are in
parentheses (...). We use that convention internally in our compilers and
at one point this was also part of the design repository. I recall it was
removed in favor of other lexical tricks but I didn't follow exactly why.


Reply to this email directly or view it on GitHub
#374 (comment).

@lukewagner
Copy link
Member Author

I agree with @rossberg-chromium that it's useful to have immediates syntactically distinct from general subexpressions. @titzer braces look good to me too, but it seems like they are reserved puncutation in the proper s-expression language which we are ostensibly generating. It looks like = is pseudo-alphabetic (and thus allowed in plain tokens), suggesting:

(i32.load (i32.const 0))
(i32.load align=2 (i32.const 0))
(i32.load offset=4 align=2 (i32.const 0))
(i32.load16_s offset=2 align=1 (i32.const 0))

which looks pretty nice to me and is rather more symmetric with how the immediate of get_local is specified (as a separate token).

@sunfishcode
Copy link
Member

lgtm

@kg
Copy link
Contributor

kg commented Sep 30, 2015

(i32.load (i32.const 0))
(i32.load align=2 (i32.const 0))
(i32.load offset=4 align=2 (i32.const 0))
(i32.load16_s offset=2 align=1 (i32.const 0))

Looks great and addresses my concerns re offset/align. I'm okay with leaving the memory size and sign suffixes there as long as we address those two. Thanks.

@jfbastien
Copy link
Member

I like @lukewagner's suggestion on =.

Small questions:

  • Do we allow any ordering to them (offset=2 align=1 equivalent to align=1 offset=2)? I think that would be nice.
  • I'm assuming we don't allow duplicates (align=1 align=1)?
  • Also assuming we don't allow unknown "fields" (hithere=1)?

@kg
Copy link
Contributor

kg commented Sep 30, 2015

I think they should be order-independent, and we shouldn't allow duplicates. Allowing unknown fields seems like it would break our feature-detection principles and would potentially result in an old runtime executing new code with incorrect behavior.

@sunfishcode
Copy link
Member

Preference for requiring the fields to be in a fixed order. If we're going to make the syntax this verbose, we should do what we can to make it easy for humans to find what they're looking for in it, so reducing needless variation helps -- not a lot, but on the other hand the burden on producers is minimal.

@lukewagner
Copy link
Member Author

In the s-expr format, it's definitely simpler to have fixed order, no duplicates. For the real TextFormat.md, I expect we'll have a wholly different syntax that of course isn't s-expr-based (e.g., for the offset, maybe we'll want some effective-address-looking syntax that resembles what people are used to in assembly languages). In fact, with the above changes agreed upon, this PR (for the design repo) changes to simply removing the existing para about the alignment syntax because alignment is no longer part of the opcode name and thus bears no mention in AstSemantics.md.

@jfbastien
Copy link
Member

New version lgtm.

@rossberg
Copy link
Member

rossberg commented Oct 1, 2015

Syntax LGTM, but:

On 1 October 2015 at 02:34, Luke Wagner notifications@github.com wrote:

In fact, with the above changes agreed upon, this PR (for the design
repo) changes to simply removing the existing para about the alignment
syntax because alignment is no longer part of the opcode name and thus
bears no mention in AstSemantics.md.

Hm, not sure I follow, why does a syntactic change from being part of the
opcode name to being an opcode parameter remove the need to document it in
AstSemantics?

@lukewagner
Copy link
Member Author

@rossberg-chromium Because AstSemantics.md doesn't speak to text/binary encoding and our final text representation of load/store immediates will likely look much different than s-expr syntax. Perhaps it should have never been mentioned in AstSemantics.md in the first place, but I was thinking at the time that it would be kinda neat if the name (which included immediates) was actually the same between s-expr and the real text format.

@rossberg
Copy link
Member

rossberg commented Oct 1, 2015 via email

@lukewagner
Copy link
Member Author

@rossberg-chromium It does; this PR (which has been updated) just removes the specific syntactic detail of how alignment is appended to the opcode name.

@rossberg
Copy link
Member

rossberg commented Oct 1, 2015 via email

lukewagner added a commit that referenced this pull request Oct 1, 2015
Add opcode name syntax for addressing
@lukewagner lukewagner merged commit 944b9b6 into master Oct 1, 2015
@lukewagner lukewagner deleted the address-syntax branch October 1, 2015 16:10
@lukewagner
Copy link
Member Author

Cool, merging based on general lgtm. I can write the ml-proto PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants