Disallow struct literals in ambiguous positions. #92

Merged
merged 1 commit into from
Jun 10, 2014
100 changes: 100 additions & 0 deletions 0000-struct-grammar.md
@@ -0,0 +1,100 @@
- Start Date:
- RFC PR #:
- Rust Issue #:

# Summary

Do not identify struct literals by searching for `:`. Instead define a sub-
category of expressions which excludes struct literals and re-define `for`,
`if`, and other expressions which take an expression followed by a block (or
non-terminal which can be replaced by a block) to take this sub-category,
instead of all expressions.

# Motivation

Parsing by looking ahead is fragile - it could easily be broken if we allow `:`
to appear elsewhere in types (e.g., type ascription) or if we change struct
literals to not require the `:` (e.g., if we allow empty structs to be written
with braces, or if we allow struct literals to unify field names to local
variable names, as has been suggested in the past and which we currently do for
struct literal patterns). We should also be able to give better error messages
today if users make these mistakes. More worryingly, we might come up with some
language feature in the future which is not predictable now and which breaks
with the current system.

Hopefully, it is pretty rare to use struct literals in these positions, so there
should not be much fallout. Any problems can be easily fixed by assigning the
struct literal into a variable. However, this is a backwards incompatible
change, so it should block 1.0.
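
As a minimal sketch of the workaround mentioned above (the `Point` type and `describe` function are hypothetical, introduced only for illustration), a struct literal that would otherwise sit in the condition of an `if` can be bound to a variable first:

```rust
// Hypothetical type, used only for illustration.
#[derive(PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn describe(p: &Point) -> &'static str {
    // Under the proposal, `if *p == Point { x: 0, y: 0 } { ... }` would be
    // rejected: the `{` after `Point` is taken as the start of the `if` body,
    // not as the start of a struct literal.
    //
    // The fix suggested in the text is to assign the literal to a variable.
    let origin = Point { x: 0, y: 0 };
    if *p == origin {
        "origin"
    } else {
        "elsewhere"
    }
}
```

Calling `describe(&Point { x: 0, y: 0 })` is unaffected, since a function argument is not one of the ambiguous positions.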

# Detailed design

Here is a simplified version of a subset of Rust's abstract syntax:

```
e ::= x
| e `.` f
| name `{` (x `:` e)+ `}`
| block
| `for` e `in` e block
| `if` e block (`else` block)?
| `|` pattern* `|` e
| ...
block ::= `{` (e;)* e? `}`
```

Parsing this grammar is ambiguous since `x` cannot be distinguished from `name`,
so `e block` in the for expression is ambiguous with the struct literal
expression. We currently solve this by using lookahead to find a `:` token in
the struct literal.

I propose the following adjustment:

```
e ::= e'
| name `{` (x `:` e)+ `}`
| `|` pattern* `|` e
| ...
e' ::= x
| e `.` f
| block
| `for` e `in` e' block
| `if` e' block (`else` block)?
| `|` pattern* `|` e'
| ...
block ::= `{` (e;)* e? `}`
```

`e' is just e without struct literal expressions. We use e' instead of e

Review comment (Contributor): Formatting issue here.

`wherever e is followed directly by block or any other non-terminal which may
`have block as its first terminal (after any possible expansions).

For any expressions where a sub-expression is the final lexical element
(closures in the subset above, but also unary and binary operations), we require
two versions of the meta-expression - the normal one in `e` and a version with
`e'` for the final element in `e'`.
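
To make the "final lexical element" case concrete, here is a small sketch with a binary operator whose right operand ends the condition of an `if`. The `Version` type and `is_v1` function are hypothetical. The parenthesised form is how the Rust compiler accepts this code today; the RFC text itself only suggests binding the literal to a variable, so treat the parentheses as an observation about current Rust rather than part of the proposal.

```rust
// Hypothetical type, used only for illustration.
#[derive(PartialEq)]
struct Version {
    major: u32,
}

fn is_v1(v: &Version) -> bool {
    // The right operand of `==` is the final element of the condition, so the
    // restriction carries over to it. Writing
    //     if *v == Version { major: 1 } { ... }
    // is rejected, because `{ major: 1 }` would be read as the `if` body.
    //
    // Inside parentheses the expression is no longer followed directly by a
    // block, so the struct literal is allowed again.
    let mut hit = false;
    if *v == (Version { major: 1 }) {
        hit = true;
    }
    hit
}
```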

Implementation would be simpler: we just add a flag to `parser::restriction`
called `RESTRICT_BLOCK` or something, which puts us into a mode which reflects
`e'`. We would drop into this mode when parsing `e'` position expressions and
drop out of it for all but the last sub-expression of an expression.
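
The flag-based approach can be sketched with a toy recursive-descent parser. This is not the compiler's parser: the token set, the `Expr` type, the `no_struct_literal` field (standing in for something like `RESTRICT_BLOCK`) and every function name below are assumptions made for illustration, and the grammar covers only identifiers, struct literals and `if`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok { Ident(String), If, LBrace, RBrace, Colon, Comma }

#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    StructLit(String, Vec<(String, Expr)>),
    If(Box<Expr>, Option<Box<Expr>>), // condition, optional body expression
}

// Toy lexer: pad punctuation with spaces, then split on whitespace.
fn lex(src: &str) -> Vec<Tok> {
    let mut padded = src.to_string();
    for p in ["{", "}", ":", ","] {
        padded = padded.replace(p, &format!(" {} ", p));
    }
    padded.split_whitespace().map(|w| match w {
        "if" => Tok::If,
        "{" => Tok::LBrace,
        "}" => Tok::RBrace,
        ":" => Tok::Colon,
        "," => Tok::Comma,
        name => Tok::Ident(name.to_string()),
    }).collect()
}

struct Parser { toks: Vec<Tok>, pos: usize, no_struct_literal: bool }

impl Parser {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }

    fn bump(&mut self) -> Option<Tok> {
        let tok = self.toks.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    fn expect(&mut self, want: Tok) -> Result<(), String> {
        let got = self.bump();
        if got.as_ref() == Some(&want) { Ok(()) }
        else { Err(format!("expected {:?}, found {:?}", want, got)) }
    }

    // Run `f` with the restriction set to `flag`, then restore the old value.
    fn with_restriction<T>(&mut self, flag: bool, f: impl FnOnce(&mut Parser) -> T) -> T {
        let saved = std::mem::replace(&mut self.no_struct_literal, flag);
        let out = f(self);
        self.no_struct_literal = saved;
        out
    }

    fn expr(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(Tok::If) => {
                // The condition is in e' position: enter restricted mode.
                let cond = self.with_restriction(true, |p| p.expr())?;
                let body = self.block()?;
                Ok(Expr::If(Box::new(cond), body))
            }
            Some(Tok::Ident(name)) => {
                // No search for `:`; the flag alone decides what `{` means.
                if !self.no_struct_literal && self.peek() == Some(&Tok::LBrace) {
                    self.struct_lit(name)
                } else {
                    Ok(Expr::Var(name))
                }
            }
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }

    fn block(&mut self) -> Result<Option<Box<Expr>>, String> {
        self.expect(Tok::LBrace)?;
        // Inside the braces nothing is ambiguous, so leave restricted mode.
        let body = self.with_restriction(false, |p| {
            if p.peek() == Some(&Tok::RBrace) { Ok(None) }
            else { p.expr().map(|e| Some(Box::new(e))) }
        })?;
        self.expect(Tok::RBrace)?;
        Ok(body)
    }

    fn struct_lit(&mut self, name: String) -> Result<Expr, String> {
        self.expect(Tok::LBrace)?;
        let mut fields = Vec::new();
        while self.peek() != Some(&Tok::RBrace) {
            let field = match self.bump() {
                Some(Tok::Ident(f)) => f,
                other => return Err(format!("expected field name, found {:?}", other)),
            };
            self.expect(Tok::Colon)?;
            let value = self.with_restriction(false, |p| p.expr())?;
            fields.push((field, value));
            if self.peek() == Some(&Tok::Comma) { self.bump(); }
        }
        self.expect(Tok::RBrace)?;
        Ok(Expr::StructLit(name, fields))
    }
}

fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser { toks: lex(src), pos: 0, no_struct_literal: false };
    let e = p.expr()?;
    if p.pos < p.toks.len() { return Err("unexpected trailing tokens".to_string()); }
    Ok(e)
}
```

With this sketch, `parse("Foo { a: b }")` yields a struct literal, `parse("if Foo { a }")` treats `Foo` as a variable followed by the body block, and `parse("if Foo { a: b } { c }")` fails because `{ a: b }` is read as a malformed block. A struct literal nested inside the body, as in `parse("if x { Foo { a: b } }")`, is accepted because the restriction is lifted inside braces.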

# Drawbacks

It makes the formal grammar and parsing a little more complicated (although it
is simpler in terms of needing less lookahead and avoiding a special case).

# Alternatives

Don't do this.

Allow all expressions but greedily parse non-terminals in these positions, e.g.,
`for N {} {}` would be parsed as `for (N {}) {}`. This seems worse because I
believe it will be much rarer to have structs in these positions than to have an
identifier in the first position, followed by two blocks (i.e., parse as `(for N
{}) {}`).

# Unresolved questions

Do we need to expose this distinction anywhere outside of the parser? E.g.,
macros?

Review comment (Contributor): It's possible we could get away with leaving macros alone. The sequence `$e:expr $b:block` would presumably interpret `Foo { bar }` as the start of a struct literal for `$e:expr` (and then subsequently error out because it's not correct). That said, it seems a shame to not allow macros to parse `$e:expr $b:block` the same way `for`/`if` do. Fixing this would require coming up with a name to use for the restricted expression nonterminal type.