Prototype: Macro expansion / lowering #329

Closed
wants to merge 15 commits into from
125 changes: 125 additions & 0 deletions docs/src/design.md
Original file line number Diff line number Diff line change
Expand Up @@ -845,3 +845,128 @@ heuristics to get something which "looks nice"... and ML systems have become
very good at heuristics. Also, we've got huge piles of training data — just
choose some high quality, tastefully hand-formatted libraries.


# Notes on lowering

## How does macro expansion work?

`macroexpand(m::Module, x)` calls `jl_macroexpand` in ast.c:

```
jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule)
{
    expr = jl_copy_ast(expr);
    expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0);
    expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule);
    return expr;
}
```

First we copy the AST here. This is mostly a trivial deep copy of `Expr`s and
shallow copy of their non-`Expr` children, except for when they contain
embedded `CodeInfo/phi/phic` nodes which are also deep copied.

Second we expand macros recursively by calling

`jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)`

This relies on state indexed by `inmodule` and `world`, which gives it some
funny properties:
* `module` expressions can't be expanded: macro expansion depends on macro
lookup within the module, but we can't do that without `eval`.

Expansion proceeds from the outermost to innermost macros. So macros see any
macro calls or quasiquote (`quote/$`) in their children as unexpanded forms.
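A small sketch of the outermost-first order, using two made-up macros (`@outer` and `@inner` are hypothetical names, not part of this PR). The outer macro only reports the head of the expression it was handed, which shows that the inner call arrives as an unexpanded `macrocall`:

```julia
# Hypothetical macros, for illustration only.
macro inner(ex)
    return :(1 + $(esc(ex)))
end

macro outer(ex)
    # `ex` has not been expanded yet, so it is still a `:macrocall` expression.
    return QuoteNode(ex.head)
end

seen_head = @outer(@inner(2))   # the head that `@outer` received
inner_value = @inner(2)         # expanded normally when not wrapped by `@outer`
```

Here `seen_head` is the symbol `:macrocall`, while `inner_value` is the ordinary result of expanding and evaluating `@inner(2)`.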

Things which are expanded:
* `quote` is expanded using flisp code in `julia-bq-macro`
  - symbol / ssavalue -> `QuoteNode` (inert)
  - atom -> itself
  - at depth zero, `$` expands to its content
  - Expressions `x` without `$` expand to `(copyast (inert x))`
  - Other expressions containing a `$` expand to a call to `_expr` with all the
    args mapped through `julia-bq-expand-`. Roughly!
  - Special handling exists for multi-splatting arguments as in `quote quote $$(x...) end end`
* `macrocall` proceeds with
  - Expand with `jl_invoke_julia_macro`
    - Call `eval` on the macro name (!!) to get the macro function. Look up
      the method.
    - Set up arguments for the macro calling convention
    - Wraps errors in macro invocation in `LoadError`
    - Returns the expression, as well as the module at
      which that method of that macro was defined and `LineNumberNode` where
      the macro was invoked in the source.
  - Deep copy the AST
  - Recursively expand child macros in the context of the module where the
    macrocall method was defined
  - Wrap the result in `(hygienic-scope ,result ,newctx.m ,lineinfo)` (except
    for special case optimizations)
* `hygienic-scope` expands `args[1]` with `jl_expand_macros`, with the module
  of expansion set to `args[2]`. I.e., it's the `Expr` representation of the
  module and expression arguments to `macroexpand`. The way this returns
  either `hygienic-scope` or unwraps is a bit confusing.
* "`do` macrocalls" have their own special handling because the macrocall is
the child of the `do`. This seems like a mess!!
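The observable effect of the `quote` and `macrocall` rules above can be sketched from the user side. This only shows results, not the flisp or C internals; the module `Defs` and the macros `@here` and `@callsite` are hypothetical names chosen for the example:

```julia
x = 42

depth0 = :(a + $x)      # `$x` at depth zero is replaced by the value of `x`
plain  = :(a + b)       # no `$`: the result is a copy of the quoted expression
nested = :(:(a + $x))   # `$x` sits at depth one here, so it is left in place

# Hypothetical module used to show which module a macro result is resolved in.
module Defs
    const origin = "Defs"
    macro here()
        return :(origin)       # unescaped: resolved where the macro was defined
    end
    macro callsite()
        return esc(:origin)    # escaped: resolved where the macro was invoked
    end
end

origin = "caller"
from_definition = Defs.@here()
from_callsite   = Defs.@callsite()
```

`from_definition` picks up `Defs.origin` because the unescaped result is interpreted in the module where the macro method was defined, which is what the `hygienic-scope` wrapper records. `from_callsite` picks up the caller's `origin` because `esc` opts out of that.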


## Scope resolution

This pass disambiguates variables which have the same name in different scopes
and fills in the list of local variables within each lambda.

### Which data is needed to define a scope?

A scope is a collection of variable names by category:
* `argument` - arguments to a lambda
* `local` - variables declared local (at top level) or implicitly local (in lambdas) or desugared to local-def
* `global` - variables declared global (in lambdas) or implicitly global (at top level)
* `static-parameter` - lambda type arguments from `where` clauses
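For orientation, here is one small function that touches all four categories. The names `categories` and `c` are invented for the example, and the comments describe how each name would be classified according to the list above:

```julia
function categories(a::T) where {T}   # `a`: argument, `T`: static parameter
    b = a                             # `b`: implicitly local (assigned in a lambda)
    global c = b                      # `c`: declared global inside the lambda
    return (b, T)
end

result = categories(1)
```

After the call, `c` exists as a global in the enclosing module, while `b` was never visible outside the function body.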

### How does scope resolution work?

We traverse the AST starting at the root paying attention to certain nodes:
* Nodes representing identifiers (Identifier, operators, var)
  - If a variable exists in the table, it's *replaced* with the value in the table.
  - If it doesn't exist, it becomes an `outerref`
* Variable scoping constructs: `local`, `local-def`
  - collected by scope-block
  - removed during traversal
* Scope metadata `softscope`, `hardscope` - just removed
* New scopes
  - `lambda` creates a new scope containing itself and its arguments,
    otherwise copying the parent scope. It resolves the body with that new scope.
  - `scope-block` is really complicated - see below
* Scope queries `islocal`, `locals`
  - `islocal` - statically expand to true/false based on whether var name is a local var
  - `locals` - return list of locals - see `@locals`
  - `require-existing-local` - somewhat like `islocal`, but allows globals
    too (whaa?! naming) and produces a lowering error immediately if variable
    is not known. Should be called `require-in-scope` ??
* `break-block`, `symbolicgoto`, `symboliclabel` need special handling because
  one of their arguments is a non-quoted symbol.
* Add static parameters for generated functions `with-static-parameters`
* `method` - special handling for static params
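The `locals` query is reachable from user code through `Base.@locals`, which gives a quick way to see what scope resolution considers local at a given point. A minimal sketch (the names `counter` and `probe` are made up for the example):

```julia
counter = 0   # a global: in scope inside `probe`, but not one of its locals

function probe(a)
    b = a + 1
    return Base.@locals()   # Dict of the locals visible here, keyed by name
end

visible = probe(1)
```

The returned dictionary holds the argument `a` and the local `b`, and has no entry for the global `counter`.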

`scope-block` is the complicated bit. It's processed by
* Searching the expressions within the block for any `local`, `local-def`,
`global` and assigned vars. Searching doesn't recurse into `lambda`,
`scope-block`, `module` and `toplevel`
* Building lists of implicit locals or globals (depending on whether we're in a
top level thunk)
* Figuring out which local variables need to be renamed. This is any local variable
with a name which has already occurred in processing one of the previous scope blocks
* Checking for conflicting local/global declarations and soft/hard scope
* Building a new scope with the table of renames
* Resolving the body with the new scope, applying the renames
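The renaming step exists so that same-named locals in different scope blocks stay distinct. From the user side, the behaviour it has to preserve looks like this (a sketch; `shadowing` is a made-up function name):

```julia
function shadowing()
    x = 1
    inner = let x = 2    # a new scope block declaring its own `x`
        x += 10          # updates the inner `x` only
        x
    end
    return (x, inner)    # the outer `x` is untouched
end

outer_x, inner_x = shadowing()
```

Both variables are spelled `x` in the source, so after resolution they need to be told apart somehow, which is the job of the rename table described above.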


### Oddities / warts

* I'm not sure we want to disambiguate via renames! What if we annotated
identifier and identifier-like nodes by adding a counter instead of renaming
them? We could use a `scope_disamb` with equivalences:
  - -1 ==> outerref
  - 0 ==> local, not renamed
  - n>=1 ==> local, renamed
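A rough sketch of what such an annotation could look like. Everything here is hypothetical: `AnnotatedId` and `describe` are invented for illustration and are not types or functions in this PR; only the field name `scope_disamb` and the three cases come from the list above.

```julia
# Hypothetical: an identifier that keeps its source name and carries a counter
# instead of being renamed.
struct AnnotatedId
    name::Symbol
    scope_disamb::Int
end

# Decode the counter using the equivalences listed above.
function describe(id::AnnotatedId)
    d = id.scope_disamb
    d == -1 && return :outerref
    d == 0  && return :local
    d >= 1  && return :renamed_local
    throw(ArgumentError("invalid scope_disamb: $d"))
end

outer_x  = AnnotatedId(:x, 0)
shadow_x = AnnotatedId(:x, 1)
global_y = AnnotatedId(:y, -1)
```

With this shape, `outer_x` and `shadow_x` keep the same `name` for error messages and tooling but still compare as different identifiers, which is the property the renames currently provide.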


3 changes: 3 additions & 0 deletions src/JuliaSyntax.jl
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ include("green_tree.jl")
include("syntax_tree.jl")
include("expr.jl")

include("macroexpand.jl")
include("lowering.jl")

# Hooks to integrate the parser with Base
include("hooks.jl")
include("precompile.jl")
Expand Down
12 changes: 9 additions & 3 deletions src/diagnostics.jl
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,8 @@ last_byte(d::Diagnostic) = d.last_byte
is_error(d::Diagnostic) = d.level === :error
Base.range(d::Diagnostic) = first_byte(d):last_byte(d)

const Diagnostics = Vector{Tuple{SourceFile,Diagnostic}}

# Make relative path into a file URL
function _file_url(filename)
@static if Sys.iswindows()
Expand Down Expand Up @@ -78,15 +80,19 @@ function show_diagnostic(io::IO, diagnostic::Diagnostic, source::SourceFile)
context_lines_before=1, context_lines_after=0)
end

function show_diagnostics(io::IO, diagnostics::AbstractVector{Diagnostic}, source::SourceFile)
function show_diagnostics(io::IO, diagnostics::Diagnostics)
first = true
for d in diagnostics
for (s,d) in diagnostics
first || println(io)
first = false
show_diagnostic(io, d, source)
show_diagnostic(io, d, s)
end
end

function show_diagnostics(io::IO, diagnostics::AbstractVector{Diagnostic}, source::SourceFile)
show_diagnostics(io, collect(zip(Iterators.repeated(source), diagnostics)))
end

function show_diagnostics(io::IO, diagnostics::AbstractVector{Diagnostic}, text::AbstractString)
show_diagnostics(io, diagnostics, SourceFile(text))
end
Expand Down
3 changes: 2 additions & 1 deletion src/expr.jl
Original file line number Diff line number Diff line change
Expand Up @@ -498,7 +498,8 @@ end

function _to_expr(node::SyntaxNode)
if !haschildren(node)
offset, txtbuf = _unsafe_wrap_substring(sourcetext(node.source))
offset, txtbuf = isnothing(node.source) ? (0,nothing) :
_unsafe_wrap_substring(sourcetext(node.source))
return _leaf_to_Expr(node.source, txtbuf, head(node), range(node) .+ offset, node)
end
cs = children(node)
Expand Down
46 changes: 45 additions & 1 deletion src/kinds.jl
Original file line number Diff line number Diff line change
Expand Up @@ -917,6 +917,49 @@
# Container for a single statement/atom plus any trivia and errors
"wrapper"
"END_SYNTAX_KINDS"

"BEGIN_LOWERING_KINDS"
# Compiler metadata hints
"meta"
# A literal Julia value of any kind, as might be inserted by the AST
# during macro expansion
"Value"
"inbounds"
"inline"
"noinline"
"loopinfo"
# Identifier for a value which is only assigned once ("SSA value")
"SSALabel"
# Scope expressions `(hygienic_scope ex s)` mean `ex` should be
# interpreted as being in scope `s`.
"hygienic_scope"
# Various heads harvested from flisp lowering.
# (TODO: May or may not need all these - assess later)
"break_block"
"scope_block"
"local_def"
"_while"
"_do_while"
"with_static_parameters"
"top"
"core"
"toplevel_butfirst"
"thunk"
"lambda"
"moved_local"
"the_exception"
"foreigncall"
"new"
"globalref"
"outerref"
"enter"
"leave"
"goto"
"gotoifnot"
"trycatchelse"
"tryfinally"
"method"
"END_LOWERING_KINDS"
]

"""
Expand Down Expand Up @@ -1117,14 +1160,15 @@
is_literal(k::Kind) = K"BEGIN_LITERAL" <= k <= K"END_LITERAL"
is_operator(k::Kind) = K"BEGIN_OPS" <= k <= K"END_OPS"
is_word_operator(k::Kind) = (k == K"in" || k == K"isa" || k == K"where")
is_identifier(k::Kind) = k == K"Identifier" || k == K"var" || is_operator(k) || is_macro_name(k)

is_contextual_keyword(k) = is_contextual_keyword(kind(k))
is_error(k) = is_error(kind(k))
is_keyword(k) = is_keyword(kind(k))
is_literal(k) = is_literal(kind(k))
is_operator(k) = is_operator(kind(k))
is_word_operator(k) = is_word_operator(kind(k))

is_identifier(x) = is_identifier(kind(x))

# Predicates for operator precedence
# FIXME: Review how precedence depends on dottedness, eg
Expand Down