diff --git a/docs/src/design.md b/docs/src/design.md index 737dc31b..92955f2d 100644 --- a/docs/src/design.md +++ b/docs/src/design.md @@ -845,3 +845,128 @@ heuristics to get something which "looks nice"... and ML systems have become very good at heuristics. Also, we've got huge piles of training data — just choose some high quality, tastefully hand-formatted libraries. + +# Notes on lowering + +## How does macro expansion work? + +`macroexpand(m::Module, x)` calls `jl_macroexpand` in ast.c: + +``` +jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule) +{ + expr = jl_copy_ast(expr); + expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0); + expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule); + return expr; +} +``` + +First we copy the AST here. This is mostly a trivial deep copy of `Expr`s and +shallow copy of their non-`Expr` children, except for when they contain +embedded `CodeInfo/phi/phic` nodes which are also deep copied. + +Second we expand macros recursively by calling + +`jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)` + +This relies on state indexed by `inmodule` and `world`, which gives it some +funny properties: +* `module` expressions can't be expanded: macro expansion depends on macro + lookup within the module, but we can't do that without `eval`. + +Expansion proceeds from the outermost to innermost macros. So macros see any +macro calls or quasiquote (`quote/$`) in their children as unexpanded forms. + +Things which are expanded: +* `quote` is expanded using flisp code in `julia-bq-macro` + - symbol / ssavalue -> `QuoteNode` (inert) + - atom -> itself + - at depth zero, `$` expands to its content + - Expressions `x` without `$` expand to `(copyast (inert x))` + - Other expressions containing a `$` expand to a call to `_expr` with all the + args mapped through `julia-bq-expand-`. Roughly! + - Special handling exists for multi-splatting arguments as in `quote quote $$(x...) 
end end` +* `macrocall` proceeds with + - Expand with `jl_invoke_julia_macro` + - Call `eval` on the macro name (!!) to get the macro function. Look up + the method. + - Set up arguments for the macro calling convention + - Wraps errors in macro invocation in `LoadError` + - Returns the expression, as well as the module at + which that method of that macro was defined and `LineNumberNode` where + the macro was invoked in the source. + - Deep copy the AST + - Recursively expand child macros in the context of the module where the + macrocall method was defined + - Wrap the result in `(hygienic-scope ,result ,newctx.m ,lineinfo)` (except + for special case optimizations) +* `hygienic-scope` expands `args[1]` with `jl_expand_macros`, with the module + of expansion set to `args[2]`. I.e., it's the `Expr` representation of the + module and expression arguments to `macroexpand`. The way this returns + either `hygienic-scope` or unwraps is a bit confusing. +* "`do` macrocalls" have their own special handling because the macrocall is + the child of the `do`. This seems like a mess!! + + +## Scope resolution + +This pass disambiguates variables which have the same name in different scopes +and fills in the list of local variables within each lambda. + +### Which data is needed to define a scope? + +A scope is a collection of variable names by category: +* `argument` - arguments to a lambda +* `local` - variables declared local (at top level) or implicitly local (in lambdas) or desugared to local-def +* `global` - variables declared global (in lambdas) or implicitly global (at top level) +* `static-parameter` - lambda type arguments from `where` clauses + +### How does scope resolution work? + +We traverse the AST starting at the root paying attention to certain nodes: +* Nodes representing identifiers (Identifier, operators, var) + - If a variable exists in the table, it's *replaced* with the value in the table. 
+ - If it doesn't exist, it becomes an `outerref` +* Variable scoping constructs: `local`, `local-def` + - collected by scope-block + - removed during traversal +* Scope metadata `softscope`, `hardscope` - just removed +* New scopes + - `lambda` creates a new scope containing itself and its arguments, + otherwise copying the parent scope. It resolves the body with that new scope. + - `scope-block` is really complicated - see below +* Scope queries `islocal`, `locals` + - `islocal` - statically expand to true/false based on whether var name is a local var + - `locals` - return list of locals - see `@locals` + - `require-existing-local` - somewhat like `islocal`, but allows globals + too (whaa?! naming) and produces a lowering error immediately if variable + is not known. Should be called `require-in-scope` ?? +* `break-block`, `symbolicgoto`, `symboliclabel` need special handling because + one of their arguments is a non-quoted symbol. +* Add static parameters for generated functions `with-static-parameters` +* `method` - special handling for static params + +`scope-block` is the complicated bit. It's processed by +* Searching the expressions within the block for any `local`, `local-def`, + `global` and assigned vars. Searching doesn't recurse into `lambda`, + `scope-block`, `module` and `toplevel` +* Building lists of implicit locals or globals (depending on whether we're in a + top level thunk) +* Figuring out which local variables need to be renamed. This is any local variable + with a name which has already occurred in processing one of the previous scope blocks +* Check any conflicting local/global decls and soft/hard scope +* Build new scope with table of renames +* Resolve the body with the new scope, applying the renames + + +### Oddities / warts + +* I'm not sure we want to disambiguate via renames! What if we annotated + identifier and identifier-like nodes by adding a counter instead of renaming + them? 
We could use a `scope_disamb` with equivalences: + - -1 ==> outerref + - 0 ==> local, not renamed + - n>=1 ==> local, renamed + + diff --git a/src/JuliaSyntax.jl b/src/JuliaSyntax.jl index 3f1ad27a..650f6218 100644 --- a/src/JuliaSyntax.jl +++ b/src/JuliaSyntax.jl @@ -41,6 +41,9 @@ include("green_tree.jl") include("syntax_tree.jl") include("expr.jl") +include("macroexpand.jl") +include("lowering.jl") + # Hooks to integrate the parser with Base include("hooks.jl") include("precompile.jl") diff --git a/src/diagnostics.jl b/src/diagnostics.jl index c84fa0ac..69476291 100644 --- a/src/diagnostics.jl +++ b/src/diagnostics.jl @@ -42,6 +42,8 @@ last_byte(d::Diagnostic) = d.last_byte is_error(d::Diagnostic) = d.level === :error Base.range(d::Diagnostic) = first_byte(d):last_byte(d) +const Diagnostics = Vector{Tuple{SourceFile,Diagnostic}} + # Make relative path into a file URL function _file_url(filename) @static if Sys.iswindows() @@ -78,15 +80,19 @@ function show_diagnostic(io::IO, diagnostic::Diagnostic, source::SourceFile) context_lines_before=1, context_lines_after=0) end -function show_diagnostics(io::IO, diagnostics::AbstractVector{Diagnostic}, source::SourceFile) +function show_diagnostics(io::IO, diagnostics::Diagnostics) first = true - for d in diagnostics + for (s,d) in diagnostics first || println(io) first = false - show_diagnostic(io, d, source) + show_diagnostic(io, d, s) end end +function show_diagnostics(io::IO, diagnostics::AbstractVector{Diagnostic}, source::SourceFile) + show_diagnostics(io, collect(zip(Iterators.repeated(source), diagnostics))) +end + function show_diagnostics(io::IO, diagnostics::AbstractVector{Diagnostic}, text::AbstractString) show_diagnostics(io, diagnostics, SourceFile(text)) end diff --git a/src/expr.jl b/src/expr.jl index f674b984..9dcf0fcf 100644 --- a/src/expr.jl +++ b/src/expr.jl @@ -498,7 +498,8 @@ end function _to_expr(node::SyntaxNode) if !haschildren(node) - offset, txtbuf = 
_unsafe_wrap_substring(sourcetext(node.source)) + offset, txtbuf = isnothing(node.source) ? (0,nothing) : + _unsafe_wrap_substring(sourcetext(node.source)) return _leaf_to_Expr(node.source, txtbuf, head(node), range(node) .+ offset, node) end cs = children(node) diff --git a/src/kinds.jl b/src/kinds.jl index 54f37e88..4c15bcbf 100644 --- a/src/kinds.jl +++ b/src/kinds.jl @@ -917,6 +917,49 @@ const _kind_names = # Container for a single statement/atom plus any trivia and errors "wrapper" "END_SYNTAX_KINDS" + + "BEGIN_LOWERING_KINDS" + # Compiler metadata hints + "meta" + # A literal Julia value of any kind, as might be inserted by the AST + # during macro expansion + "Value" + "inbounds" + "inline" + "noinline" + "loopinfo" + # Identifier for a value which is only assigned once ("SSA value") + "SSALabel" + # Scope expressions `(hygienic_scope ex s)` mean `ex` should be + # interpreted as being in scope `s`. + "hygienic_scope" + # Various heads harvested from flisp lowering. + # (TODO: May or may not need all these - assess later) + "break_block" + "scope_block" + "local_def" + "_while" + "_do_while" + "with_static_parameters" + "top" + "core" + "toplevel_butfirst" + "thunk" + "lambda" + "moved_local" + "the_exception" + "foreigncall" + "new" + "globalref" + "outerref" + "enter" + "leave" + "goto" + "gotoifnot" + "trycatchelse" + "tryfinally" + "method" + "END_LOWERING_KINDS" ] """ @@ -1117,6 +1160,7 @@ is_block_continuation_keyword(k::Kind) = K"BEGIN_BLOCK_CONTINUATION_KEYWORDS" <= is_literal(k::Kind) = K"BEGIN_LITERAL" <= k <= K"END_LITERAL" is_operator(k::Kind) = K"BEGIN_OPS" <= k <= K"END_OPS" is_word_operator(k::Kind) = (k == K"in" || k == K"isa" || k == K"where") +is_identifier(k::Kind) = k == K"Identifier" || k == K"var" || is_operator(k) || is_macro_name(k) is_contextual_keyword(k) = is_contextual_keyword(kind(k)) is_error(k) = is_error(kind(k)) @@ -1124,7 +1168,7 @@ is_keyword(k) = is_keyword(kind(k)) is_literal(k) = is_literal(kind(k)) is_operator(k) = 
is_operator(kind(k)) is_word_operator(k) = is_word_operator(kind(k)) - +is_identifier(x) = is_identifier(kind(x)) # Predicates for operator precedence # FIXME: Review how precedence depends on dottedness, eg diff --git a/src/lowering.jl b/src/lowering.jl new file mode 100644 index 00000000..7843808d --- /dev/null +++ b/src/lowering.jl @@ -0,0 +1,267 @@ +# Experimental port of some parts of Julia's code lowering (ie, the symbolic +# non-type-related compiler passes) + +#------------------------------------------------------------------------------- +# Utilities + +struct ExpansionContext + next_ssa_label::Ref{Int} +end + +ExpansionContext() = ExpansionContext(Ref(0)) + +function Identifier(val, srcref) + SyntaxNode(K"Identifier", val, srcref=srcref) +end + +function SSALabel(ctx, srcref) + val = ctx.next_ssa_label[] + ctx.next_ssa_label[] += 1 + SyntaxNode(K"SSALabel", val, srcref=srcref) +end + + +#------------------------------------------------------------------------------- + +# pass 1: syntax desugaring + +function is_quoted(ex) + kind(ex) in KSet"quote top core globalref outerref break inert + meta inbounds inline noinline loopinfo" +end + +function expand_condition(ctx, ex) + if head(ex) == K"block" || head(ex) == K"||" || head(ex) == K"&&" + # || and && get special lowering so that they compile directly to jumps + # rather than first computing a bool and then jumping. + error("TODO expand_condition") + end + expand_forms(ctx, ex) +end + +function blockify(ex) + kind(ex) == K"block" ? ex : SyntaxNode(K"block", ex, [ex]) +end + +function expand_assignment(ctx, ex) +end + +function expand_forms(ctx, ex) + k = kind(ex) + if k == K"while" + SyntaxNode(K"break_block", ex, [ + Identifier(:loop_exit, ex), # Should this refer syntactically to the `end`? 
+ SyntaxNode(K"_while", ex, [ + expand_condition(ctx, ex[1]), + SyntaxNode(K"break_block", ex, [ + Identifier(:loop_cont, ex[2]), + SyntaxNode(K"scope_block", ex[2], [ + blockify(expand_forms(ctx, ex[2])) + ]) + ]) + ]) + ]) + elseif !haschildren(ex) + ex + else + if k == K"=" && (numchildren(ex) != 2 || kind(ex[1]) != K"Identifier") + error("TODO") + end + SyntaxNode(head(ex), map(e->expand_forms(ctx,e), children(ex)), srcref=ex) + end +end + +#------------------------------------------------------------------------------- +# Pass 2: Identify and rename local vars + +function decl_var(ex) + kind(ex) == K"::" ? ex[1] : ex +end + +function is_underscore(ex) + k = kind(ex) + return (k == K"Identifier" && valueof(ex) == :_) || + (k == K"var" && valueof(ex[1]) == :_) +end + +# FIXME: The problem of "what is an identifier" pervades lowering ... we have +# various things which seem like identifiers: +# +# * Identifier (symbol) +# * K"var" nodes +# * Operator kinds +# * Underscore placeholders +# +# Can we avoid having the logic of "what is an identifier" repeated by dealing +# with these during desugaring +# * Attach an identifier attribute to nodes. If they're an identifier they get this +# * Or alternatively / more easily, desugar by replacement ?? 
+function identifier_name(ex) + if kind(ex) == K"var" + ex = ex[1] + end + valueof(ex) +end + +function is_valid_name(ex) + n = identifier_name(ex) + n !== :ccall && n !== :cglobal +end + +function _schedule_traverse(stack, e::SyntaxNode) + push!(stack, e) + return nothing +end +function _schedule_traverse(stack, es::Union{Tuple,Vector}) + append!(stack, es) + return nothing +end + +function traverse_ast(f, ex) + todo = [ex] + while !isempty(todo) + e1 = pop!(todo) + f(e1, e->_schedule_traverse(todo, e)) + end +end + +function find_in_ast(f, ex) + todo = [ex] + while !isempty(todo) + e1 = pop!(todo) + res = f(e1, e->_schedule_traverse(todo, e)) + if !isnothing(res) + return res + end + end + return nothing +end + +# NB: This only really works after expand_forms has already processed assignments. +function find_assigned_vars(ex) + vars = SyntaxNode[] + # _find_assigned_vars(vars, ex) + traverse_ast(ex) do e, traverse + k = kind(e) + if !haschildren(e) || is_quoted(k) || k in KSet"lambda scope_block module toplevel" + return + elseif k == K"method" + error("TODO") + return nothing + elseif k == K"=" + v = decl_var(e[1]) + if !(kind(v) in KSet"SSALabel globalref outerref" || is_underscore(v)) + push!(vars, v) + end + traverse(e[2]) + else + traverse(children(e)) + end + end + return unique(vars) +end + +function find_decls(decl_kind, ex) + vars = SyntaxNode[] + traverse_ast(ex) do e, traverse + k = kind(e) + if !haschildren(e) || is_quoted(k) || k in KSet"lambda scope_block module toplevel" + return + elseif k == decl_kind + if !is_underscore(e[1]) + push!(vars, decl_var(e[1])) + end + else + traverse(children(e)) + end + end +end + +# Determine whether decl_kind is in the scope of `ex` +# +# flisp: find-scope-decl +function has_scope_decl(decl_kind, ex) + find_in_ast(ex) do e, traverse + k = kind(e) + if !haschildren(e) || is_quoted(k) || k in KSet"lambda scope_block module toplevel" + return + elseif k == decl_kind + return e + else + traverse(children(e)) + end + 
end +end + +struct LambdaLocals + # For resolve-scopes pass + locals::Set{Symbol} +end + +struct LambdaVars + # For analyze-variables pass + # var_info_lst::Set{Tuple{Symbol,Symbol}} # ish? + # captured_var_infos ?? + # ssalabels::Set{SSALabel} + # static_params::Set{Symbol} +end + +# TODO: +# 1. Use `.val` to store LambdaVars/LambdaLocals/ScopeInfo +# 2. Incorporate hygienic-scope here so we always have a parent scope when +# processing variables rather than putting them into a thunk (??) + +struct ScopeInfo + lambda_vars::Union{LambdaLocals,LambdaVars} + parent::Union{Nothing,ScopeInfo} + args::Set{Symbol} + locals::Set{Symbol} + globals::Set{Symbol} + static_params::Set{Symbol} + renames::Dict{Symbol,Symbol} + implicit_globals::Set{Symbol} + warn_vars::Set{Symbol} + is_soft::Bool + is_hard::Bool + table::Dict{Symbol,Any} +end + +# Transform lambdas from +# (lambda (args ...) body) +# to the form +# (lambda (args...) (locals...) body) +function resolve_scopes_(ctx, scope, ex) +end + +function resolve_scopes(ctx, ex) + resolve_scopes_(ctx, scope, ex) +end + +#------------------------------------------------------------------------------- +# Pass 3: analyze variables +# +# This pass records information about variables used by closure conversion. +# It finds which variables are assigned or captured, and records variable +# type declarations. +# +# This info is recorded by setting the second argument of `lambda` expressions +# in-place to +# (var-info-lst captured-var-infos ssavalues static_params) +# where var-info-lst is a list of var-info records + +#------------------------------------------------------------------------------- +# Pass 4: closure conversion +# +# This pass lifts all inner functions to the top level by generating +# a type for them. 
+# +# For example `f(x) = y->(y+x)` is converted to +# +# immutable yt{T} +# x::T +# end +# +# (self::yt)(y) = y + self.x +# +# f(x) = yt(x) + diff --git a/src/macroexpand.jl b/src/macroexpand.jl new file mode 100644 index 00000000..73806d21 --- /dev/null +++ b/src/macroexpand.jl @@ -0,0 +1,545 @@ +struct ScopeSpec + mod::Module + is_global::Bool +end + +# This is the type of syntax literals which macros will manipulate. +# Unlike Expr it includes the module so that hygiene can be automatic +# +# TODO: Maybe we should put `mod` into SourceFile instead, or equivalent? +# This is nice as we +# 1. Don't need a new syntax literal type separate from the unscoped AST type +# 2. Don't need to do all the wrapping we do here +# But it's also not nice because we need to reallocate that whole part of the +# tree and we wouldn't later be able to use a compressed GreenNode inside a +# lazy tree to store those parts of the AST... +# +# TODO: Maybe rename to just `Syntax`? +struct SyntaxLiteral + scope::Union{Nothing,ScopeSpec} + tree::SyntaxNode + + function SyntaxLiteral(scope::Union{Nothing,ScopeSpec}, tree::SyntaxNode) + while kind(tree) == K"hygienic_scope" + scope = valueof(tree[2]) + tree = tree[1] + end + return new(scope, tree) + end +end + +function SyntaxLiteral(h::Union{Kind,SyntaxHead}, srcref::SyntaxLiteral, children) + # TODO: We don't care about the scope here right? Right?? + # - It's ultimately only the identifiers which need scope recorded? + s1 = first(children).scope + # TODO: This assumes all `children` are of type SyntaxLiteral. + # But in reality, we'd, presumably, like to put plain literals and + # identifiers in here? In which case, the scope does matter because we'll + # be passing it to the children. 
+ if all(c.scope == s1 for c in children) + SyntaxLiteral(s1, SyntaxNode(h, srcref.tree, [c.tree for c in children])) + else + SyntaxLiteral(nothing, SyntaxNode(h, srcref.tree, [SyntaxNode(c) for c in children])) + end +end + +function SyntaxLiteral(h::Union{Kind,SyntaxHead}, scope::Union{Nothing,ScopeSpec}, + srcref::SyntaxLiteral, val) + SyntaxLiteral(scope, SyntaxNode(h, srcref.tree, val)) +end + + +function Base.iterate(ex::SyntaxLiteral) + if numchildren(ex) == 0 + return nothing + end + return (child(ex,1), 1) +end + +function Base.iterate(ex::SyntaxLiteral, i) + i += 1 + if i > numchildren(ex) + return nothing + else + return (child(ex, i), i) + end +end + +children(ex::SyntaxLiteral) = (SyntaxLiteral(ex.scope, c) for c in children(ex.tree)) +haschildren(ex::SyntaxLiteral) = haschildren(ex.tree) +numchildren(ex::SyntaxLiteral) = numchildren(ex.tree) + +Base.range(ex::SyntaxLiteral) = range(ex.tree) + +function child(ex::SyntaxLiteral, path::Int...) + # Somewhat awkward way to prevent macros from ever seeing the special + # K"hygienic_scope" expression. + # + # We could avoid this unwrapping if we were willing to stash the module + # inside the source instead. + s = ex.scope + e = ex.tree + for i in path + e = e[i] + while kind(e) == K"hygienic_scope" + s = valueof(e[2]) + e = e[1] + end + end + SyntaxLiteral(s, e) +end + +head(ex::SyntaxLiteral) = head(ex.tree) +span(ex::SyntaxLiteral) = span(ex.tree) +Base.getindex(ex::SyntaxLiteral, i::Integer) = child(ex, i) +Base.lastindex(ex::SyntaxLiteral) = lastindex(ex.tree) + +# TODO: Should this return val without a scope, or should it return a GlobalRef? 
+# TODO: Decide between this and the 0-arg getindex +valueof(ex::SyntaxLiteral) = valueof(ex.tree) +valueof(ex::SyntaxNode) = ex.val + +function Base.getindex(ex::SyntaxLiteral) + val = ex.tree.val + if kind(ex) == K"Identifier" + GlobalRef(ex.scope, val) + else + val + end +end + +function Base.show(io::IO, mime::MIME"text/plain", ex::SyntaxLiteral) + print(io, "SyntaxLiteral") + if !isnothing(ex.scope) + print(io, " in ", ex.scope.mod, " ", ex.scope.is_global ? "macro" : "macrocall", " scope") + else + print(io, " without scope") + end + print(io, ":\n") + show(io, mime, ex.tree) +end + +function _syntax_literal(scope, expr) + # The copy here should do similar to `copyast` ? + SyntaxLiteral(scope, copy(expr)) +end + +function SyntaxNode(ex::SyntaxLiteral) + if isnothing(ex.scope) + ex.tree + else + SyntaxNode(K"hygienic_scope", ex.tree, [ex.tree, SyntaxNode(K"Value", ex.scope)]) + end +end + +struct MacroContext + macroname::SyntaxNode + mod::Module + # TODO: For warnings, we could have a diagnostics field here in the macro + # context too? Or maybe macros could just use @warn for that? +end + +#------------------------------------------------------------------------------- + +struct MacroExpansionError + context::Union{Nothing,MacroContext} + diagnostics::Diagnostics +end + +function MacroExpansionError(context::Union{Nothing,MacroContext}, + ex::Union{SyntaxNode,SyntaxLiteral}, msg::String; kws...) + diagnostics = Diagnostics() + emit_diagnostic(diagnostics, ex; error=msg, kws...) + MacroExpansionError(context, diagnostics) +end + +function MacroExpansionError(diagnostics::Diagnostics) + MacroExpansionError(nothing, diagnostics) +end + +function MacroExpansionError(ex::Union{SyntaxNode,SyntaxLiteral}, msg::String; kws...) + MacroExpansionError(nothing, ex, msg; kws...) 
+end + +function Base.showerror(io::IO, exc::MacroExpansionError) + print(io, "MacroExpansionError") + ctx = exc.context + if !isnothing(ctx) + print(io, " while expanding ", ctx.macroname, + " in module ", ctx.mod) + end + print(io, ":\n") + show_diagnostics(io, exc.diagnostics) +end + +function emit_diagnostic(diagnostics::Diagnostics, ex::SyntaxNode; before=false, after=false, kws...) + # TODO: Do we really want this diagnostic representation? Source ranges are + # flexible, but it seems we lose something by not keeping the offending + # expression `ex` somewhere? + # + # An alternative could be to allow diagnostic variants TextDiagnostic and + # TreeDiagnostic or something? + r = range(ex) + if before + r = first(r):first(r)-1 + elseif after + r = last(r)+1:last(r) + end + diagnostic = Diagnostic(first(r), last(r); kws...) + push!(diagnostics, (ex.source, diagnostic)) +end + +function emit_diagnostic(diagnostics::Diagnostics, ex::SyntaxLiteral; kws...) + emit_diagnostic(diagnostics, ex.tree; kws...) +end + +#------------------------------------------------------------------------------- +function _wrap_interpolation(parent_scope, parent_ex, x) + if x isa SyntaxLiteral + x.scope == parent_scope ? x.tree : SyntaxNode(x) + elseif x isa Symbol + # Presume that plain Symbols are variable names in the scope + # they're interpolated into. These exist in `scope` so don't depend + # on `same_scope`. + SyntaxNode(K"Identifier", parent_ex, x) + else + SyntaxNode(K"Value", parent_ex, x) + end +end + +function _make_syntax_node(scope, srcref, children...) + if kind(srcref) == K"$" + # Special case for interpolations without a parent as in :($x) + @assert length(children) == 1 + return SyntaxLiteral(scope, _wrap_interpolation(scope, srcref, children[1])) + end + cs = SyntaxNode[] + for c in children + push!(cs, _wrap_interpolation(scope, srcref, c)) + end + sr = srcref isa SyntaxLiteral ? 
srcref.tree : srcref + SyntaxLiteral(scope, SyntaxNode(head(srcref), sr, cs)) +end + +function contains_active_interp(ex, depth) + k = kind(ex) + if k == K"$" && depth == 0 + return true + end + + inner_depth = k == K"quote" ? depth + 1 : + k == K"$" ? depth - 1 : + depth + return any(contains_active_interp(c, inner_depth) for c in children(ex)) +end + +function expand_quasiquote_content(mod, ex, depth) + if !contains_active_interp(ex, depth) + # TODO: Should we do this lowering here as a part of macro expansion? + # Or would it be neater to lower to an intermediate AST form instead, + # with lowering to actual calls to _syntax_literal in "lowering + # proper"? Same question further down... + return SyntaxNode(K"call", ex, + SyntaxNode[ + SyntaxNode(K"Value", ex, _syntax_literal), + SyntaxNode(K"Value", ex, ScopeSpec(mod, true)), + SyntaxNode(K"Value", ex, ex) + ]) + end + + # We have an interpolation deeper in the tree somewhere - expand to an + # expression + inner_depth = kind(ex) == K"quote" ? depth + 1 : + kind(ex) == K"$" ? depth - 1 : + depth + expanded_children = SyntaxNode[] + for e in children(ex) + if kind(e) == K"$" && inner_depth == 0 + append!(expanded_children, children(e)) + else + push!(expanded_children, expand_quasiquote_content(mod, e, inner_depth)) + end + end + + return SyntaxNode(K"call", ex, SyntaxNode[ + SyntaxNode(K"Value", ex, _make_syntax_node), + SyntaxNode(K"Value", ex, ScopeSpec(mod, true)), + SyntaxNode(K"Value", ex, ex), + expanded_children... + ]) +end + +function expand_quasiquote(mod, ex) + if kind(ex) == K"$" + if kind(ex[1]) == K"..." 
+ # TODO: Don't throw here - provide diagnostics instead + error("`...` expression outside of call") + else + r = SyntaxNode(K"call", ex, SyntaxNode[ + SyntaxNode(K"Value", ex, _make_syntax_node), + SyntaxNode(K"Value", ex, ScopeSpec(mod, true)), + SyntaxNode(K"Value", ex, ex), + ex[1] + ]) + return r + end + end + expand_quasiquote_content(mod, ex, 0) +end + +function needs_expansion(ex) + k = kind(ex) + if (k == K"quote") || k == K"macrocall" + return true + elseif k == K"module" # || k == K"inert" ??? + return false + else + return any(needs_expansion, children(ex)) + end +end + +function macroexpand(mod::Module, ex::SyntaxNode) + k = kind(ex) + if !haschildren(ex) || k == K"inert" || k == K"module" || k == K"meta" + return ex + elseif k == K"quote" + return macroexpand(mod, expand_quasiquote(mod, ex[1])) + elseif k == K"hygienic_scope" + scope = valueof(ex[2]) + result = macroexpand(scope.mod, ex[1]) + return SyntaxNode(SyntaxLiteral(scope, result)) + elseif k == K"macrocall" + macname = ex[1] + macfunc = eval2(mod, macname) + new_call_arg_types = + Tuple{MacroContext, ntuple(_->SyntaxNode, numchildren(ex)-1)...} + if hasmethod(macfunc, new_call_arg_types, world=Base.get_world_counter()) + margs = [SyntaxLiteral(ScopeSpec(mod, false), e) + for e in children(ex)[2:end]] + ctx = MacroContext(macname, mod) + expanded = try + invokelatest(macfunc, ctx, margs...) + catch exc + if exc isa MacroExpansionError + # Add context to the error + rethrow(MacroExpansionError(ctx, exc.diagnostics)) + else + throw(MacroExpansionError(ctx, ex, "Error expanding macro")) + end + end + expanded = expanded isa SyntaxLiteral ? 
+ SyntaxNode(expanded) : + SyntaxNode(K"Value", ex, expanded) + result = macroexpand(mod, expanded) + return result + else + # Attempt to invoke as an old-style macro + result = Base.macroexpand(mod, Expr(ex)) + return SyntaxNode(K"Value", ex, result) + end + else + return SyntaxNode(head(ex), ex, [macroexpand(mod, c) for c in children(ex)]) + end +end + +function macroexpand(ex::SyntaxLiteral) + macroexpand(ex.scope.mod, ex.tree) +end + +#------------------------------------------------------------------------------- + +function _needs_lowering(ex) + if !haschildren(ex) + return false + elseif kind(ex) == K"macro" + return true + else + return any(_needs_lowering, children(ex)) + end +end + +# Custom lowering using SyntaxNode, before we pass to Julia's normal lowering +function lower(mod, ex) + if !_needs_lowering(ex) + return ex + end + cs = map(e->lower(mod, e), children(ex)) + if kind(ex) == K"macro" + # Special lowering for new-style macros :-) + macname = Symbol("@", ex[1][1].val) + callex = ex[1] + callex_cs = copy(children(callex)) + callex_cs[1] = SyntaxNode(K"Identifier", callex_cs[1], macname) + insert!(callex_cs, 2, + SyntaxNode(K"::", callex, [ + SyntaxNode(K"Identifier", callex, :__context__) + SyntaxNode(K"Value", callex, MacroContext) + ])) + return SyntaxNode(K"function", ex, + [SyntaxNode(K"call", callex, callex_cs), ex[2]]) + end + SyntaxNode(head(ex), ex, map(e->lower(mod, e), children(ex))) +end + +function expand(mod, ex) + ex = macroexpand(mod, ex) + lower(mod, ex) +end + +# Insert Expr(:esc) expressions to escape any `(scope ex nothing)` expressions +# to the outer containing scope. 
+function _fix_scopes!(ex, depth) + if !(ex isa Expr) + return ex + end + ex::Expr + if ex.head == :hygienic_scope + scope = ex.args[2] + if scope.is_global + return Expr(Symbol("hygienic-scope"), + _fix_scopes!(ex.args[1], depth + 1), + scope.mod) + else + x = ex.args[1] + for i=1:depth + x = esc(x) + end + return x + end + else + map!(e->_fix_scopes!(e, depth), ex.args, ex.args) + return ex + end +end + +function expand(::Type{Expr}, mod, ex) + _fix_scopes!(Expr(expand(mod, ex)), 0) +end + +#------------------------------------------------------------------------------- +function _can_eval(ex) + k = kind(ex) + if !haschildren(ex) || k == K"quote" || k == K"inert" + return true + elseif k == K"module" + # Can't handle modules inside blocks... + return false + else + return all(_can_eval, children(ex)) + end +end + +function eval2(mod, ex::SyntaxNode) + k = kind(ex) + result = nothing + if k == K"toplevel" + for e in children(ex) + result = eval2(mod, e) + end + elseif k == K"module" + std_imports = !has_flags(ex, BARE_MODULE_FLAG) + newmod = Base.eval(mod, Expr(:module, std_imports, ex[1].val, Expr(:block))) + if std_imports + # JuliaSyntax-specific imports + Base.eval(newmod, quote + using JuliaSyntax: @__EXTENSIONS__ + eval(x::$SyntaxLiteral) = $eval2(x) + end) + end + stmts = children(ex[2]) + first_stmt = 1 + if !isempty(stmts) && kind(stmts[1]) == K"macrocall" && + valueof(stmts[1][1]) == Symbol("@__EXTENSIONS__") + result = eval2(newmod, stmts[1]) + first_stmt += 1 + if get_extension(newmod, :new_macros, false) && std_imports + # Override include() for the module + Base.eval(newmod, :(include(path) = $(JuliaSyntax.include2)($newmod, path))) + end + end + for e in stmts[first_stmt:end] + result = eval2(newmod, e) + end + else + if get_extension(mod, :new_macros, false) + @assert _can_eval(ex) + # NB: Base throws LoadError with a misleading line in this + # implementation of eval (which doesn't include LineNumberNodes which + # are normally a part of :toplevel 
or :module Expr's). + # Best fix: remove LoadError! Alternative fix: add line numbers... + e = expand(Expr, mod, ex) + else + e = Expr(ex) + end + result = Base.eval(mod, e) + end + return result +end + +function eval2(ex::SyntaxLiteral) + eval2(ex.scope.mod, ex.tree) +end + +function include2(mod, filename) + path, prev = Base._include_dependency(mod, filename) + code = read(path, String) + tls = task_local_storage() + tls[:SOURCE_PATH] = path + try + return include_string(mod, code; filename=path) + finally + if prev === nothing + delete!(tls, :SOURCE_PATH) + else + tls[:SOURCE_PATH] = prev + end + end +end + +function include_string(mod, str; filename=nothing) + eval2(mod, parseall(SyntaxNode, str; filename=filename)) +end + +_extensions_var = Symbol("##__EXTENSIONS__") + +function set_extension(mod::Module; kws...) + if !isdefined(mod, _extensions_var) + Base.eval(mod, :(const $(_extensions_var) = $(Dict{Symbol,Any}()))) + end + d = getfield(mod, _extensions_var) + for kv in kws + push!(d, kv) + end +end + +function get_extension(mod::Module, key::Symbol, default=nothing) + while true + if isdefined(mod, _extensions_var) + d = getfield(mod, _extensions_var) + if haskey(d, key) + return d[key] + end + end + pmod = parentmodule(mod) + if pmod == mod + break + end + mod = pmod + end + return default +end + +""" + @__EXTENSIONS__ new_macros=true +""" +macro __EXTENSIONS__(exs...) + kvs = Any[] + for e in exs + @assert Meta.isexpr(e, :(=), 2) + @assert e.args[1] isa Symbol + push!(kvs, Expr(:kw, e.args[1], esc(e.args[2]))) + end + # TODO: Expand to an `Expr(:meta)` in the future? + :(set_extension(@__MODULE__(), $(kvs...))) +end + diff --git a/src/match.jl b/src/match.jl new file mode 100644 index 00000000..bbba3799 --- /dev/null +++ b/src/match.jl @@ -0,0 +1,202 @@ +#------------------------------------------------------------------------------- +# Musings about constructing and matching syntax trees. +# +# Essentially, we want to be able match on the `head()`. 
+ +# What if we had macros to construct expression trees for cases where +# expression literals aren't ideal? +# +# Maybe we need these for pattern matching anyway? But... child ordering is +# then implied in the API?? We want to avoid this? +# +# Syntax Ideas +# +# Rich syntax? `head => [args ...]` style +# +# @SyntaxNode ref=ex break_block => [ +# Identifier => :loop_exit, +# _while => [ +# $(expand_condition(ex[1])) +# break_block => [ +# Identifier => :loop_cont +# scope_block => $(blockify(expand_forms(ex[2]))) +# ] +# ] +# ] +# +# Function call style +# +# @syntax ref=ex break_block( +# Identifier => :loop_exit, +# _while( +# $(expand_condition(ex[1])), +# break_block( +# Identifier => :loop_cont, +# scope_block( +# $(blockify(expand_forms(ex[2]))) +# ) +# ) +# ) +# ) +# +# S-expression style +# +# @syntax ref=ex [break_block +# Identifier => :loop_exit +# [_while +# $(expand_condition(ex[1])) +# [break_block +# Identifier => :loop_cont +# [scope_block +# $(blockify(expand_forms(ex[2])))]]]] +# +# +# Trying to avoid child ordering ... could we have properties? +# +# @syntax while => [ +# cond = $cond +# body = $body +# ] +# +# We'd want symmetry so the following works? +# +# ex = make_some_node(...) +# @info "condition is" ex.cond +# @info "body is" ex.body +# +# For pattern matching, syntax exactly mirroring the constructor would be (a) +# traditional and (b) mean that only one syntax needs to be learned. +# +# What about tree components where the children really are an array? In that +# case specifically allow accessing a `children` field? Or `args` field? +# `block` is naturally like this. (Disallow this in other cases though!? +# Implicit child ordering should not be in the API!) +# +# What about predicates matching the head? +# +# @match ex begin +# while => (cond=$x body=$y) begin +# # `x` and `y` are bound here??
+# end +# block => (children=$x) begin +# # `x` is child list here +# end +# $pred => begin +# # tested with ismatch(ex, pred) by pattern compiler?? +# end +# _ => begin +# # default case +# # What is bound here? We want a binding for the whole expression? +# end +# end +# +# Generically, the idea here is that ... +# +# @match x begin +# a ~ (q=$u, r=$v) => begin +# body1 +# end +# $pred ~ (q=$u) => begin +# body2 +# end +# _ => begin +# body3 +# end +# end +# +# compiles down to something like ... +# +# if tagmatch(x, matcher(typeof(x), :a)) +# u = matchfield(x, :q) +# v = matchfield(x, :r) +# body1 +# elseif tagmatch(x, matcher(typeof(x), pred)) +# u = matchfield(x, :q) +# body2 +# else +# body3 +# end +# +# The point of this lowering is that stuff like `matcher(typeof(x), :a)` can +# probably be constant folded ... `tagmatch(tag, matcher(typeof(x), pred))` +# would end up as `pred(tag)` +# +# Should the `a` and `b` be quoted or unquoted by default?? It's often just so +# damn convenient for them to be quoted ... but then if there's tags which +# aren't valid syntax like K"." ... well that's annoying hey?! You don't want +# to have to write $(K".") ugh! + +matcher(::Type{SyntaxNode}, sym::Symbol) = convert(Kind, string(sym)) +matcher(::Type{SyntaxNode}, k::Kind) = k + +function tagmatch(ex::SyntaxNode, k::Kind) + kind(ex) == k +end + +@noinline function field_not_found(ex, sym) + throw(ArgumentError("Field $sym not found in expression of kind $(kind(ex))")) +end + +function matchfield(ex::SyntaxNode, sym::Symbol) + k = kind(ex) + if sym === :children + k == K"block" ? children(ex) : field_not_found(ex, sym) + elseif sym === :condition + k == K"while" ? ex[1] : field_not_found(ex, sym) + elseif sym === :body + k == K"while" ? 
ex[2] : field_not_found(ex, sym) + else + field_not_found(ex, sym) + end +end + +macro match(x, pattern_block) + @assert Meta.isexpr(pattern_block, :block) + conditions = [] + bodies = [] + for pattern in pattern_block.args + pattern isa LineNumberNode && continue + @assert Meta.isexpr(pattern, :call) + unpacked = [] + if pattern.args[1] == :~ + tag_pattern = pattern.args[2] + a3 = pattern.args[3] + @assert Meta.isexpr(a3, :call) && a3.args[1] == :(=>) + unpack = a3.args[2] + @assert Meta.isexpr(unpack, :tuple) + for x in unpack.args + @assert Meta.isexpr(x, :(=)) + field_name = x.args[1] + @assert field_name isa Symbol + @assert Meta.isexpr(x.args[2], :$) + var_name = x.args[2].args[1] + @assert var_name isa Symbol + push!(unpacked, :($(esc(var_name)) = matchfield(x, $(QuoteNode(field_name))))) + end + body = a3.args[3] + elseif pattern.args[1] == :(=>) + tag_pattern = pattern.args[2] + body = pattern.args[3] + else + @assert false "Bad match pattern $pattern" + end + push!(conditions, :(tagmatch(x, matcher(x_type, $(esc(tag_pattern)))))) + push!(bodies, :( + let + $(unpacked...) + $(esc(body)) + end + )) + end + if_chain = nothing + for (c,b) in Iterators.reverse(zip(conditions, bodies)) + if_chain = Expr(:elseif, c, b, if_chain) + end + if_chain = Expr(:if, if_chain.args...) + quote + x = $(esc(x)) + x_type = typeof(x) + $if_chain + end +end + diff --git a/src/parse_stream.jl b/src/parse_stream.jl index fcd35e31..8ed0185c 100644 --- a/src/parse_stream.jl +++ b/src/parse_stream.jl @@ -81,6 +81,8 @@ struct SyntaxHead flags::RawFlags end +SyntaxHead(k::Kind) = SyntaxHead(k, EMPTY_FLAGS) + kind(head::SyntaxHead) = head.kind """ diff --git a/src/syntax_tree.jl b/src/syntax_tree.jl index a2df524d..6a06c98c 100644 --- a/src/syntax_tree.jl +++ b/src/syntax_tree.jl @@ -3,6 +3,15 @@ abstract type AbstractSyntaxData end +# TODO: +# +# Investigate attributes in ECS form and immutable trees. 
+# Key advantages of immutable trees: +# * Leaves are stored inline +# * No need to ever do "copyast" +# Key advantages of ECS: +# * Multiple attributes without changing the concrete data structure + mutable struct TreeNode{NodeData} # ? prevent others from using this with NodeData <: AbstractSyntaxData? parent::Union{Nothing,TreeNode{NodeData}} children::Union{Nothing,Vector{TreeNode{NodeData}}} @@ -37,13 +46,76 @@ end const AbstractSyntaxNode = TreeNode{<:AbstractSyntaxData} +# There are two ways this can arise: +# 1. From parsing a source file. +# 2. Programmatically struct SyntaxData <: AbstractSyntaxData - source::SourceFile - raw::GreenNode{SyntaxHead} + head::SyntaxHead + source::Union{Nothing,SourceFile} + raw::Union{Nothing,GreenNode{SyntaxHead}} position::Int val::Any end +function SyntaxData(source::SourceFile, raw::GreenNode{SyntaxHead}, + position::Int, val::Any) + SyntaxData(head(raw), source, raw, position, val) +end + +# SyntaxData constructed "in code" +function SyntaxData(head::SyntaxHead, val::Any; srcref=nothing) + if isnothing(srcref) + SyntaxData(head, nothing, nothing, 0, val) + else + SyntaxData(head, srcref.source, srcref.raw, srcref.position, val) + end +end + +function SyntaxData(kind::Kind, val::Any; kws...) + SyntaxData(SyntaxHead(kind, EMPTY_FLAGS), val; kws...) +end + +# Design musings +# +# getproperty overloading for virtual fields +# +# Yeeeah this is a very inefficient way to do it. It asks a lot of the compiler +# to elide all the branches here. *Especially* eliding the branch on the kind +# seems difficult. +# Pattern matching ftw tbh +# +# function Base.getproperty(node::SyntaxNode, name::Symbol) +# # Uuugh yea we can't have this (fixme). Maybe virtual underscore-named fields?
+# name === :parent && return getfield(node, :parent) +# name === :children && return getfield(node, :children) +# name === :head && return getfield(node, :data).head +# name === :source && return getfield(node, :data).source +# name === :raw && return getfield(node, :data).raw +# name === :position && return getfield(node, :data).position +# name === :val && return getfield(node, :data).val +# +# h = head(node) +# k = kind(node) +# if name === :name +# if k == K"call" +# if is_infix_op_call(h) || is_postfix_op_call(h) +# node[2] +# else +# node[1] +# end +# end +# elseif name === :args +# end +# end +# +# Could SyntaxNode be a sum type? And if it were, could we have it as optimal +# as it currently is? Weell... it'd probably be less optimal because things +# like the try node have many children. So the sizeof the sum type would be +# quite large. Also trivia really fucks us over and we can't use sum types for +# GreenNode at the very least. +# +# getproperty though, maybe? + """ SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead}; keep_parens=false, position::Integer=1) @@ -59,6 +131,25 @@ end Base.show(io::IO, ::ErrorVal) = printstyled(io, "✘", color=:light_red) +function SyntaxNode(head::Union{Kind,SyntaxHead}, children::Vector{SyntaxNode}; + srcref=nothing) + SyntaxNode(nothing, children, SyntaxData(head, nothing; srcref=srcref)) +end + +function SyntaxNode(head::Union{Kind,SyntaxHead}, val::Any; + srcref=nothing) + SyntaxNode(nothing, nothing, SyntaxData(head, val; srcref=srcref)) +end + +function SyntaxNode(head::Union{Kind,SyntaxHead}, srcref::SyntaxNode, + children::Vector{SyntaxNode}) + SyntaxNode(nothing, children, SyntaxData(head, nothing; srcref=srcref)) +end + +function SyntaxNode(head::Union{Kind,SyntaxHead}, srcref::SyntaxNode, val::Any) + SyntaxNode(nothing, nothing, SyntaxData(head, val; srcref=srcref)) +end + function SyntaxNode(source::SourceFile, raw::GreenNode{SyntaxHead}; keep_parens=false, position::Integer=1) GC.@preserve source begin 
@@ -103,6 +194,7 @@ end haschildren(node::TreeNode) = node.children !== nothing children(node::TreeNode) = (c = node.children; return c === nothing ? () : c) +numchildren(node::TreeNode) = (isnothing(node.children) ? 0 : length(node.children)) """ @@ -111,12 +203,26 @@ children(node::TreeNode) = (c = node.children; return c === nothing ? () : c) Get the [`SyntaxHead`](@ref) of a node of a tree or other syntax-related data structure. """ -head(node::AbstractSyntaxNode) = head(node.raw) +head(node::AbstractSyntaxNode) = head(node.data) + +span(node::AbstractSyntaxNode) = span(node.data) + +first_byte(node::AbstractSyntaxNode) = first_byte(node.data) +last_byte(node::AbstractSyntaxNode) = last_byte(node.data) -span(node::AbstractSyntaxNode) = span(node.raw) -first_byte(node::AbstractSyntaxNode) = node.position -last_byte(node::AbstractSyntaxNode) = node.position + span(node) - 1 +# TODO: Deprecate these as they rely on the field names of AbstractSyntaxData? +head(data::AbstractSyntaxData) = head(data.raw) +span(data::AbstractSyntaxData) = span(data.raw) +first_byte(data::AbstractSyntaxData) = data.position +last_byte(data::AbstractSyntaxData) = data.position + span(data) - 1 +source_line(data::AbstractSyntaxData) = source_line(data.source, data.position) +source_location(data::AbstractSyntaxData) = source_location(data.source, data.position) + +head(data::SyntaxData) = data.head +span(data::SyntaxData) = isnothing(data.raw) ? 
0 : span(data.raw) +first_byte(data::SyntaxData) = data.position +last_byte(data::SyntaxData) = data.position + span(data) - 1 """ sourcetext(node) @@ -131,18 +237,23 @@ function Base.range(node::AbstractSyntaxNode) (node.position-1) .+ (1:span(node)) end -source_line(node::AbstractSyntaxNode) = source_line(node.source, node.position) -source_location(node::AbstractSyntaxNode) = source_location(node.source, node.position) +source_line(node::AbstractSyntaxNode) = source_line(node.data) +source_location(node::AbstractSyntaxNode) = source_location(node.data) + +filename(node::AbstractSyntaxNode) = filename(node.data) +function filename(data::SyntaxData) + isnothing(data.source) ? "" : data.source.filename +end -function interpolate_literal(node::SyntaxNode, val) - @assert kind(node) == K"$" - SyntaxNode(node.source, node.raw, node.position, node.parent, true, val) +function source_location(data::SyntaxData) + return isnothing(data.source) ? (0,0) : + source_location(data.source, data.position) end function _show_syntax_node(io, current_filename, node::AbstractSyntaxNode, indent, show_byte_offsets) - fname = node.source.filename - line, col = source_location(node.source, node.position) + fname = filename(node) + line, col = source_location(node) posstr = "$(lpad(line, 4)):$(rpad(col,3))│" if show_byte_offsets posstr *= "$(lpad(first_byte(node),6)):$(rpad(last_byte(node),6))│" @@ -220,7 +331,7 @@ function Base.copy(node::TreeNode) end # shallow-copy the data -Base.copy(data::SyntaxData) = SyntaxData(data.source, data.raw, data.position, data.val) +Base.copy(data::SyntaxData) = SyntaxData(data.head, data.source, data.raw, data.position, data.val) function build_tree(::Type{SyntaxNode}, stream::ParseStream; filename=nothing, first_line=1, keep_parens=false, kws...) 
diff --git a/test/macroexpand.jl b/test/macroexpand.jl new file mode 100644 index 00000000..c0aa521d --- /dev/null +++ b/test/macroexpand.jl @@ -0,0 +1,160 @@ +module A + +@__EXTENSIONS__ new_macros=true + +module LocationMacros + using JuliaSyntax + + macro __MODULE__() + __context__.mod + end + + macro __FILE__() + JuliaSyntax.filename(__context__.macroname) + end + + macro __LINE__() + JuliaSyntax.source_line(__context__.macroname) + end + + macro __COLUMN__() + c = JuliaSyntax.source_location(__context__.macroname)[2] + return JuliaSyntax.kind(__context__.macroname) == K"MacroName" ? c - 1 : c + end + + function loc() + (mod=@__MODULE__(), file=@__FILE__(), line=@__LINE__(), column=@__COLUMN__()) + end +end + +using JuliaSyntax +using JuliaSyntax: valueof, MacroExpansionError, emit_diagnostic + +module B + x = "x in B" + + macro g() + "in @g" + end + + macro f(y) + quote + (x, $y, @g) + end + end +end + +z = "z in A" + +function hygiene_test() + B.@f z +end + +macro smallpow(ex) + @assert kind(ex) == K"call" + @assert JuliaSyntax.is_infix_op_call(ex) + @assert valueof(ex[2]) == :^ + N = valueof(ex[3]) + @assert N isa Integer + e = ex[1] + for i = 2:N + e = :($e * $(ex[1])) + end + return e +end + +macro smallpow_wrapped(ex) + quote + @smallpow $ex + end +end + +macro error_test(body) + if kind(body) == K"tuple" + error("\"Unexpected\" error") + elseif kind(body) != K"block" + throw(MacroExpansionError(body, "Expected a `begin ... 
end` block")) + end + + # A way to do more complicated diagnostics: + diagnostics = JuliaSyntax.Diagnostics() + for e in body + if kind(e) != K"call" + emit_diagnostic(diagnostics, e, error="Expected call") + end + end + if !isempty(diagnostics) + throw(MacroExpansionError(nothing, diagnostics)) + end +end + +function macro_calling_macro_test(x) + @smallpow_wrapped x^3 +end + +function old_macro_test(x) + @evalpoly x 1 2 +end + +function bad_macro_invocation(case) + if case == 1 + :(@error_test (1,2)) + elseif case == 2 + :(@error_test begin + a+b + [1,2,3] + f(x,y) + z + end) + else + :(@error_test function foo() + do_stuff + end) + end +end + +macro letx(arg) + quote + let x = 42 + $arg, x + end + end +end + +y = let x = 84 + @letx x +end + +JuliaSyntax.include2(@__MODULE__, "macroexpand_ccall.jl") + +function call_strlen(str) + CCall.@ccall strlen(str::Cstring)::Csize_t +end + +end + +@testset "macroexpand" begin + loc = A.LocationMacros.loc() + @test loc.mod == A.LocationMacros + @test last(splitpath(loc.file)) == "macroexpand.jl" + @test loc.line == 26 + @test loc.column == 72 + + @test A.hygiene_test() == ("x in B", "z in A", "in @g") + + @test A.old_macro_test(1) == 3 + @test A.old_macro_test(2) == 5 + + @test A.macro_calling_macro_test(3) == 27 + + @test_throws JuliaSyntax.MacroExpansionError A.eval(A.bad_macro_invocation(1)) + @test_throws JuliaSyntax.MacroExpansionError A.eval(A.bad_macro_invocation(2)) + @test_throws JuliaSyntax.MacroExpansionError A.eval(A.bad_macro_invocation(3)) + + @test A.y == (84, 42) + + # FIXME: Cannot call CCall.@ccall directly from here without opting into + # new macro expansion. Need some Base hooks for reimplementing include() to + # make the opt-in more subtle and well integrated. 
+ @test A.call_strlen("ab - cd") == 7 +end diff --git a/test/macroexpand_ccall.jl b/test/macroexpand_ccall.jl new file mode 100644 index 00000000..bd040a6f --- /dev/null +++ b/test/macroexpand_ccall.jl @@ -0,0 +1,125 @@ +module CCall + +# An implementation of the ccall macro with new macro expansion + +@__EXTENSIONS__ new_macros=true + +using JuliaSyntax +using JuliaSyntax: SyntaxLiteral, ScopeSpec, MacroExpansionError, is_identifier, numchildren, children + +function ccall_macro_parse(ex) + if kind(ex) != K"::" + throw(MacroExpansionError(ex, "Expected a return type annotation like `::T`", after=true)) + end + rettype = ex[2] + call = ex[1] + if kind(call) != K"call" + throw(MacroExpansionError(call, "Expected function call syntax `f()`")) + end + + # get the function symbols + func = let f = call[1], kf = kind(f) + if kf == K"." + :((:($$(f[2])), $(f[1]))) + elseif kf == K"$" + f + elseif is_identifier(kf) + SyntaxLiteral(K"inert", f, [f]) + else + throw(MacroExpansionError(f, + "Function name must be a symbol like `foo`, a library and function name like `libc.printf` or an interpolated function pointer like `\$ptr`")) + end + end + + varargs = nothing + + # collect args and types + args = SyntaxLiteral[] + types = SyntaxLiteral[] + + function pusharg!(arg) + if kind(arg) != K"::" + throw(MacroExpansionError(arg, "argument needs a type annotation like `::T`", after=true)) + end + push!(args, arg[1]) + push!(types, arg[2]) + end + + varargs = nothing + num_varargs = 0 + for e in Iterators.drop(children(call), 1) # FIXME this is ugly ... 
+ if kind(e) == K"parameters" + num_varargs == 0 || throw(MacroExpansionError(e, "Multiple parameter blocks not allowed")) + num_varargs = numchildren(e) + num_varargs > 0 || throw(MacroExpansionError(e, "C ABI prohibits vararg without one required argument")) + varargs = children(e) + else + pusharg!(e) + end + end + if !isnothing(varargs) + for e in varargs + pusharg!(e) + end + end + + return func, rettype, types, args, num_varargs +end + +function ccall_macro_lower(ex, convention, func, rettype, types, args, num_varargs) + statements = SyntaxLiteral[] + if kind(func) == K"$" + check = quote + func = $(func[1]) + if !isa(func, Ptr{Cvoid}) + name = :($(func[1])) + throw(ArgumentError("interpolated function `$name` was not a Ptr{Cvoid}, but $(typeof(func))")) + end + end + func = check[1][1] + push!(statements, check) + end + + roots = SyntaxLiteral[] + cargs = SyntaxLiteral[] + for (i, (type, arg)) in enumerate(zip(types, args)) + # FIXME: Need utility function for identifiers, which are no longer + # plain symbols? Or can we upgrade interpolation code to detect this + # case and attribute them to the interpolation location? + # argi = @Identifier("arg$i") + # argi = Identifier(@__MODULE__, "arg$i") + # Which means Symbol("arg$i") with the current module ?? + argi = Symbol("arg$i") + # TODO: Can we use SSAValue here? Lowering can do this ... but + # presumably that implies some invariants? + push!(statements, :(local $argi = Base.cconvert($type, $arg))) + push!(roots, :($argi)) + push!(cargs, :(Base.unsafe_convert($type, $argi))) + end + push!(statements, SyntaxLiteral(K"foreigncall", + ex, + SyntaxLiteral[func, + rettype, + :(Core.svec($(types...))), + :($num_varargs), + # TODO: Gosh constructing this quoted + # symbol was too hard to get correct. We + # need something better. + SyntaxLiteral(K"inert", ex, [:($convention)]), + cargs..., + roots... + ])) + quote + $(statements...) 
+ end +end + +macro ccall(ex) + ccall_macro_lower(ex, Symbol("ccall"), ccall_macro_parse(ex)...) +end + +# @ccall printf("%s = %d\n"::Cstring; "var"::Cstring, 42::Cint)::Cint +# nothing + +end + diff --git a/test/runtests.jl b/test/runtests.jl index bf2f93fb..31fcaf92 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -29,6 +29,10 @@ include("expr.jl") end include("source_files.jl") +if VERSION >= v"1.9" + JuliaSyntax.include2(@__MODULE__, "macroexpand.jl") +end + if VERSION >= v"1.6" # Tests restricted to 1.6+ due to # * Core._parse hook doesn't exist on v1.5 and lower