Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: make macros bind tighter than commas #36547

Open
MasonProtter opened this issue Jul 5, 2020 · 30 comments
Open

RFC: make macros bind tighter than commas #36547

MasonProtter opened this issue Jul 5, 2020 · 30 comments
Labels
breaking This change will break code parser Language parsing and surface syntax
Milestone

Comments

@MasonProtter
Copy link
Contributor

MasonProtter commented Jul 5, 2020

Currently, we have that @foo a, b, c parses as @foo((a, b, c)) which seems sensible enough, but it's a major source of frustration when you have a macro inside a function's arguments. As a concrete example, consider a very simple underscore anonymous function macro that would take @_ f(_) + 1 and turn it into x -> f(x) + 1. With the current parsing of macros, if we write

map(@_ f(_) + 1, xs)

this gets parsed as

map(@_((f(_) + 1, xs)))

rather than the desired

map(@_(f(_) + 1), xs)

I'm wondering if anyone has a feeling for how disruptive it would be to change this in 2.0 and if they feel that such a change would be desirable. I'm not sure I've ever seen someone write @foo a, b, c where they didn't intend it to mean (@foo(a), b, c), but this could be a reflection of my ignorance.

Is there some deep and useful reason for the current behaviour that I'm missing?

@JeffBezanson JeffBezanson changed the title RFC: make macros bind tighter than tuples RFC: make macros bind tighter than commas Jul 5, 2020
@JeffBezanson JeffBezanson added the parser Language parsing and surface syntax label Jul 5, 2020
@MasonProtter
Copy link
Contributor Author

MasonProtter commented Jul 6, 2020

c.c. @c42f, I remember discussing this with you on Slack, but that conversation is now lost beyond the Slack horizon.

@c42f
Copy link
Member

c42f commented Jul 6, 2020

I agree this would be great, thanks for taking the time to capture this into an issue. As far as I'm aware there's no compelling reason for the current parsing, and I haven't personally seen anyone relying on it in the wild.

@mcabbott
Copy link
Contributor

mcabbott commented Jul 6, 2020

It's a pity that dot(@view A[1:3], B) doesn't just work. Accepting a tuple and an expression @foo (a, b) expr seems useful sometimes, e.g. for @tensoropt. Brackets are not currently required, but I presume they would be, to treat @foo a, b, c differently from @foo (a, b, c). And likewise @foo expr (a, b) differently from @foo expr a, b?

@MasonProtter
Copy link
Contributor Author

Brackets are not currently required, but I presume they would be, to treat @foo a, b, c differently from @foo (a, b, c). And likewise @foo expr (a, b) differently from @foo expr a, b?

Good points. I think anything that currently is delimited by parens should be unchanged, but I do think @foo expr a, b should become (@foo(expr a), b) rather than @foo(expr, (a, b))

@JeffBezanson JeffBezanson added the breaking This change will break code label Jul 16, 2020
@mbauman
Copy link
Sponsor Member

mbauman commented Jul 16, 2020

I'd really like this, too. The place where I'd foresee problems is multi-line DSL macros, since the comma could be seen as a way to split macro args over multiple lines without begin or parens.

julia> :(@foo a,
              b,
              c)
:(#= REPL[2]:1 =# @foo (a, b, c))

julia> :(@foo a
              b
              c)
ERROR: syntax: missing comma or ) in argument list
Stacktrace:
 [1] top-level scope at none:1

@StefanKarpinski StefanKarpinski added this to the 2.0 milestone Jul 16, 2020
@StefanKarpinski
Copy link
Sponsor Member

I suspect this is too breaking to be done in a 1.x release, but I've marked for consideration in 2.0.

@MasonProtter
Copy link
Contributor Author

Hypothetically, if I or someone else mocked up a PR and we ran PkgEval and showed that it broke no testsuites, would we still consider this too breaking for 1.x just because of the danger to user code?

I can see myself being convinced either way that this either is or isn't acceptable for 1.x even if no testsuites broke.

@StefanKarpinski
Copy link
Sponsor Member

If it passes PkgEval, that's strong evidence that it's not likely to break user code. In that case we'd consider it.

@c42f
Copy link
Member

c42f commented Feb 20, 2023

Using parse_eq_star(ps) here

https://github.com/JuliaLang/JuliaSyntax.jl/blob/20b3b3e88841e3de0309e9a0531de0ca906b6c2e/src/parser.jl#L2651

should be approximately the right way to try this out.

I ran this change across General and found quite a lot of Julia files would be broken by this change (~2% .. which does seem like rather a lot - perhaps needs closer inspection of the exact failures)

In any case, it exposed a few interesting uses. One particular package which this would break common usage is https://github.com/mauro3/UnPack.jl eg, the usage in the readme

@unpack a, b = pa

@MasonProtter
Copy link
Contributor Author

Thanks for checking that out @c42f. Disappointing, but makes sense.

@c42f
Copy link
Member

c42f commented Feb 21, 2023

A more subtle way to fix this could, perhaps, be to turn this on only when we're in a context where commas are already special.

For example, within parentheses. So

  • f(@x a, b) would parse as (call f (macrocall @x a) b)
  • @x a, b would parse as (macrocall @x (tuple a b))

This context-sensitivity might seem bad, but it already exists in the language in precisely this kind of way because

  • f(a, b = c, d) parses as (call f a (kw b c) d), not (call f (= (tuple a b) (tuple c d)))
  • But a, b = c, d parses as (= (tuple a b) (tuple c d))

@MasonProtter
Copy link
Contributor Author

🤯 wow I never considered that.

@MasonProtter
Copy link
Contributor Author

MasonProtter commented Feb 21, 2023

So thinking more about this, since we have Expr(:..., ), it’s actually more expressive for a macro to slurp up an unwanted part of the expression because it can just splat that back out if it doesn’t want it.

However, this currently does cause problems if used outside of a call, so I think implementing #48738 would solve my concerns.

@MasonProtter
Copy link
Contributor Author

Actually, nevermind, I think I do actually want what you're describing here @c42f because a macro can't tell the difference between @foo x, y and @foo (x, y).

@c42f
Copy link
Member

c42f commented Feb 21, 2023

I prototyped this more restricted approach. Applying it to Base is interesting - there's three differences in base/**/*.jl, all look like like unintentional scoping of the @inbounds macros to consume a tuple rather than a single indexing operation. Not incorrect, but misleading:

# base/array.jl

iterate(A::Array, i=1) = (@inline; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)
#                                                                ^^^^^^^^^^^^^^^^^^^^^


# base/dict.jl
    vals = T <: KeySet ? v.dict.keys : v.dict.vals
    (@inbounds vals[i], i == typemax(Int) ? 0 : i+1)
#    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# base/strings/basic.jl
IndexStyle(::Type{<:CodeUnits}) = IndexLinear()
@inline iterate(s::CodeUnits, i=1) = (i % UInt) - 1 < length(s) ? (@inbounds s[i], i + 1) : nothing
#                                                                  ^^^^^^^^^^^^^^^^^^^^^

Conversely in the tests, there's the following cases which look intentional but could easily and more clearly be rewritten to use parentheses

# test/bitarray.jl
    r1 = func(args...; kwargs...)
    ret_type  nothing && (@test isa(r1, ret_type) || @show ret_type, typeof(r1))
#                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

# test/fastmath.jl
    for T in (Float32,Float64)
        for func in (@fastmath exp2,exp,exp10)
#                    ^^^^^^^^^^^^^^^^^^^^^^^^
            @test func(T(2000)) == T(Inf)


# test/llvmcall2.jl
    @test (@eval ccall("llvm.floor.f64", llvmcall, Float64, (Float64,), 0.0)) === 0.0
#          vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    @test (@eval ccall("llvm.floor", llvmcall, Float64, (Float64,), 0.0),
                 ccall("llvm.floor", llvmcall, Float32, (Float32,), 0.0)) === (0.0, 0.0f0)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    @test_throws err @eval ccall("llvm.floor.f64", llvmcall, Float32, (Float64,), 0.0)

@MasonProtter
Copy link
Contributor Author

Hmm, interesting. Those usages in the tests do seem to indicate to me that the more restrictive form of this could be breaking, but I guess we'd have to look out in the wider ecosystem to know for sure.

@c42f
Copy link
Member

c42f commented Mar 1, 2023

I ran JuliaSyntax over my copy of General with the rule that the scope only
changes within parentheses. I found several weird things, some of which may be
bugs in user code, many cases which are just unnecessary or visually ambiguous.
And a few cases where this would be breaking.

Overall, I'd say these results suggest that this syntax doesn't work like
people expect and changing it would be a good idea. But it will definitely be
mildly breaking.

I've attached the full log of syntax mismatches here pkg_log_36547.txt. (Note that about 10 are unrelated bugs/differences in JuliaSyntax.)

Possible bugs, which this change would "un-break"

# ClimateMachine_0.1.0/src/Arrays/MPIStateArrays.jl

weighted_dot_impl(Q1, Q2, W) = dot_impl(@~ @. W * Q1, Q2)
                               #------------------------<

# Oceananigans_0.69.2/examples/kelvin_helmholtz_instability.jl

Ri_plot = plot(@. Ri * sech(zF / h)^2 / sech(zF)^2, zF; xlabel="Ri(z)", color=:black, kwargs...) # Ri(z)= ∂_z B / (∂_z U)²; derivatives computed by hand
          #------------------------------------------------------------------------------------<

# WebIO_0.8.16/test/render.jl
        @eval struct $TypeBName end
        TypeA, TypeB = (@eval $TypeAName, @eval $TypeBName)
                       #----------------------------------<

Non-breaking

Unintentional but harmless scope

misleading scope for @inbounds occurs many times, this is just one example

# AMDGPU_0.2.17/src/device/array.jl
    if (i % UInt) - 1 < length(A)
        (@inbounds A[i], i + 1)
        #---------------------<

# Stuffing_0.8.3/src/qtree_functions.jl
                for j in labelslen:-1:i+1
                    push!(itemlist, (@inbounds labels[i], @inbounds labels[j]) => spindex)
                                    #----------------------------------------<

# Zygote_0.6.34/src/lib/grad.jl
    back = MacroTools.@q _ -> ($__source__; nothing)
    push!(blk.args, :(@inline Zygote._pullback(::Context, ::Core.Typeof($(esc(f))), args...) = $(esc(f))(args...), $back))
                     #--------------------------------------------------------------------------------------------------<

Breaking

Unnecessary parentheses

# BridgeSDEInference_0.3.2/src/solvers/tsit5.jl
function tsit5(f, t, y, dt, P, tableau)
    #------------------------------------------------------------------------->
    (@unpack c₁,c₂,c₃,c₄,c₅,c₆,a₂₁,a₃₁,a₃₂,a₄₁,a₄₂,a₄₃,a₅₁,a₅₂,a₅₃,a₅₄,a₆₁,a₆₂,
             a₆₃,a₆₄,a₆₅,a₇₁,a₇₂,a₇₃,a₇₄,a₇₅,a₇₆ = tableau)
#---------------------------------------------------------<

Bizarre style

A few people seem to looove short form function syntax and short-form block notation and will put up with writing piles of extra semicolons to use it.

# ConstrainedRootSolvers_0.1.3/src/find_zero.jl
) where {FT<:AbstractFloat} =
#>
(
    # _count iterations
    _count = 0;

    # define the initial step
    @unpack x_ini, x_max, x_min, Δ_ini = ms;

    # initialize the y
    _tar_x = x_ini;
    _tar_y = abs( f(_tar_x) );

    # record the history
    if stepping
        push!(ms.history, [_tar_x, _tar_y]);
    end;
……………
        # 3. if break
        if if_break(tol, FT(0), _Δx, _tar_y, _count)
            # record the history
            if stepping
                push!(ms.history, [_tar_x, _tar_y]);
            end;

            break
        end;

        # 4. if no update, then 10% the Δx
        if _count_inc + _count_dec == 0
            _Δx /= 10;
        end;
    end;

    return _tar_x
)
#<

Valid but misleading

# ConvexBodyProximityQueries_0.2.0/test/obstacles.jl
    @test typeof(@point rand(3)) <: ConvexPolygon{3, 1}
    @test typeof(@line rand(2), rand(2)) <: ConvexPolygon{2, 2}
          #----------------------------<

# Compose_0.9.3/src/property.jl
svgattribute(attributes::AbstractArray, values::AbstractArray) =
    #-------------------------------------------------->
    SVGAttribute( @makeprimitives SVGAttributePrimitive,
            (attribute in attributes, value in values),
            SVGAttributePrimitive(attribute, string(value)))
#----------------------------------------------------------<

# MatrixNetworks_1.0.2/test/diffusions_test.jl
        tol = 1e-3
        x = pagerank_power!(x,y,P,0.85,v,tol,maxiter,(iter,x) -> @show iter, norm(x,1))
            #-------------------------------------------------------------------------<

# ModelingToolkit_8.3.2/test/variable_parsing.jl

@test @macroexpand(@parameters x, y, z(t)) == @macroexpand(@parameters x y z(t))
      #----------------------------------<

Somewhat useful cases

But these can be fixed with parens

# OrdinaryDiffEq_6.6.5/src/nordsieck_utils.jl
  isconstcache = T <: OrdinaryDiffEqConstantCache
  isconstcache || ( @unpack atmp, ratetmp = cache )
                  #-------------------------------<

# RRRMC_2.2.0/src/graphs/PercStep.jl

    (e1-e0) == delta || (@show e1,e0,delta,e1-e0; error())
                        #--------------------------------<

@JeffBezanson
Copy link
Sponsor Member

Darn, this does seem like a mistake. Maybe we could do a release with a warning for this, and eventually change it?

Summary for triage: in f(@m a, b), the macro call consumes the comma, and usage data shows that that is probably surprising to many people. We would like to terminate macro parsing on a comma when inside an argument list.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Mar 1, 2023
@LilithHafner
Copy link
Member

It would also be nice to change the scoping in TypeA, TypeB = (@eval $TypeAName, @eval $TypeBName), but this would be more breaking (e.g. @show e1,e0,delta,e1-e0; error())

@c42f
Copy link
Member

c42f commented Mar 2, 2023

Agreed @LilithHafner, I'd like to include changing the parsing of (@m a, b) for consideration by triage. Because parens constructing a tuple or named tuple is very analogous to the function call syntax. And I think this is more common than short-form block syntax. (Actually I can quantify this for the code in General, but it'll take me a little while to set up.)

So there's several possible proposals for making macro calls bind tighter than commas, ordered from least to most breaking:

  1. Only within function calls, ie, change f(@m a, b)
  2. Within all parentheses, ie, change function calls and also forms like (@m a, b). (RFC: make macros bind tighter than commas #36547 (comment))
  3. Everywhere (RFC: make macros bind tighter than commas #36547 (comment))

We've already discarded (3) because it's very breaking and the natural appearance of @unpack et al shows it's undesirable.

I believe (2) is most consistent. But if too breaking, we could consider (1).

I also think we can do syntax evolution in a less breaking way than relying on warnings and a deprecation schedule. But that's a whole other thing I shouldn't get carried away explaining on this issue ;-)

@LilithHafner
Copy link
Member

I also think we can do syntax evolution in a less breaking way than relying on warnings and a deprecation schedule. But that's a whole other thing I shouldn't get carried away explaining on this issue ;-)

Would you share a link or post more info on this elsewhere? Your work on syntax is very nice and I'd love to hear your ideas about this.

@LilithHafner
Copy link
Member

From triage: Long term, implement option 2. Short term, warn whenever this would change parsing. We can implement the warning now, no need to wait for JuliaSyntax to merge.

@c42f
Copy link
Member

c42f commented Mar 2, 2023

Excellent decision imo. Thanks triage ❤️

@MasonProtter
Copy link
Contributor Author

So how do we move forward with this? I looked around in parse-eq* and downstream but was having some trouble conceptualizing where we'd actually want to intercept the macrocall and how to do the warning.

@c42f
Copy link
Member

c42f commented Mar 3, 2023

@MasonProtter I've pushed my JuliaSyntax prototype implementation here for reference: JuliaLang/JuliaSyntax.jl#212. You can see there's a new ParseState variable low_precedence_comma. In JuliaSyntax, ParseState holds a set of variables which are dynamically scoped in the scheme implementation. So there's a fairly direct analogy. However, in terms of functions, note that JuliaSyntax has a new system for parsing parentheses and disambiguating the highly overloaded syntax inside them. So you will find there's some differences in that part of the code.

Warnings are a little tricky in the flisp parser as there's no obvious place to communicate them. Maybe the simplest, least disruptive way to get them out of the scheme code would be to use a dynamically scoped list to collect warnings from the parsing functions. Then modify the entry points called from the C code to return the warnings as a separate output. Ah wait. I see this is what we're already doing now for lowering. So following that code should be easy enough:

https://github.com/JuliaLang/julia/blob/master/src/jlfrontend.scm#L158

I'd like a way for JuliaSyntax to be able to communicate warnings in future, separately from the parse tree. Currently this isn't possible with Core._parse. So it would be nice to modify the API of the Core._parse hook in some way to be able to communicate warnings and other diagnostics in general. One option could be to invent some Expr(:diagnostic) data structure for this. I'm not sure that's best but it'd be flexible and easy on the flisp<-->C side.

@c42f
Copy link
Member

c42f commented Mar 3, 2023

By the way, I'm not super keen to mess with the scheme implementation anymore, I'd rather put that effort into landing JuliaSyntax. But I'm happy to support anyone else in making changes to the scheme code. And help with any changes to the Julia runtime which would benefit both parsers.

@c42f
Copy link
Member

c42f commented Mar 4, 2023

Diagnostics design sketch

Ok. How about the following sketch for how diagnostics could flow through the parser and runtime:

  1. Change Core._parse API such that, instead of returning svec(ex, last_offset), it returns a svec(ex, last_offset, diagnostics). Note that in general, compiler diagnostics do not need to align with the tree structure returned in ex. So having them in a separate list is more flexible. See also the Expr(:diagnostics) design alternative below
  2. Figure out the API for the diagnostics data structure. Probably show and some kind of other summary function (Meta.diagnostic_summary? - see discussion below)
  3. Modify the Meta.parse() machinery for the new Core._parse, in some way that it can (optionally I guess, for compatibility?) return diagnostics to the caller. In JuliaSyntax.parse() we have ignore_warnings
  4. Change jlfrontend.scm and julia-parser.scm to use a dynamically scoped list to collect warnings during parsing, as mentioned in a previous comment. Change the runtime C function fl_parse to deal with the diagnostics list returned.
  5. Change the use of Meta.parse within Base and the runtime to request diagnostics and report them in some way. For the use in loading.jl in include/include_string, presumably they would go via the logging system, as we're already running Julia code there. Same for the use of parseall in client.jl and parse_input_line in the REPL. (side note - parse_input_line() predates parseall() but in principle should just be replaced with parseall())
  6. Users may still call into the C runtime functions jl_eval_string / jl_parse_string / jl_parse_all from arbitrary C code. We could report warnings arising from those to stderr? Or just ignore them? It seems like an edge case we could ignore.

Design alternative with Expr(:diagnostics)

A possible alternative to returning the diagnostics separately from Core._parse is to encode them into the expression tree, somewhat similarly to how Expr(:error) is done. However, I think it best to encode them in the root of the tree as Expr(:diagnostics, ex, diagnostics), as they're not part of the tree structure per se. This is different from how Expr(:error) currently gets tacked onto the end of the Expr(:toplevel) args list.

Then report any diagnostics which exist at eval() time. This might be neater and more easy to make compatible than threading a separate diagnostics data structure around everywhere - implementing will tell. The advantages of this approach is that it could allow us to have warnings on in Meta.parse() by default. And also that there's a chance that they could survive flowing through user code (including macros) and back into eval(). (But tbh I'm not sure these features are entirely necessary... needs thought.)

Source Diagnostics data structures / API

One straightforward concrete option would be a Vector of (level, first_byte, last_byte, message) where level can be :error, :warn (or maybe :info) - first_byte:last_byte, refers to a source range of the input text, and message is a string. This is basically just the JuliaSyntax.Diagnostic data structure.

A challenge here is I don't think JuliaSyntax.Diagnostic is "finished" and I'd like flexibility to expand it in the future. For example, adding

  • Sub-ranges for highlighting and additional message structure
  • Continue to work on pretty printing. Including
    • Printing for mime types other than text
    • Formatting multiple diagnostics on a single line in a way which only prints the source line once.

So a better option than requiring a particular data structure for diagnostics is probably to rely on generic functions - minimal, just enough for the runtime to query the diagnostics data structure as necessary. show(io, diagnostics) being an obvious one for pretty printing. But perhaps also an analogue of JuliaSyntax.any_error() to summarize the diagnostics. Maybe it could be Meta.diagnostic_summary or some such so we can distinguish between errors and warnings. (Not sure if it needs to be in Core? Needs investigation.)

@c42f
Copy link
Member

c42f commented May 12, 2023

I've updated the code at JuliaLang/JuliaSyntax.jl#212 to add this behavior as a feature flag which can be toggled at runtime to make testing this easier. And added the code which tests against the General registry there.

I also enabled this change within vect brackets and that didn't seem to cause any additional breaking changes. That is:

julia> parsestmt(SyntaxNode, "[a, @x b, c]", low_precedence_comma_in_brackets=true)
line:col│ tree                                   │ file_name
   1:1  │[vect]
   1:2  │  a
   1:5  │  [macrocall]
   1:6@x
   1:8  │    b
   1:11 │  c

julia> parsestmt(SyntaxNode, "[a, @x b, c]", low_precedence_comma_in_brackets=false)
line:col│ tree                                   │ file_name
   1:1  │[vect]
   1:2  │  a
   1:5  │  [macrocall]
   1:6@x
   1:7  │    [tuple]
   1:8  │      b
   1:11 │      c

Also, here's another example of a fun bug (I guess?!) in General due to confusion about the current parsing rules:

┌ Error: Parsers succeed but disagree
│   fpath = "BesselK_0.1.0/paperscripts/testing/matern_derivative_speed.jl"
│   failing_source =const oo = @SVector ones(2)
│    const zz = @SVector zeros(2)
│    #                      ┌───────────────────────────const TESTING_PARAMS = (@SVector [1.25, 0.25, 0.5],
│                            @SVector [1.25, 5.25, 0.5],
│                            @SVector [1.25, 0.25, 0.75],
│                            @SVector [1.25, 5.25, 0.75],
│                            @SVector [1.25, 0.25, 1.0],
│                            @SVector [1.25, 5.25, 1.0],
│                            @SVector [1.25, 0.25, 1.75],
│                            @SVector [1.25, 5.25, 1.75],
│                            @SVector [1.25, 0.25, 3.0],
│                            @SVector [1.25, 5.25, 3.0],
│                            @SVector [1.25, 0.25, 4.75],
│                            @SVector [1.25, 5.25, 4.75])
│    #──────────────────────────────────────────────────┘
│    
│    for pp in TESTING_PARAMS

@MasonProtter
Copy link
Contributor Author

Amazing, thanks for working in this @c42f! I had spent some time digging into the scheme and was left suitably discouraged that I agree we should just focus on JuliaSyntax.jl as well.

Your work on it is awesome, as usual.

@c42f
Copy link
Member

c42f commented May 14, 2023

Thanks so much @MasonProtter 😊 I know you've got plenty of your own projects but if you ever feel interested in contributing over at JuliaSyntax there's lots to do :-)

by the way I've copied the comment at #36547 (comment) out of this thread and over to JuliaLang/JuliaSyntax.jl#276 and will probably start working on something like that as part of the JuliaSyntax integration soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code parser Language parsing and surface syntax
Projects
None yet
Development

No branches or pull requests

8 participants