
speed up expansion and lowering of ccall macro #50077

Merged 1 commit into master from jb/fasteratccall on Jun 12, 2023

Conversation

JeffBezanson (Member)

Test case:

julia> ex = quote
           function cublasCher2_v2_64(handle, uplo, n, alpha, x, incx, y, incy, A, lda)
               @ccall libcublas.cublasCher2_v2_64(handle::cublasHandle_t, uplo::cublasFillMode_t,
                                               n::Int64, alpha::RefOrCuRef{cuComplex},
                                               x::CuPtr{cuComplex}, incx::Int64,
                                               y::CuPtr{cuComplex}, incy::Int64,
                                               A::CuPtr{cuComplex}, lda::Int64)::cublasStatus_t
           end
       end

julia> using BenchmarkTools

julia> @btime Meta.lower(Main, ex)

Before: 1.33 ms, after: 0.680 ms.
I do like the idea of @ccall producing a foreigncall expression, but there are two problems leading to noticeable load-time differences in some packages: (1) it does some unnecessary work forming strings to make temporary identifier names, and (2) normal variables (slots) are more expensive to analyze than ssavalues. It would probably help if macros were somehow able to generate ssavalues. In the meantime, this cuts the lowering time in half by making @ccall produce a "classic" ccall call expression (plus a new cconv Expr head to retain the ability to express calling conventions and correct varargs).
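For reference, the expansion can be inspected with @macroexpand. A minimal sketch using the strlen example from the @ccall docstring (output elided, since its exact shape differs across Julia versions):

julia> @macroexpand @ccall strlen("hello"::Cstring)::Csize_t

Before this change the result is a begin...end block that binds gensym'd local variables (slots) for the function name and each converted argument, ending in $(Expr(:foreigncall, ...)); after it, the result is roughly :(ccall(:strlen, Csize_t, (Cstring,), "hello")), a plain call expression that lowering already handles cheaply, with the new Expr(:cconv, ...) head carrying the calling convention only when one is given.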

@JeffBezanson added the compiler:lowering (Syntax lowering: compiler front end, 2nd stage) and compiler:latency (Compiler latency) labels on Jun 6, 2023
maleadt (Member) commented on Jun 6, 2023

Thanks, this does indeed improve the time it takes to lower @ccall expressions! Still remarkably slow at 0.5ms per ccall (CUDA.jl has thousands), but I won't say no to a 50% speed-up 🙂
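(Back-of-the-envelope, from the numbers above: a few thousand @ccall sites at ~0.5 ms of lowering each comes to roughly 1-2 s spent in lowering alone, which is why this is noticeable in package load times.)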

(1) it does some unnecessary work forming strings to make temporary identifier names

Is that really significant? Benchmarking ccall_macro_lower directly takes only a couple of µs, vs. many hundreds of µs when calling all of lowering. Or does it have a knock-on effect on later lowering?

For reference, this was the benchmark script I was using:

using BenchmarkTools

call = :(
    libfoo.bar(a::A, b::B, c::C, d::D, e::E)::X
)

println("ccall_macro_parse:")
x = Base.ccall_macro_parse(call)
display(@benchmark Base.ccall_macro_parse(call))
# ~230 ns

println("ccall_macro_lower:")
Base.ccall_macro_lower(:ccall, x...)
display(@benchmark Base.ccall_macro_lower(:ccall, x...))
# ~4 µs

println("the above, but via lowering:")
macro_call = :(
    @ccall $call
)
lower(ex::Expr, mod::Module=Main, file::String="", line::Int=0) =
    ccall(:jl_expand_with_loc_warn, Any, (Any, Any, Cstring, Cint), ex, mod, file, line)
lower(macro_call)
display(@benchmark lower(macro_call))
# ~500 µs

println("plain ccall:")
plain_ccall = :(
    # note: ccall takes a literal tuple of argument types, not Tuple{...}
    ccall((:bar, :libfoo), X, (A, B, C, D, E), a, b, c, d, e)
)
lower(plain_ccall)
display(@benchmark lower(plain_ccall))
# ~8 µs
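(Reading these numbers: ccall_macro_parse and ccall_macro_lower together account for only ~5 µs, so the remaining ~495 µs is spent lowering the already-expanded code. That would point to a knock-on effect: the expansion introduces many named temporaries, which become slots that lowering must analyze as full variables.)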

JeffBezanson (Member, Author) commented on Jun 8, 2023

I guess not; all the cost may very well be from using "normal" identifiers (which have lots of features!) for labeling temporary values, plus the cost of macroexpansion.
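A quick way to see the slot-vs-ssavalue distinction (a generic illustration, not code from this PR): lower a small function and note that the named local shows up as a slot, while intermediate results are SSA values (%1, %2, ...):

julia> Meta.lower(Main, :(function demo(x)
           t = f(x)            # the named local `t` becomes a slot
           return g(t) + h(t)  # intermediate results become SSA values
       end))

Slots carry all the machinery of real variables (scoping, capture, and assignment analysis), which is why labeling temporaries with gensym'd identifiers, as the old @ccall expansion did, is more expensive than emitting values that are already in SSA form.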

@KristofferC merged commit 75bda64 into master on Jun 12, 2023
@KristofferC deleted the jb/fasteratccall branch on Jun 12, 2023