unoptimized code generation with tuple arguments #6670

Closed
mlubin opened this issue Apr 27, 2014 · 5 comments
mlubin commented Apr 27, 2014

Consider:

function f1(values)
    s1 = values[1]
    s2 = values[2]
    s3 = 2*s2
    s4 = s1 + s3
    return s4
end


function f2(input)
    values = input[1]
    s1 = values[1]
    s2 = values[2]
    s3 = 2*s2
    s4 = s1 + s3
    return s4
end

Type inference is essentially the same for both:

julia> code_typed(f1, (Vector{Float64},))
1-element Array{Any,1}:
 :($(Expr(:lambda, {:values}, {{:s1,:s2,:s3,:s4},{{:values,Array{Float64,1},0},{:s1,Float64,18},{:s2,Float64,18},{:s3,Float64,18},{:s4,Float64,18}},{}}, :(begin  # none, line 2:
        s1 = top(arrayref)(values::Array{Float64,1},1)::Float64 # line 3:
        s2 = top(arrayref)(values::Array{Float64,1},2)::Float64 # line 4:
        s3 = top(box)(Float64,top(mul_float)(top(box)(Float64,top(sitofp)(Float64,2))::Float64,s2::Float64))::Float64 # line 5:
        s4 = top(box)(Float64,top(add_float)(s1::Float64,s3::Float64))::Float64 # line 6:
        return s4::Float64
    end::Float64))))

julia> code_typed(f2, ((Vector{Float64},),))
1-element Array{Any,1}:
 :($(Expr(:lambda, {:input}, {{:values,:s1,:s2,:s3,:s4},{{:input,(Array{Float64,1},),0},{:values,Array{Float64,1},18},{:s1,Float64,18},{:s2,Float64,18},{:s3,Float64,18},{:s4,Float64,18}},{}}, :(begin  # none, line 2:
        values = top(tupleref)(input::(Array{Float64,1},),1)::Array{Float64,1} # line 3:
        s1 = top(arrayref)(values::Array{Float64,1},1)::Float64 # line 4:
        s2 = top(arrayref)(values::Array{Float64,1},2)::Float64 # line 5:
        s3 = top(box)(Float64,top(mul_float)(top(box)(Float64,top(sitofp)(Float64,2))::Float64,s2::Float64))::Float64 # line 6:
        s4 = top(box)(Float64,top(add_float)(s1::Float64,s3::Float64))::Float64 # line 7:
        return s4::Float64
    end::Float64))))

but very different LLVM code is generated:

julia> code_llvm(f1, (Vector{Float64},))

define double @julia_f116865(%jl_value_t*) {
...
idxend2:                                          ; preds = %idxend
  %7 = getelementptr inbounds %jl_value_t* %0, i64 1, i32 0, !dbg !1047
  %8 = load %jl_value_t** %7, align 8, !dbg !1047, !tbaa %jtbaa_arrayptr
  %9 = bitcast %jl_value_t* %8 to double*, !dbg !1047
  %10 = load double* %9, align 8, !dbg !1047, !tbaa %jtbaa_user
  %11 = getelementptr %jl_value_t* %8, i64 1, !dbg !1053
  %12 = bitcast %jl_value_t* %11 to double*, !dbg !1053
  %13 = load double* %12, align 8, !dbg !1053, !tbaa %jtbaa_user
  %14 = fmul double %13, 2.000000e+00, !dbg !1056
  %15 = fadd double %10, %14, !dbg !1057
  ret double %15, !dbg !1058
}

julia> code_llvm(f2, ((Vector{Float64},),))

define %jl_value_t* @julia_f216867(%jl_value_t*, %jl_value_t**, i32) {
...
pass:                                             ; preds = %top
  %17 = getelementptr inbounds %jl_value_t* %12, i64 2, i32 0, !dbg !1054
  %18 = load %jl_value_t** %17, align 8, !dbg !1054
  store %jl_value_t* %18, %jl_value_t** %10, align 8, !dbg !1059
  store %jl_value_t* inttoptr (i64 23413456 to %jl_value_t*), %jl_value_t** %11, align 8, !dbg !1059
  %19 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 39256272 to %jl_value_t*), %jl_value_t** %10, i32 2), !dbg !1059
  store %jl_value_t* %19, %jl_value_t** %4, align 8, !dbg !1059
  store %jl_value_t* %18, %jl_value_t** %10, align 8, !dbg !1060
  store %jl_value_t* inttoptr (i64 23413488 to %jl_value_t*), %jl_value_t** %11, align 8, !dbg !1060
  %20 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 39256272 to %jl_value_t*), %jl_value_t** %10, i32 2), !dbg !1060
  store %jl_value_t* %20, %jl_value_t** %7, align 8, !dbg !1060
  store %jl_value_t* inttoptr (i64 23413488 to %jl_value_t*), %jl_value_t** %10, align 8, !dbg !1061
  store %jl_value_t* %20, %jl_value_t** %11, align 8, !dbg !1061
  %21 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 35192416 to %jl_value_t*), %jl_value_t** %10, i32 2), !dbg !1061
  store %jl_value_t* %21, %jl_value_t** %8, align 8, !dbg !1061
  store %jl_value_t* %19, %jl_value_t** %10, align 8, !dbg !1062
  store %jl_value_t* %21, %jl_value_t** %11, align 8, !dbg !1062
  %22 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 40835744 to %jl_value_t*), %jl_value_t** %10, i32 2), !dbg !1062
  store %jl_value_t* %22, %jl_value_t** %9, align 8, !dbg !1062
  %23 = load %jl_value_t** %5, align 8, !dbg !1063
  %24 = getelementptr inbounds %jl_value_t* %23, i64 0, i32 0, !dbg !1063
  store %jl_value_t** %24, %jl_value_t*** @jl_pgcstack, align 8, !dbg !1063
  ret %jl_value_t* %22, !dbg !1063
}

Why does the second version generate unoptimized code?

@JeffBezanson
Member

Planning to address this. Generally, we don't specialize on all tuple types because there are just too many of them. Due to the code in inference.jl, the number is actually unbounded if we aren't careful.

@mlubin
Member Author

mlubin commented Apr 27, 2014

Ok, thanks; I got a nice speedup from restructuring the code to avoid this.
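
The issue doesn't show the actual restructuring, but one way to avoid passing a tuple into the hot function (a hypothetical sketch, not mlubin's actual change) is to unpack the tuple at the call boundary so the inner function, like f1 above, receives the array directly:

```julia
# Hypothetical wrapper: unpack the tuple once at the boundary, then
# call the array-taking f1 from the issue, which compiles to tight code.
f2_unpacked(input) = f1(input[1])
```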

@timholy
Member

timholy commented Apr 27, 2014

@mlubin, in case you can't wait, Jeff once mentioned a nice trick to me: write the signature as

function f2{T}(input::T)

That will force specialization. Whether this helps is specific to tuple inputs; in general, it's still true that in the vast majority of cases there's no performance advantage to declaring input types.
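
Applied to the f2 from the original report, the trick would read as follows (a sketch in the 0.3-era method type parameter syntax; untested here):

```julia
# The method type parameter T matches the concrete tuple type of `input`,
# which forces a specialized method instantiation for each tuple type
# instead of falling back to generic jl_apply_generic dispatch.
function f2{T}(input::T)
    values = input[1]
    s1 = values[1]
    s2 = values[2]
    s3 = 2*s2
    s4 = s1 + s3
    return s4
end
```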

@mlubin
Member Author

mlubin commented Apr 28, 2014

Feel free to close if this is subsumed by another issue.

@simonster
Member

I think the closest issue is #4090, but it's closed.
