Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of generic dot products #27678

Merged
merged 2 commits into from
Jul 11, 2018

Conversation

ranocha
Copy link
Member

@ranocha ranocha commented Jun 20, 2018

Due to some recent work on dot products in Julia (#27401 and #27677), I'm interested in the generic implementations of those functions. Here are some benchmark result using the recent master branch of Julia:

julia> using BenchmarkTools, LinearAlgebra

julia> using LinearAlgebra: _length

julia> # new generic dot product for LinearAlgebra/src/generic.jl
       function dot0(x, y)
           ix = iterate(x)
           iy = iterate(y)
           if ix == nothing
               if iy != nothing
                   throw(DimensionMismatch("x and y are of different lengths!"))
               end
               return dot(zero(eltype(x)), zero(eltype(y)))
           end
           if iy == nothing
               throw(DimensionMismatch("x and y are of different lengths!"))
           end
           s = dot(ix[1], iy[1])
           ix, iy = iterate(x, ix[2]), iterate(y, iy[2])
           while ix != nothing && iy != nothing
               s += dot(ix[1], iy[1])
               ix, iy = iterate(x, ix[2]), iterate(y, iy[2])
           end
           if !(iy == nothing && ix == nothing)
               throw(DimensionMismatch("x and y are of different lengths!"))
           end
           return s
       end
dot0 (generic function with 1 method)

julia> # generic dot product in LinearAlgebra/src/generic.jl
       function dot1(x, y)
           ix = iterate(x)
           iy = iterate(y)
           if ix === nothing
               if iy !== nothing
                   throw(DimensionMismatch("x and y are of different lengths!"))
               end
               return dot(zero(eltype(x)), zero(eltype(y)))
           end
           if iy === nothing
               throw(DimensionMismatch("x and y are of different lengths!"))
           end
           (vx, xs) = ix
           (vy, ys) = iy
           s = dot(vx, vy)
           while true
               ix = iterate(x, xs)
               iy = iterate(y, ys)
               if (ix == nothing) || (iy == nothing)
                   break
               end
               (vx, xs), (vy, ys) = ix, iy
               s += dot(vx, vy)
           end
           if !(iy == nothing && ix == nothing)
               throw(DimensionMismatch("x and y are of different lengths!"))
           end
           return s
       end
dot1 (generic function with 1 method)

julia> # dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl
       function dot2(x::AbstractArray, y::AbstractArray)
           if length(LinearIndices(x)) != length(LinearIndices(y))
               throw(DimensionMismatch("dot product arguments have unequal lengths $(length(LinearIndices(x))) and $(length(LinearIndices(y)))"))
           end
           ix = iterate(x)
           if ix === nothing
               # we only need to check the first vector, since equal lengths have been asserted
               return dot(zero(eltype(x)), zero(eltype(y)))
           end
           iy = iterate(y)
           s = dot(ix[1], iy[1])
           ix, iy = iterate(x, ix[2]), iterate(y, iy[2])
           while ix != nothing
               s += dot(ix[1], iy[1])
               ix = iterate(x, ix[2])
               iy = iterate(y, iy[2])
           end
           return s
       end
dot2 (generic function with 1 method)

julia> # dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl
       function dot3(x::AbstractArray, y::AbstractArray)
           lx = _length(x)
           if lx != _length(y)
               throw(DimensionMismatch("first array has length $(lx) which does not match the length of the second, $(_length(y))."))
           end
           if lx == 0
               return dot(zero(eltype(x)), zero(eltype(y)))
           end
           s = zero(dot(first(x), first(y)))
           for (Ix, Iy) in zip(eachindex(x), eachindex(y))
               @inbounds s += dot(x[Ix], y[Iy])
           end
           s
       end
dot3 (generic function with 1 method)

julia> function test(T, Ns...)
           x = rand(T, Ns...)
           y = rand(T, Ns...)

           display("Dot product using BLAS.")
           display(@btime dot($x, $y))

           display("New generic dot product.")
           display(@btime dot0($x, $y))

           display("Generic dot product in LinearAlgebra/src/generic.jl.")
           display(@btime dot1($x, $y))

           display("Dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl.")
           display(@btime dot2($x, $y))

           display("Dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl.")
           display(@btime dot3($x, $y))
       end
test (generic function with 1 method)

julia> test(Float64, 100)
"Dot product using BLAS."
  41.156 ns (0 allocations: 0 bytes)
23.49389537040023
"New generic dot product."
  207.198 ns (0 allocations: 0 bytes)
23.493895370400235
"Generic dot product in LinearAlgebra/src/generic.jl."
  5.841 μs (198 allocations: 6.19 KiB)
23.493895370400235
"Dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl."
  254.789 ns (0 allocations: 0 bytes)
23.493895370400235
"Dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl."
  78.806 ns (0 allocations: 0 bytes)
23.493895370400235

julia> test(Float64, 10, 10)
"Dot product using BLAS."
  31.015 ns (0 allocations: 0 bytes)
25.062379286171122
"New generic dot product."
  199.772 ns (0 allocations: 0 bytes)
25.06237928617112
"Generic dot product in LinearAlgebra/src/generic.jl."
  5.880 μs (198 allocations: 6.19 KiB)
25.06237928617112
"Dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl."
  228.787 ns (0 allocations: 0 bytes)
25.06237928617112
"Dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl."
  85.951 ns (0 allocations: 0 bytes)
25.06237928617112

julia> test(ComplexF64, 100)
"Dot product using BLAS."
  61.193 ns (0 allocations: 0 bytes)
43.02892041305856 + 1.2252503100128251im
"New generic dot product."
  281.028 ns (0 allocations: 0 bytes)
43.028920413058586 + 1.2252503100128243im
"Generic dot product in LinearAlgebra/src/generic.jl."
  6.067 μs (198 allocations: 6.19 KiB)
43.028920413058586 + 1.2252503100128243im
"Dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl."
  443.242 ns (0 allocations: 0 bytes)
43.028920413058586 + 1.2252503100128243im
"Dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl."
  118.485 ns (0 allocations: 0 bytes)
43.028920413058586 + 1.2252503100128243im

julia> test(ComplexF64, 10, 10)
"Dot product using BLAS."
  61.055 ns (0 allocations: 0 bytes)
46.18083236484251 - 0.2818033108671578im
"New generic dot product."
  261.608 ns (0 allocations: 0 bytes)
46.18083236484251 - 0.2818033108671528im
"Generic dot product in LinearAlgebra/src/generic.jl."
  6.074 μs (198 allocations: 6.19 KiB)
46.18083236484251 - 0.2818033108671528im
"Dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl."
  395.637 ns (0 allocations: 0 bytes)
46.18083236484251 - 0.2818033108671528im
"Dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl."
  120.597 ns (0 allocations: 0 bytes)
46.18083236484251 - 0.2818033108671528im

There seem to be some serious type stability issues for the generic dot product in LinearAlgebra/src/generic.jl. Moreover, the generic dot product for AbstractArrays seems to be superior to the one for AbstractVectors. Thus, I've added a new generic dot product and have removed the generic one for AbstractVectors.

@andreasnoack
Copy link
Member

Thanks for detecting this. However, this seems to have happened as part of the transition to the new iteration protocol. The version in Julia 0.6 doesn't have the type instability issue so I think we might want to understand why this happened.

cc: @Keno @stevengj

@@ -663,29 +648,23 @@ julia> dot(x, y)
function dot(x, y) # arbitrary iterables
ix = iterate(x)
iy = iterate(y)
if ix === nothing
if iy !== nothing
if ix == nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why == everywhere?

@haampie
Copy link
Contributor

haampie commented Jun 20, 2018

W.r.t. dot1, the allocations are due to:

# in dot1
if (ix == nothing) || (iy == nothing)
    break
end

It turns out to be necessary to write

# in dot1
ix === nothing && break
iy === nothing && break

It's not enough to just replace == by ===, which strikes me as odd.

The dot2 just needs that nothing-check on the second iterator, probably to convince the compiler that it is not nothing altough we claim to know. The other thing is that it is apparently faster to write the check again as two split statements, which makes the whole thing similar to dot1:

# in dot2
while true
    ix === nothing && break
    iy === nothing && break
    s += dot(ix[1], iy[1])
    ix = iterate(x, ix[2])
    iy = iterate(y, iy[2])
end

With these changes things are faster again:

julia> test(Float64, 100)
"Dot product using BLAS."
  35.675 ns (0 allocations: 0 bytes)
"New generic dot product."
  275.774 ns (0 allocations: 0 bytes)
"Generic dot product in LinearAlgebra/src/generic.jl."
  216.415 ns (0 allocations: 0 bytes)
"Dot product for `AbstractVector`s in LinearAlgebra/src/generic.jl."
  202.356 ns (0 allocations: 0 bytes)
"Dot product for `AbstractArray`s in LinearAlgebra/src/generic.jl."
  109.182 ns (0 allocations: 0 bytes)

@ranocha
Copy link
Member Author

ranocha commented Jun 21, 2018

Thank you very much, @haampie. I've replaced == by === as you suggested in the generic version of dot. However, since the AbstractArray version is still faster than the AbstractVector version, the last one should still be removed.

ranocha added a commit to ranocha/julia that referenced this pull request Jun 21, 2018
@ranocha
Copy link
Member Author

ranocha commented Jun 25, 2018

I don't think the test failures on appveyor and circleci are influenced by this PR. Should I make any additional changes in order to allow merging of this PR?

s += dot(ix[1], iy[1])
ix = iterate(x, ix[2])
iy = iterate(y, iy[2])
s = zero(dot(first(x), first(y)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the only potential downside to this implementation is that it retrieves the first element of each of x and y twice, rather than once with the explicit iterate implementations?

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally lgtm! :)

(I remain curious about the roughly factor-of-two performance difference between the two best generic implementations.)

@kshyatt kshyatt added performance Must go faster linear algebra Linear algebra labels Jul 10, 2018
@andreasnoack andreasnoack merged commit f3ad067 into JuliaLang:master Jul 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants