Export dotu with generic fallbacks #8300

Closed
dlfivefifty opened this issue Sep 10, 2014 · 18 comments
Labels
speculative: whether the change will be implemented is speculative

Comments

@dlfivefifty
Contributor

I have the following inconsistency in Julia v0.3.0:

julia> BLAS.dotu([1.+im,2.],[3.,4.+im])
11.0 + 5.0im

julia> BLAS.dotu([1.,2.],[3.,4.])
ERROR: dotu has no method matching dotu(::Int64, ::Array{Float64,1}, ::Int64, ::Array{Float64,1}, ::Int64)
in dotu at linalg/blas.jl:156

@simonster
Member

dotu is only defined for complex arguments, both in BLAS and in Julia. If you want to compute a dot product of real vectors, you can just use dot.
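A minimal illustration of that suggestion, reusing the real vectors from the opening example:

julia> dot([1.,2.],[3.,4.])
11.0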

@dlfivefifty
Contributor Author

What if you want to write a function that uses dot for real arguments and dotu for complex arguments? (I.e., always do dotu.)

@johnmyleswhite
Member

Write a wrapper that gets inlined to each of them?
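A minimal sketch of what such a wrapper could look like (the name dotu_any is hypothetical; Julia 0.3-era syntax):

# Hypothetical wrapper: complex BLAS element types go to BLAS.dotu;
# for real vectors conjugation is a no-op, so dot gives the same result.
dotu_any{T<:Union(Complex64,Complex128)}(x::Vector{T}, y::Vector{T}) = BLAS.dotu(x, y)
dotu_any{T<:Real}(x::Vector{T}, y::Vector{T}) = dot(x, y)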

@dlfivefifty
Contributor Author

Yes, of course, but I can't see a good reason why dotu(Real, Real) != dot(Real, Real), not to mention dotu(Complex, Real) and dotu(Real, Complex).

@simonster
Member

Because BLAS.dotu is a wrapper for BLAS ?dotu. If we were to create an exported dotu function it ought to do this and also have a pure Julia fallback for non-BLAS element types. Such a function currently does not exist, but it might make sense to have.
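A sketch of what the pure Julia fallback part might look like (the name dotu_fallback is hypothetical and not in Base):

# Hypothetical generic fallback: an unconjugated dot product that works
# for any element type (e.g. BigFloat, Complex{BigFloat}); an exported
# dotu would additionally dispatch to BLAS.dotu for BLAS element types.
function dotu_fallback(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || error("vectors must have the same length")
    s = zero(eltype(x))*zero(eltype(y))
    for i = 1:length(x)
        s += x[i]*y[i]   # no conj here, unlike dot
    end
    s
end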

@dlfivefifty
Contributor Author

OK, that's fair enough. I'd be in favour of adding an exported dotu: it's a commonly used operation, and dotu(u,v) looks cleaner than dot(conj(u),v) or (u.'*v)[1].

@simonster simonster changed the title BLAS.dotu(::Vector{Float64},::Vector{Float64}) not working Export dotu and dotc with generic fallbacks Sep 10, 2014
@simonster simonster added the feature and speculative labels Sep 10, 2014
@jiahao
Member

jiahao commented Sep 10, 2014

I think of dotu as semantically a special case of .*; we could just add the BLAS calls as special methods of .* called on StridedArray{<:BlasFloat}s.

@simonster simonster changed the title Export dotu and dotc with generic fallbacks Export dotu with generic fallbacks Sep 10, 2014
@timholy
Member

timholy commented Sep 10, 2014

Out of curiosity, are the BLAS versions any faster than the Julia versions? At least for Float32, the gap has narrowed a lot thanks to SIMD.

@simonster
Member

@jiahao It's really sum(x.*y) or (x.'y)[1], though, isn't it? It would be great if we could just make that call dotu, but we don't have the machinery to do that.
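Using the vectors from the opening example to illustrate that equivalence:

julia> x = [1.+im, 2.]; y = [3., 4.+im];

julia> sum(x.*y)
11.0 + 5.0im

julia> BLAS.dotu(x, y)
11.0 + 5.0im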

@jiahao
Member

jiahao commented Sep 10, 2014

Oh right, duh.

@simonster
Member

@timholy That's worth testing. @andreasnoack said that the difference wasn't big, but at the moment the generic version doesn't use @simd, which would be necessary to let LLVM reassociate the adds, so it's probably at least a bit slower. The loop vectorizer was also not that great at optimizing summation the last time I tried it (#6928), but I never got around to trying with LLVM trunk after @ArchRobison fixed vectorization there.
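For reference, a sketch of the @simd-annotated form being discussed (real element types only; the name dotu_simd is hypothetical):

# Sketch: @simd lets LLVM reassociate the additions so the
# reduction can be vectorized; @inbounds drops the bounds checks.
function dotu_simd{T<:FloatingPoint}(x::Vector{T}, y::Vector{T})
    s = zero(T)
    @simd for i = 1:length(x)
        @inbounds s += x[i]*y[i]
    end
    s
end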

@andreasnoack
Member

@dlfivefifty I vaguely remember a discussion of dotu some time ago. The result was that I wrote a wrapper for dotu. A Google search revealed that the person asking was actually you. Hence, it appears that you are the only person who has asked for that function so far. I don't think we want to export another function unless the demand goes up. Hopefully, x.'y will return a scalar at some point, so that you can get a correctly working dotu without another export. (Right now x.'y also seems to be much slower than BLAS.dotu.)

@simonster The Julia implementation is faster for small n with Complex{Float64}, although the crossover point is rather low. For native Julia I get:

n = 50, time = 0.336
n = 100, time = 0.298
n = 150, time = 0.276
n = 200, time = 0.268
n = 250, time = 0.262
n = 300, time = 0.258
n = 350, time = 0.257
n = 400, time = 0.256
n = 450, time = 0.252
n = 500, time = 0.252
n = 550, time = 0.252
n = 600, time = 0.251
n = 650, time = 0.250
n = 700, time = 0.249
n = 750, time = 0.249
n = 800, time = 0.251
n = 850, time = 0.248
n = 900, time = 0.248
n = 950, time = 0.247
n = 1000, time = 0.247

And for BLAS

n = 50, time = 0.643
n = 100, time = 0.341
n = 150, time = 0.223
n = 200, time = 0.189
n = 250, time = 0.109
n = 300, time = 0.099
n = 350, time = 0.090
n = 400, time = 0.087
n = 450, time = 0.081
n = 500, time = 0.073
n = 550, time = 0.069
n = 600, time = 0.069
n = 650, time = 0.067
n = 700, time = 0.062
n = 750, time = 0.065
n = 800, time = 0.061
n = 850, time = 0.060
n = 900, time = 0.056
n = 950, time = 0.058
n = 1000, time = 0.055

Of course, the native Julia implementation can be made much faster for all n if it only uses the first element of the vectors. 😃

@dlfivefifty
Contributor Author

It's definitely low priority as there are easy workarounds, but I can't be the only person who wants to multiply and sum the entries of two vectors that are possibly complex :P

@simonster
Member

@andreasnoack Some of the overhead for small n in the Complex{Float64} case may come from the fact that the BLAS wrappers allocate a 1-element array for the output. Given that, I'm actually kind of surprised the crossing point is so low. We could allocate that array outside the function, but that wouldn't be thread-safe so we risk incurring the wrath of @JeffBezanson. We might also be able to use jl_alloca.

@simonster
Member

#8134 seems like the clean way to avoid that allocation.

@andreasnoack
Member

I think you are right. For Float64, BLAS is faster even for n = 50, which fits well with that explanation.

An alternative solution could possibly be that we'll be able to return Complex{FloatX}s from ccalls.

@vtjnash
Member

vtjnash commented Sep 11, 2014

If I can get Jeff to merge it soon, #8134 is intended to address exactly that observation. (It allows the compiler to automatically use alloca for exactly this use case.)

@KristofferC
Member

Explicitly calling BLAS functions is an "expert" feature, and callers should conform to the argument types that the BLAS functions themselves take.
