Performance of matrices using complex numbers #1145
Comments
There are others who can comment more fully (and possibly, accurately!), but my impression is that our complex number support is definitely slower than ideal. The issue comes down to the fact that currently Julia doesn't support "immutable types," which in this case would mean a set of two doubles in adjacent memory locations that are addressable as a type (structure). There are some clever shenanigans that simulate this (see all the "box" and "unbox" stuff in base/complex.jl), but I think there's some overhead to this. The good news is that Jeff appears to be working on this precise issue. I suspect that when he's done, you'll see a significant performance boost. I can't comment on your third case.
As Tim has mentioned, there is the issue of immutability in structs, but I think that's not at work here. If I am interpreting my observations correctly, the actual issue is that the first and third case are inlined (though there's additional copying overhead for the third) while the second one is not (LLVM dump here). Maybe @JeffBezanson can elaborate a little further. EDIT: I should have been more precise.
See #323. But also, @loladiro is right and I think I can do something more in this case.
Ok yes that fixed it.

```julia
m = rand(1024,1024); n = sum(m); @time n = sum(m);
m = m + im; n = sum(m); @time n = sum(m);
```

So now instead of being 10 times slower it is twice as fast with complex. Strange… For comparison, here is what I get with Matlab 2012a:

```matlab
m = rand(1024,1024); n = sum(m); tic; n = sum(m); toc
m = m + 1i; n = sum(m); tic; n = sum(m); toc
```
That is because we use a compensated summation algorithm by default for float arrays; it is a bit slower. I guess we should use the same algorithm for complex float.
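For reference, the compensated approach carries a small correction term alongside the running sum so that bits lost to roundoff are recovered; here is a minimal Kahan-style sketch in Julia (an illustration of the technique, not the code actually in Base):

```julia
# Kahan compensated summation: `c` accumulates the low-order bits that are
# lost each time a small element is added to a large running total.
function kahan_sum(a::AbstractArray{Float64})
    s = 0.0
    c = 0.0
    for x in a
        y = x - c
        t = s + y
        c = (t - s) - y   # the part of y that did not make it into t
        s = t
    end
    return s
end
```

The extra subtraction and bookkeeping per element is what makes it somewhat slower than a plain accumulation loop.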
A bit? It is 4 times slower…
For some reason I get a much smaller gap on my machine:
Also, since we're 2x to 7x slower than Matlab here, the differences between real and complex, compensated and not, are masked by other overhead. I would also not expect complex to be 2x slower, since it is unrolled, doing 2 sums per loop (e.g. in Matlab it is only 1.8x slower).
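For context, "unrolled, doing 2 sums per loop" means keeping two independent accumulators so that consecutive additions don't all sit on a single dependency chain; a minimal sketch of that shape (an illustration only, not the actual Base loop):

```julia
# Two running sums per iteration: a superscalar CPU can overlap the two adds
# because they do not depend on each other.
function sum_unrolled(a::AbstractVector{Float64})
    s1 = 0.0
    s2 = 0.0
    i = 1
    n = length(a)
    while i + 1 <= n
        s1 += a[i]
        s2 += a[i+1]
        i += 2
    end
    if i <= n          # odd length: fold in the last element
        s1 += a[i]
    end
    return s1 + s2
end
```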
This kind of makes sense given that our theory about why compensated summation is only marginally slower is that superscalar architectures can do a few more ops in a tight loop essentially for free. That would be extremely architecture-sensitive. Maybe @cdubout's machine is a sufficiently different CPU that the extra work in the compensated version is no longer free.
@cdubout, in case it's not entirely clear, the compensated algorithm is designed to be insensitive to roundoff. See line 1430 of array.jl, commit 19ff52a, and this Wikipedia page: http://en.wikipedia.org/wiki/Kahan_summation_algorithm. Jeff and Stefan, it would seem that a poor-person's version of an improved summation routine might offer a middle course (warning, just making this up off the top of my head):
The idea is that if
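As an aside, one well-known middle course between a plain loop and full compensation is pairwise (divide-and-conquer) summation, whose roundoff error grows roughly like O(log n) instead of O(n) while adding almost no cost; a minimal sketch, not necessarily the scheme being proposed above:

```julia
# Pairwise summation: recursively split the range and sum the halves, falling
# back to a plain loop below a cutoff so the recursion overhead stays negligible.
function pairwise_sum(a::AbstractVector{Float64}, lo::Int=1, hi::Int=length(a))
    n = hi - lo + 1
    if n <= 128
        s = 0.0
        for i in lo:hi
            s += a[i]
        end
        return s
    end
    mid = lo + div(n, 2)
    return pairwise_sum(a, lo, mid - 1) + pairwise_sum(a, mid, hi)
end
```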
My timings on an i7-2760QM with 1333 MHz DDR3 RAM are quite similar to @cdubout's (same code as Jeff's):
My timings on beowulf, on the other hand, are quite different:
Replacing the compensated sum with the trivial algorithm on my i7 gives results that might be closer to what is expected:
Makes sense. Those beowulf machines are ancient.
Maybe we should just have a separate function for the compensated sum.
+1 for making the compensated sum a separate function.
My timings were on an i7-2720QM with 1333 MHz DDR3 RAM, so it makes sense that they are similar.
The requirement for having at least two functions (a fast one and an accurate one) is going to be a general issue as we move forward. It's come up while I've been writing code for doing PCA, for example. It would be great to come up with a general naming convention for this distinction. -- John
One interesting approach would be to have
There's a whole class of approximate solutions to problems (e.g., approximate nearest-neighbor, ANN), for which there's a speed/accuracy tradeoff. For those algorithms you can often specify an upper bound on how accurate you require the answer to be. Not certain that applies in this case, but if one is trying to come up with a general API it would be worth keeping this class of algorithms in mind.
Agreed, Tim: this becomes a very deep question once we start thinking about it. In R, for example, there are, at minimum, two separate PCA functions that differ in numerical accuracy based on their use or non-use of the SVD. Neither even gets into the more complex issues raised by newer approximate PCA methods that let you tune the level of approximation. We should definitely think about this. It might be as big an issue to get right as the modeling functions.
Bringing the discussion on exact vs. approximate summation back here (from #1257), since most of the discussion is here. For K-B-N summation, at least, it would be nice to have a module-based way to opt in (*). In the case of K-B-N summation, at least, this is probably fine: the K-B-N summation only needs to be defined for FloatingPoint types, which could otherwise use generic/templated summation functions. For other classes of functions, it might not be as easy. Shall we try this for K-B-N summation? I can update #1257 to further the discussion.

(*) https://groups.google.com/forum/#!msg/julia-dev/Rj1xkrZkgcw/yb9ZlEAoKzUJ
That sort of facility with modules would be nice. I guess the idea would be that you have the normal `sum` by default and pull in the KBN version only where you ask for it. The other way that you'd want to use this is to opt for KBN (or something even more precise) globally as an option. That's an entirely different issue, but not off-topic here, I don't think. So it seems like what we want is:

- a module you can load to get the more accurate (KBN) summation where you want it, and
- a way to choose which implementation is the global default.
The latter suggests that summation may be something we want to load later than most of the stuff in Base, since we might want the option to choose a different definition based on command-line flags.
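To make the opt-in, module-based flavor concrete, here is a minimal sketch (module and function names are hypothetical, and the loop shown is a Kahan-Babuška-Neumaier variant rather than necessarily what Base would ship):

```julia
# Hypothetical module housing the accurate, opt-in summation routine.
module AccurateSums

export sum_kbn

# Kahan-Babuska-Neumaier compensated summation for floating-point arrays.
function sum_kbn(A::AbstractArray{<:AbstractFloat})
    s = zero(eltype(A))
    c = zero(eltype(A))
    for x in A
        t = s + x
        if abs(s) >= abs(x)
            c += (s - t) + x   # roundoff lost from the smaller operand x
        else
            c += (x - t) + s   # roundoff lost from the smaller operand s
        end
        s = t
    end
    return s + c
end

end # module

# The fast default stays the default; code that needs accuracy opts in explicitly.
using .AccurateSums: sum_kbn
A = rand(1024, 1024)
sum(A)       # fast default
sum_kbn(A)   # compensated, opt-in
```

Making the accurate routine the global default would then come down to which module gets loaded (or which definition `sum` is bound to at startup), which is where the command-line-flag idea fits.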
Nice summary! The first is a clearer explanation of what I was proposing, and the second summarizes the needs discussed previously.
But there's some merit in starting the name with
From the "what other languages do" department, here's some pseudo-Haskell-Julia: import Base hiding (sum, cumsum)
import KBNSum So basically you can exclude certain things. Pseudo-Julia, it could be nice to do: import Base.* hiding exports(KBNSum)
import KBNSum.* where |
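For what it's worth, present-day Julia gets part of this hiding behavior without new syntax: a name brought in explicitly with `using M: name` takes precedence over the same name being merely available from Base, so a module can selectively shadow `sum` and `cumsum`. A sketch, assuming a hypothetical `KBNSum` package that exports its own `sum` and `cumsum`:

```julia
# Assumes a (hypothetical) KBNSum package exporting compensated `sum`/`cumsum`.
module MyAnalysis

using KBNSum: sum, cumsum   # explicit bindings shadow Base's `sum`/`cumsum` here

total(A) = sum(A)           # unqualified `sum` now refers to KBNSum.sum

end # module
```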
I am trying to reimplement some Matlab code in Julia, but there is a big performance gap when using complex numbers.
As an example:
```julia
m = rand(1024,1024);
@time n = sum(m);
# elapsed time: 0.007010221481323242 seconds (shortest of 5 trials on my machine)

m = m + im;
@time n = sum(m);
# elapsed time: 0.07563614845275879 seconds (again shortest of 5 trials on my machine)

@time n = sum(real(m)) + sum(imag(m));
# elapsed time: 0.016952991485595703 seconds
```
Why is the second experiment more than 10 times slower instead of just 2 (like Matlab)?
And why does the third still manage to beat the second?