randn performance - implement Ziggurat algorithm #96

ViralBShah · 2011-07-08T03:29:20Z

This is the julia implementation of randn, which is exactly the same as the C version. It runs about 6 times slower than the C version - even if I remove all the global variable stuff altogether.

jl_randn_next = -42.0
function randn()
    global jl_randn_next

    if (jl_randn_next != -42.0)
        s = jl_randn_next
        jl_randn_next = -42.0
        return s
    end

    s = 1.0
    vre = 0.0
    vim = 0.0
    while (s >= 1.0)
        ure = rand()
        uim = rand()
        vre = 2.0*ure - 1.0
        vim = 2.0*uim - 1.0
        s = vre*vre + vim*vim
    end

    s = sqrt(-2.0*log(s)/s)
    jl_randn_next = s * vre
    return s * vim
end

Here is the performance analysis for a million calls. I even tried using the libm log, but only sqrt made a difference. I think Jeff has to tell us whats going on. I thought we were not that much slower than C.

randn (fully in C): .06 sec
randn (fuly in Julia, with libm sqrt): 0.38 sec - 6x slower
randn (fuly in Julia, with libfdm sqrt): 0.59 sec - 10x slower

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2011-07-08T06:12:48Z

I see a 3x difference (I'll focus on the fast sqrt case, since that's a simple fix).

The sources of overhead are the function call, and boxing the return value. I tried to separate these.

times for 1e6 iterations:

C randn: 0.05454802513122559

original julia randn: 0.15975308418273926

return nothing instead of value: 0.12742090225219727

no function call (1e6 loop inside randn): 0.10554385185241699

making jl_randn_next a local variable: 0.07610011100769043

I think there is still a gap between llvm and gcc -O3 performance. In the core arithmetic part of the function, the disassembly is clean, we are generating good code. I guess I can gradually work on call overhead. And I have a planned optimization to eliminate the return value boxing when the function is called in a well-typed context. This can probably only get us to the 0.1055 range, or 2x slower.
But, long story short, there aren't any glaring performance bugs in the generated code.

StefanKarpinski · 2011-07-08T06:54:31Z

So in other words, while it would be nice for us to eventually match C here, for now we should just call the C version and be done with it.

ViralBShah · 2011-07-08T11:29:16Z

Once we move to dsfmt, this will require a bit more work. 2x slower than C is a great place to be at.

My take is that if its only 2x slower, we can leave it in julia.

-viral

On Jul 8, 2011, at 12:24 PM, StefanKarpinski wrote:

So in other words, while it would be nice for us to eventually match C here, for now we should just call the C version and be done with it.

Reply to this email directly or view it on GitHub:
#96 (comment)

StefanKarpinski · 2011-07-08T13:29:03Z

It's great that we could be at 2x the speed of C if Jeff did a bunch of fantastic optimization work overnight. That's not going to happen immediately, however, and in the meantime, we can be at 1x the speed of C just by using the C version. What's the point of not doing that?

ViralBShah · 2011-07-08T13:50:08Z

Ok, I think that the call overhead is dominating in many places. I have replaced sqrt and all rand stuff with faster versions, but don't see speedups in the randmatstat benchmark. Perhaps we will see speedups after your optimizations to call overhead.

-viral

On Jul 8, 2011, at 11:42 AM, JeffBezanson wrote:

I see a 3x difference (I'll focus on the fast sqrt case, since that's a simple fix).

The sources of overhead are the function call, and boxing the return value. I tried to separate these.

times for 1e6 iterations:

C randn: 0.05454802513122559

original julia randn: 0.15975308418273926

return nothing instead of value: 0.12742090225219727

no function call (1e6 loop inside randn): 0.10554385185241699

making jl_randn_next a local variable: 0.07610011100769043

I think there is still a gap between llvm and gcc -O3 performance. In the core arithmetic part of the function, the disassembly is clean, we are generating good code. I guess I can gradually work on call overhead. And I have a planned optimization to eliminate the return value boxing when the function is called in a well-typed context. This can probably only get us to the 0.1055 range, or 2x slower.
But, long story short, there aren't any glaring performance bugs in the generated code.

Reply to this email directly or view it on GitHub:
#96 (comment)

StefanKarpinski · 2011-07-08T13:50:52Z

What's the problem with just using the C version?

ViralBShah · 2011-07-08T13:51:39Z

Already done before you wrote. See commit 411bc0b. And for bonus, it is much faster than before. Originally, 1 million randn() took 0.06 seconds, and now it is 0.04 seconds for me.

-viral

On Jul 8, 2011, at 6:59 PM, StefanKarpinski wrote:

It's great that we could be at 2x the speed of C if Jeff did a bunch of fantastic optimization work overnight. That's not going to happen immediately, however, and in the meantime, we can be at 1x the speed of C just by using the C version. What's the point of not doing that?

Reply to this email directly or view it on GitHub:
#96 (comment)

StefanKarpinski · 2011-07-08T13:54:05Z

Excellent :-)

ViralBShah · 2011-07-08T13:54:19Z

No problem, so long as the pressure for higher performance for julia code is always in our face. Otherwise we are no different from R or Matlab, where you have to write a lot of stuff in C for performance.

-viral

On Jul 8, 2011, at 7:20 PM, StefanKarpinski wrote:

What's the problem with just using the C version?

Reply to this email directly or view it on GitHub:
#96 (comment)

StefanKarpinski · 2011-07-08T14:04:56Z

Well, I think we have to strike a balance. For a lot of code being 2x slower than C is perfectly fine. But for really core things like BLAS, FFT, and random number generation, we need to be really fast. And one of the huge strengths of the language is how simple it is to provide a correct but relatively slow general implementation of some operation as a fallback, and then plug in highly optimized special cases transparently so that the user never has to think about it. It lets us present a consistent abstraction without sacrificing performance. Every other language has to pick one or the other.

StefanKarpinski · 2011-07-08T20:36:00Z

Can we close this now? Seems like the practical effect is taken care of, which is what's important for v1.

JeffBezanson · 2011-07-08T21:39:23Z

The goal is not so much to eliminate C as to reduce the need. Yes, it might happen that a user needs to write something in C for performance. But it would be silly if the thing they needed to rewrite in C was our own random number generation library.
Remember, even C programmers call a BLAS written in assembly language.

StefanKarpinski · 2011-07-08T21:44:53Z

I'm confused by this answer. I'm not arguing that we shouldn't have performance as close to C as possible. We definitely should. However, I just don't get what's wrong with using high-performance C code that we're never going to beat like libMT, FFTW, etc. I feel like we're talking past each other here though.

ViralBShah · 2011-07-09T11:58:08Z

I guess eventually, we need to implement the Ziggurat algorithm, rather than using the Box Mueller transformation for generating normal random numbers:

http://en.wikipedia.org/wiki/Ziggurat_algorithm

Some good analysis on ziggurat performance:

http://seehuhn.de/pages/ziggurat

ViralBShah · 2011-07-09T12:01:43Z

A textbook description and implementation of the Ziggurat algorithm (and others) by Cleve Moler. Matlab also uses this algorithm for randn().

http://www.mathworks.com/moler/random.pdf
http://www.mathworks.com/moler/ncm/randntx.m

http://www.mathworks.com/company/newsletters/news_notes/clevescorner/spring01_cleve.html

ViralBShah · 2011-07-09T13:40:35Z

Immediate randn performance is addressed with the Box-Mueller implementation in jl_random.c. However, this is 4x slower than rand(), and needs to be faster for a lot of applications. I have updated the subject of this issue to reflect the implementation of Ziggurat, changed milestone to v2.0, and assigned it to myself for now.

StefanKarpinski · 2011-07-09T15:21:22Z

Why do we need to use Ziggurat over Box-Muller? Wikipedia of course has detailed descriptions of each, but does no comparison of them and why you might prefer one over the other. From Cleve's writeup, I'm gathering that Ziggurat is less expensive in that you're less likely to reject values when generating random numbers. Is that the reason?

StefanKarpinski · 2011-07-09T15:27:34Z

Ah, I see the performance analysis now. Makes sense. Seems like the best performance overall would be a high-quality Ziggurat implementation in C.

ViralBShah · 2011-07-09T17:56:17Z

Yes, Ziggurat just requires two random numbers and some cheap computation, whereas Box-Mueller requires two random numbers, and a sqrt and log, which are expensive. Also, Ziggurat rejects much fewer values.

I did a bunch of digging around - and it seems that it is widely accepted that ziggurat is the way to go.

-viral

On Jul 9, 2011, at 8:51 PM, StefanKarpinski wrote:

Why do we need to use Ziggurat over Box-Muller? Wikipedia of course has detailed descriptions of each, but does no comparison of them and why you might prefer one over the other. From Cleve's writeup, I'm gathering that Ziggurat is less expensive in that you're less likely to reject values when generating random numbers. Is that the reason?

Reply to this email directly or view it on GitHub:
#96 (comment)

ViralBShah · 2011-10-15T00:17:23Z

Implemented in commit 7ff80c9

Convert generated function -> stagedfunction for older julia 0.4

stevengj · 2020-06-09T17:47:31Z

Too bad we didn't keep Box-Muller as a fallback. The advantage of Box-Muller is that it is precision-independent, so it would be usable by user-defined types (like DecimalFloats JuliaMath/DecFP.jl#137 or DoubleFloats JuliaMath/DoubleFloats.jl@14d9131) once they implement rand().

StefanKarpinski · 2020-06-09T18:30:20Z

Could always be added back as a generic fallback. It’s there in the git history somewhere and probably not that hard to recreate.

Add exercise: triangle

$ git log --pretty=oneline --abbrev=commit 0d798cf..2b4bed9 2b4bed901185edcb59389515378eba2f26aaa577 Disable HTTP/2. (#96) 88a217c50bc165a10009c7b8c617e458a1e0ad0e Fix optoin typo (#92)

$ git log --pretty=oneline --abbrev=commit 0d798cf..2b4bed9 2b4bed901185edcb59389515378eba2f26aaa577 Disable HTTP/2. (#96) 88a217c50bc165a10009c7b8c617e458a1e0ad0e Fix optoin typo (#92) (cherry picked from commit 2687bbb)

$ git log --pretty=oneline --abbrev=commit 0d798cf..2b4bed9 2b4bed901185edcb59389515378eba2f26aaa577 Disable HTTP/2. (JuliaLang#96) 88a217c50bc165a10009c7b8c617e458a1e0ad0e Fix optoin typo (JuliaLang#92)

$ git log --pretty=oneline --abbrev=commit 0d798cf..2b4bed9 2b4bed901185edcb59389515378eba2f26aaa577 Disable HTTP/2. (#96) 88a217c50bc165a10009c7b8c617e458a1e0ad0e Fix optoin typo (#92) (cherry picked from commit 2687bbb)

Support `breakpoint(filename, lineno)`

@DilumAluthge

Stdlib: Statistics URL: https://github.com/JuliaStats/Statistics.jl.git Stdlib branch: master Julia branch: master Old commit: 68869af New commit: d49c2bf Julia version: 1.12.0-DEV Statistics version: 1.11.2(Does not match) Bump invoked by: @DilumAluthge Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: JuliaStats/Statistics.jl@68869af...d49c2bf ``` $ git log --oneline 68869af..d49c2bf d49c2bf Merge pull request #178 from JuliaStats/dw/ci d10d6a3 Update Project.toml 1b67c17 Merge pull request #168 from JuliaStats/andreasnoack-patch-2 c3721ed Add a coverage badge 8086523 Test earliest supported Julia version and prereleases 12a1976 Update codecov in ci.yml 2caf0eb Merge pull request #177 from JuliaStats/ViralBShah-patch-1 33e6e8b Update ci.yml to use julia-actions/cache a399c19 Merge pull request #176 from JuliaStats/dependabot/github_actions/julia-actions/setup-julia-2 6b8d58a Merge branch 'master' into dependabot/github_actions/julia-actions/setup-julia-2 c2fb201 Merge pull request #175 from JuliaStats/dependabot/github_actions/actions/cache-4 8f808e4 Merge pull request #174 from JuliaStats/dependabot/github_actions/codecov/codecov-action-4 7f82133 Merge pull request #173 from JuliaStats/dependabot/github_actions/actions/checkout-4 046fb6f Update ci.yml c0fc336 Bump julia-actions/setup-julia from 1 to 2 a95a57a Bump actions/cache from 1 to 4 b675501 Bump codecov/codecov-action from 1 to 4 0088c49 Bump actions/checkout from 2 to 4 ad95c08 Create dependabot.yml 40275e2 Merge pull request #167 from JuliaStats/andreasnoack-patch-1 fa5592a Merge pull request #170 from mbauman/patch-1 cf57562 Add more tests of mean and median of ranges 128dc11 Merge pull request #169 from stevengj/patch-1 48d7a02 docfix: abs2, not ^2 2ac5bec correct std docs: sqrt is elementwise 39f6332 Merge pull request #96 from josemanuel22/mean_may_return_incorrect_results db3682b Merge branch 'master' into mean_may_return_incorrect_results 9e96507 Update src/Statistics.jl 58e5986 Test prereleases 6e76739 Implement one-argument cov2cor! b8fee00 Stop testing on nightly 9addbb8 Merge pull request #162 from caleb-allen/patch-1 6e3d223 Merge pull request #164 from aplavin/patch-1 71ebe28 Merge pull request #166 from JuliaStats/dw/cov_cor_optimization 517afa6 add tests aa0f549 Optimize `cov` and `cor` with identical arguments cc11ea9 propagate NaN value in median cf7040f Use non-mobile Wikipedia urls 547bf4d adding docu to mean! explain target should not alias with the source 296650a adding docu to mean! explain target should not alias with the source ``` Co-authored-by: Dilum Aluthge <dilum@aluthge.com>

@DilumAluthge

Stdlib: Distributed URL: https://github.com/JuliaLang/Distributed.jl Stdlib branch: master Julia branch: master Old commit: 6c7cdb5 New commit: c613685 Julia version: 1.12.0-DEV Distributed version: 1.11.0(Does not match) Bump invoked by: @DilumAluthge Powered by: [BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl) Diff: JuliaLang/Distributed.jl@6c7cdb5...c613685 ``` $ git log --oneline 6c7cdb5..c613685 c613685 Merge pull request #116 from JuliaLang/ci-caching 20e2ce7 Use julia-actions/cache in CI 9c5d73a Merge pull request #112 from JuliaLang/dependabot/github_actions/codecov/codecov-action-5 ed12496 Merge pull request #107 from JamesWrigley/remotechannel-empty 010828a Update .github/workflows/ci.yml 11451a8 Bump codecov/codecov-action from 4 to 5 8b5983b Merge branch 'master' into remotechannel-empty 729ba6a Fix docstring of `@everywhere` (#110) af89e6c Adding better docs to exeflags kwarg (#108) 8537424 Implement Base.isempty(::RemoteChannel) 6a0383b Add a wait(::[Abstract]WorkerPool) (#106) 1cd2677 Bump codecov/codecov-action from 1 to 4 (#96) cde4078 Bump actions/cache from 1 to 4 (#98) 6c8245a Bump julia-actions/setup-julia from 1 to 2 (#97) 1ffaac8 Bump actions/checkout from 2 to 4 (#99) 8e3f849 Fix RemoteChannel iterator interface (#100) f4aaf1b Fix markdown errors in README.md (#95) 2017da9 Merge pull request #103 from JuliaLang/sf/sigquit_instead 07389dd Use `SIGQUIT` instead of `SIGTERM` ``` Co-authored-by: Dilum Aluthge <dilum@aluthge.com>

ghost assigned JeffBezanson Jul 8, 2011

ghost assigned ViralBShah Jul 9, 2011

ViralBShah closed this as completed Oct 15, 2011

ViralBShah mentioned this issue Mar 19, 2012

WIP: libuv rewrite and Windows port #521

Merged

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018

Merge pull request #96 from JuliaLang/teh/staged

db40185

Convert generated function -> stagedfunction for older julia 0.4

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018

fix: error when JULIA_ENV ends with .toml (#96)

d72d372

vchuravy mentioned this issue Feb 8, 2018

use git subtree to graft Pkg3, etc. into stdlib #25942

Closed

rfourquet added the randomness Random number generation and the Random stdlib label Jun 9, 2020

cmcaine pushed a commit to cmcaine/julia that referenced this issue Sep 24, 2020

Merge pull request JuliaLang#96 from Teo-ShaoWei/ex-triangle

10d8669

Add exercise: triangle

fredrikekre added a commit that referenced this issue Feb 11, 2021

Update Downloads.jl:

7ecd503

$ git log --pretty=oneline --abbrev=commit 0d798cf..2b4bed9 2b4bed901185edcb59389515378eba2f26aaa577 Disable HTTP/2. (#96) 88a217c50bc165a10009c7b8c617e458a1e0ad0e Fix optoin typo (#92)

fredrikekre added a commit that referenced this issue Feb 11, 2021

Update Downloads.jl: (#39615)

2687bbb

$ git log --pretty=oneline --abbrev=commit 0d798cf..2b4bed9 2b4bed901185edcb59389515378eba2f26aaa577 Disable HTTP/2. (#96) 88a217c50bc165a10009c7b8c617e458a1e0ad0e Fix optoin typo (#92)

stevengj mentioned this issue Mar 23, 2022

fallback randn/randexp for AbstractFloat #44714

Merged

Keno pushed a commit that referenced this issue Oct 9, 2023

Merge pull request #96 from JuliaDebug/teh/breakpoints_fileline

bf14122

Support `breakpoint(filename, lineno)`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

randn performance - implement Ziggurat algorithm #96

randn performance - implement Ziggurat algorithm #96

ViralBShah commented Jul 8, 2011

JeffBezanson commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

JeffBezanson commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

ViralBShah commented Oct 15, 2011

stevengj commented Jun 9, 2020 •

edited

Loading

StefanKarpinski commented Jun 9, 2020

randn performance - implement Ziggurat algorithm #96

randn performance - implement Ziggurat algorithm #96

Comments

ViralBShah commented Jul 8, 2011

JeffBezanson commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

JeffBezanson commented Jul 8, 2011

StefanKarpinski commented Jul 8, 2011

ViralBShah commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

ViralBShah commented Oct 15, 2011

stevengj commented Jun 9, 2020 • edited Loading

StefanKarpinski commented Jun 9, 2020

stevengj commented Jun 9, 2020 •

edited

Loading