Faster and Better weak generators #59

arthurprs · 2015-07-17T13:57:34Z

I took a couple of days to research some fast pseudo-random number generators and I thought it'd be nice to contribute to the crate.

This PR contains two commits: one adding xorshift128+ and the other adding PCG RXS M XS (32- and 64-bit variants). I don't think there's precedent for adding both, so I think we should discuss how to proceed.

Algorithm	State size	Quality	x86 u32 (MB/s)	x86 u64 (MB/s)	x64 u32 (MB/s)	x64 u64 (MB/s)
PCG32	4 Bytes	Fair	1365	1616	1646	1677
PCG64	8 Bytes	Great	510	1016	1646	3493
Xorshift128	16 Bytes	Fair	1486	1895	2094	2279
Xorshift128+	16 Bytes	Great	911	1656	2000	4278

x64 benchmarks (my laptop)

test algorithms::chacha_u32                        ... bench:       2,477 ns/iter (+/- 191) = 161 MB/s
test algorithms::chacha_u64                        ... bench:       4,783 ns/iter (+/- 353) = 167 MB/s
test algorithms::isaac64_u32                       ... bench:         617 ns/iter (+/- 40) = 648 MB/s
test algorithms::isaac64_u64                       ... bench:         639 ns/iter (+/- 16) = 1251 MB/s
test algorithms::isaac_u32                         ... bench:         847 ns/iter (+/- 30) = 472 MB/s
test algorithms::isaac_u64                         ... bench:       1,579 ns/iter (+/- 130) = 506 MB/s
test algorithms::pcg32_u32                         ... bench:         243 ns/iter (+/- 10) = 1646 MB/s
test algorithms::pcg32_u64                         ... bench:         477 ns/iter (+/- 23) = 1677 MB/s
test algorithms::pcg64_u32                         ... bench:         243 ns/iter (+/- 14) = 1646 MB/s
test algorithms::pcg64_u64                         ... bench:         229 ns/iter (+/- 16) = 3493 MB/s
test algorithms::xorshift_u32                      ... bench:         191 ns/iter (+/- 11) = 2094 MB/s
test algorithms::xorshift_u64                      ... bench:         351 ns/iter (+/- 18) = 2279 MB/s
test algorithms::xorshiftp_u32                     ... bench:         200 ns/iter (+/- 11) = 2000 MB/s
test algorithms::xorshiftp_u64                     ... bench:         187 ns/iter (+/- 7) = 4278 MB/s

x86 benchmarks (DO)

test algorithms::chacha_u32                        ... bench:       2,644 ns/iter (+/- 21) = 151 MB/s
test algorithms::chacha_u64                        ... bench:       4,964 ns/iter (+/- 28) = 161 MB/s
test algorithms::isaac64_u32                       ... bench:       1,038 ns/iter (+/- 8) = 385 MB/s
test algorithms::isaac64_u64                       ... bench:       1,059 ns/iter (+/- 6) = 755 MB/s
test algorithms::isaac_u32                         ... bench:       1,107 ns/iter (+/- 12) = 361 MB/s
test algorithms::isaac_u64                         ... bench:       2,004 ns/iter (+/- 13) = 399 MB/s
test algorithms::pcg32_u32                         ... bench:         293 ns/iter (+/- 4) = 1365 MB/s
test algorithms::pcg32_u64                         ... bench:         495 ns/iter (+/- 6) = 1616 MB/s
test algorithms::pcg64_u32                         ... bench:         783 ns/iter (+/- 15) = 510 MB/s
test algorithms::pcg64_u64                         ... bench:         787 ns/iter (+/- 9) = 1016 MB/s
test algorithms::xorshift_u32                      ... bench:         269 ns/iter (+/- 4) = 1486 MB/s
test algorithms::xorshift_u64                      ... bench:         422 ns/iter (+/- 6) = 1895 MB/s
test algorithms::xorshiftp_u32                     ... bench:         439 ns/iter (+/- 7) = 911 MB/s
test algorithms::xorshiftp_u64                     ... bench:         483 ns/iter (+/- 5) = 1656 MB/s
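The MB/s figures in both listings are consistent with each iteration producing 100 values (400 bytes for the u32 benchmarks, 800 bytes for the u64 ones). That batch size is inferred from the numbers rather than stated in the PR, so the sketch below is only a sanity check on the arithmetic; `throughput_mb_s`, `batch_bytes` and `VALUES_PER_ITER` are illustrative names.

```rust
/// Assumed batch size: 100 values generated per benchmark iteration.
/// This is inferred from the reported figures, not taken from the PR.
const VALUES_PER_ITER: u64 = 100;

/// Bytes produced per iteration for values of `value_size` bytes each.
fn batch_bytes(value_size: u64) -> u64 {
    VALUES_PER_ITER * value_size
}

/// Throughput in MB/s for `bytes` processed in `ns_per_iter` nanoseconds.
/// Bytes per nanosecond equals GB/s, so scaling by 1000 gives MB/s.
/// Integer division truncates, which matches the reported figures.
fn throughput_mb_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1000 / ns_per_iter
}
```

For example, pcg32_u32 on x64 takes 243 ns for 400 bytes, which works out to 1646 MB/s, and xorshiftp_u64 takes 187 ns for 800 bytes, which works out to 4278 MB/s.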

PCG is neat: it's very compact and provides good quality, even in the 32-bit variant!

My preference, though, would be to add the xorshift128+ variant and leave the xorshift128 there (probably worth a rename though).

In addition, weak_rng should return the fastest algorithm available for the architecture. As 64-bit operations kill performance on 32-bit architectures, this can be as simple as Xorshift64 for 32-bit and Xorshift128+ for 64-bit.
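A minimal sketch of what per-architecture selection could look like, assuming a compile-time switch on `target_pointer_width`. The `FastRng` alias and both struct names are hypothetical, the 32-bit generator uses the xorshift128 name the thread later settles on, and the xorshift128+ shift constants (23, 17, 26) are the ones from Vigna's 2014 description; none of this is the PR's actual code.

```rust
/// Marsaglia's xorshift128: four 32-bit state words, 32-bit operations only.
pub struct XorShift128 {
    x: u32,
    y: u32,
    z: u32,
    w: u32,
}

impl XorShift128 {
    pub fn from_seed(seed: [u32; 4]) -> Self {
        assert!(seed != [0; 4], "xorshift128 seed must not be all zero");
        XorShift128 { x: seed[0], y: seed[1], z: seed[2], w: seed[3] }
    }

    pub fn next_u32(&mut self) -> u32 {
        let t = self.x ^ (self.x << 11);
        self.x = self.y;
        self.y = self.z;
        self.z = self.w;
        self.w = self.w ^ (self.w >> 19) ^ (t ^ (t >> 8));
        self.w
    }
}

/// xorshift128+: two 64-bit state words, 64-bit operations.
pub struct XorShift128Plus {
    s0: u64,
    s1: u64,
}

impl XorShift128Plus {
    pub fn from_seed(seed: [u64; 2]) -> Self {
        assert!(seed != [0; 2], "xorshift128+ seed must not be all zero");
        XorShift128Plus { s0: seed[0], s1: seed[1] }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut s1 = self.s0;
        let s0 = self.s1;
        self.s0 = s0;
        s1 ^= s1 << 23;
        self.s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
        self.s1.wrapping_add(s0)
    }
}

// Hypothetical alias: pick the generator at compile time by pointer width.
#[cfg(target_pointer_width = "64")]
pub type FastRng = XorShift128Plus;
#[cfg(not(target_pointer_width = "64"))]
pub type FastRng = XorShift128;
```

Both generators carry 16 bytes of state, so the alias has the same size on either kind of target; only the word width of the arithmetic differs.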

Finally, users won't need to look elsewhere for a crazy fast random implementation.

Thoughts?

rust-highfive · 2015-07-17T13:57:49Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

nagisa · 2015-07-17T15:09:03Z

Hi!

Xorshift+ sounds like a sensible addition. As far as PCG is concerned, I’d implement it in an external crate (there’s a few on cargo already).

> In addition weak_rng should return the fastest algorithm available for the platform.

The “for the platform” is a pretty hard thing to do – we don’t want to start benchmarking all the xorshifts at runtime, nor do we want to make weak_rng return XorShift on some platforms and XorShift+ on others. Therefore we have to rely on eyeballing and pick an algorithm that seems to be fast(est) on as many platforms as possible (our current pick being XorShift).

Changing the rng returned by weak_rng, sadly, is also a somewhat breaking change (it breaks code such as let rng: XorShift = weak_rng()), and a small increase in performance on some workloads does not sound convincing enough to make the change. What does sound convincing, though, is that XorShift+ passes BigCrush (as per Wikipedia), while our current choice does not.

arthurprs · 2015-07-17T16:42:51Z

@nagisa by "fastest algorithm available for the platform" I meant that 64-bit platforms should default to Xorshift128+, as it's better overall in both quality and speed.

I agree that it'll break some code (not much though), but I'd argue that it's both worthwhile and expected as per semver.

alexcrichton · 2015-07-17T18:27:27Z

src/pcg.rs

+    #[inline]
+    fn next_u64(&mut self) -> u64 {
+    	const MULTIPLIER: w64 = w(6364136223846793005);
+		const INCREMENT: w64  = w(1442695040888963407);


This indentation (and a good amount of indentation below) is a little off.

alexcrichton · 2015-07-17T18:31:14Z

r? @huonw, you may have more opinions on this than I

> The “for the platform” is a pretty hard thing to do – we don’t want to start benchmarking all the xorshifts at runtime, nor do we want to make weak_rng return XorShift on some platforms and XorShift+ on others. Therefore we have to rely on eyeballing and pick an algorithm that seems to be fast(est) on as many platforms as possible (our current pick being XorShift).

While I agree we don't want to determine the fastest one at runtime, I do think that we should change the return type of weak_rng to WeakRng to hide the implementation. That way we can swap it out on various platforms and also change it in the future (like we want to right now) without breaking code.
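As a rough sketch of the idea, assuming an opaque struct with a private field: callers can name `WeakRng` in type annotations, but nothing in the public API reveals the algorithm, so it can be swapped later. The `from_seed` constructor and the inner xorshift128+ (constants 23, 17, 26) are illustrative choices, not the crate's API; the real `weak_rng()` seeds itself.

```rust
/// Hypothetical opaque wrapper. The field is private, so callers can name
/// `WeakRng` but cannot depend on which algorithm sits behind it.
pub struct WeakRng {
    inner: XorShift128Plus,
}

/// Private implementation detail. It could be replaced by another generator,
/// or chosen per platform, without changing `WeakRng`'s public surface.
struct XorShift128Plus {
    s0: u64,
    s1: u64,
}

impl XorShift128Plus {
    fn next_u64(&mut self) -> u64 {
        let mut s1 = self.s0;
        let s0 = self.s1;
        self.s0 = s0;
        s1 ^= s1 << 23;
        self.s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
        self.s1.wrapping_add(s0)
    }
}

impl WeakRng {
    /// Illustrative seeded constructor (an assumption made so the sketch
    /// stays self-contained and deterministic).
    pub fn from_seed(seed: [u64; 2]) -> WeakRng {
        assert!(seed != [0, 0], "seed must not be all zero");
        WeakRng { inner: XorShift128Plus { s0: seed[0], s1: seed[1] } }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.inner.next_u64()
    }
}
```

With this shape, code written as `let mut rng: WeakRng = WeakRng::from_seed([1, 2]);` keeps compiling even if the inner generator changes, which is the property a concrete return type such as XorShift cannot offer.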

For now I'd avoid changing weak_rng as this crate will likely have a number of other breaking changes coming soon, and it probably isn't worth the breakage just yet.

huonw · 2015-07-17T18:34:41Z

> leave the xorshift64 there

Erm, maybe we're using different notation, but I thought the generator implemented here was xorshift128.

arthurprs · 2015-07-17T19:19:32Z

@huonw yeah, I was referring to the algorithm name of the XorShiftRng. We should consider renaming both to XorShift64Rng and XorShift128PRng though.

huonw · 2015-07-17T19:51:33Z

Hm, everything I can remember reading calls our thing xorshift128, e.g. https://en.wikipedia.org/wiki/Xorshift , with xorshift64 referring to a different variant with 64 bits of internal state. (I'm trying to clarify to make sure we're all on the same page.)

arthurprs · 2015-07-17T20:21:24Z

Updated the PR code:

removed the PCG implementation
avoided breaking changes to weak_rng (will submit separately)
added tests for panicking when given an all-zero seed
renamed structs
added more doc lines for structs
what we have right now is xorshift128, not xorshift64... sorry everyone
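The all-zero seed matters because xorshift generators only combine state words with XOR and shifts, so an all-zero state maps to itself and the output is zero forever. A small sketch of that fixed point and of a seed check follows; `xorshift128_step` and `checked_seed` are illustrative helpers, not the functions added in the PR.

```rust
/// One xorshift128 step on the raw state words [x, y, z, w].
/// The newest word ends up in the last slot.
pub fn xorshift128_step(state: [u32; 4]) -> [u32; 4] {
    let [x, y, z, w] = state;
    let t = x ^ (x << 11);
    [y, z, w, w ^ (w >> 19) ^ (t ^ (t >> 8))]
}

/// Hypothetical constructor-style check: reject the one seed that would
/// leave the generator stuck at zero.
pub fn checked_seed(seed: [u32; 4]) -> [u32; 4] {
    assert!(seed != [0; 4], "xorshift seeded with all zeros");
    seed
}
```

Calling `xorshift128_step([0; 4])` returns `[0; 4]` again, while any nonzero seed moves on to a different state, which is why panicking on the zero seed is a reasonable thing to test for.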

@huonw yaaa, somehow 64 stuck in my head. Thank you.

DanielKeep · 2015-07-18T17:33:16Z

@nagisa There's one by EricIO and one by codahale, which look to be minimal ports of Pcg. I just started a full port of the entire C++ reference implementation, mostly as an exercise in porting over C++ code using template mixins.

I aim to also port the crazy "extended" generators that allow for generators with arbitrarily large periods... y'know, because it's there.

Any input in relation to this crate is welcome.

sfackler · 2015-07-23T23:02:01Z

@alexcrichton With respect to "saving up" breaking changes, I've gotten into the habit of having a "breaks" branch that PRs with breaking changes merge into. I rebase it to keep it up to date with master, and then merge it in when I'm ready to push a new major release. It's nice since you can still accept breaking PRs instead of leaving them in limbo for an unknown period of time.

arthurprs · 2015-07-23T23:05:27Z

Just to make it clearer, I removed all breaking changes on this.
So what's in is the new xorshift128+ implementation, tests for xorshift variants and code/benchmarks cleanup.

We can integrate the Weak abstraction (to hide the implementation) in a subsequent pr/version.

arthurprs · 2015-08-04T19:10:54Z

Hello again, any updates on this? I believe this is a good addition as it is.

It'd be especially useful if we hid the implementation of weak_rng() in a WeakRng struct; that way we could put the fastest implementation behind it. But this is a breaking change and we must make a decision.

Anyway, I believe this should be merged to encourage further improvements down the road (like the previously mentioned WeakRng struct).

zackw · 2015-09-15T01:43:35Z

I would like to stick my oar in: Please do NOT provide any "weak" algorithms. Once they are in, they can't be got rid of because RNGs need to be reproducible even decades later; but bad randomness is an endless well of subtle bugs, often security-critical bugs.

I would hesitate even to provide ISAAC.

pcwalton · 2015-09-15T01:48:37Z

@alexcrichton I think maybe you should weigh in here?

pcwalton · 2015-09-15T01:49:06Z

For what it's worth, I don't care at all about whether we provide any weak RNGs, as long as they're marked as such, but I do care a lot that we don't encourage userspace CSPRNGs.

zackw · 2015-09-15T01:55:32Z

@pcwalton Why do you want to discourage the use of userspace CSPRNGs? I ask largely because people have been making noises about adding ChaCha-based arc4random* to glibc, and I'm not aware of any reason why that would be a bad idea. (No end of implementation headaches, yes, but.)

graydon · 2015-09-15T02:27:55Z

I think the idea with avoiding userspace CSPRNGs is that you are less likely to keep them adequately (re)seeded, their state hidden from side channels, and their implementation up to date with contemporary assumptions about what constitutes "secure" than you are if you just always call the kernel's variant. I mostly agree with this sentiment and, as the person who added ISAAC in the first place, wouldn't necessarily miss it if it were removed and the default set to OsRng.

It does kinda break my heart to have a weak RNG in a stdlib at any level, certainly if it winds up turning into the default simply to push back on FUD from a bunch of lousy benchmarks. It feels like a bad trade to me; better to leave it in an external crate called WeakRNGsOnlyForInsecurePurposes. But I guess it's up to the community to decide. That's my $0.02.

alexcrichton · 2015-09-16T14:55:18Z

I agree with @pcwalton that I don't mind having these so long as they're documented as not secure at all. From time to time I've enjoyed having a "not fancy" RNG, so I think there's real use in having them. That being said, I'll still defer to @huonw as he's got a WIP design for a revamped rand crate.

bhickey · 2015-10-24T21:40:43Z

After chatting with some other RNG enthusiasts I'd advocate for removing non-crypto PRNGs from rand. Ideally I'd fork rand into rand-crypt and rand-sim, but I think this is unlikely to gain traction. There are very different design requirements for simulation and crypto, and we may be ill served by cramming them together. For games and simulation, reproducibility is very important, while security demands easily verified code. Sequestering simulation-quality PRNGs to another crate simplifies verification (e.g., if you're using rand-sim, you're provably insecure) and reduces conflict between competing goals. @bstrie @robojeb

bstrie · 2015-10-25T01:28:12Z

I'm with @bhickey here. A simple implementation of XorShift is all of three lines of code, and even if you mess up writing it, it's basically irrelevant because you have no expectation of quality randomness from XorShift in the first place. (Not to cast aspersions on the implementation here, which looks much more complete and featureful, but I'd still prefer it live as a third-party crate.)
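For reference, one common "three-line" form is Marsaglia's 32-bit xorshift with shifts 13, 17 and 5. This is a smaller variant than the xorshift128 in the crate, and the function name `xorshift32` is just illustrative.

```rust
/// Marsaglia's 32-bit xorshift (shift triple 13, 17, 5).
/// `state` must be nonzero; zero is a fixed point and yields only zeros.
pub fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}
```

Starting from a state of 1, the first call returns 270369.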

I also think that having separate interfaces for simulation-worthy RNGs and crypto-worthy RNGs is worth investigating in third-party crates for now.

arthurprs · 2015-10-25T03:17:32Z

I believe there's very little benefit in splitting the crate. This way we can have the same interfaces and extras like we have today. I'm all in for keeping secure PRNGs as the default though.

DanielKeep · 2015-10-25T03:48:59Z

Couldn't this split also be achieved by having two top-level modules within the crate: secure and weak? You would probably want to remove top-level aliases (other than an "I don't care, just gimme random values" alias that defaults to something secure), which means any attempt to use a generator involves explicitly choosing between secure and weak generators.
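A rough sketch of that layout, assuming a shared `Rng` trait at the top level and no re-exports of individual generators. The types inside each module are placeholders: `ReaderRng` merely stands in for an OS-backed generator by pulling bytes from any reader, and `XorShift32` stands in for a fast non-cryptographic one.

```rust
/// Common trait shared by both families (simplified to one method here).
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

pub mod secure {
    use super::Rng;
    use std::io::Read;

    /// Placeholder for an OS-backed generator: reads bytes from `source`,
    /// which on Unix could be a file opened on an OS entropy device.
    pub struct ReaderRng<R: Read> {
        pub source: R,
    }

    impl<R: Read> Rng for ReaderRng<R> {
        fn next_u32(&mut self) -> u32 {
            let mut buf = [0u8; 4];
            self.source
                .read_exact(&mut buf)
                .expect("entropy source failed");
            u32::from_le_bytes(buf)
        }
    }
}

pub mod weak {
    use super::Rng;

    /// Placeholder fast generator; NOT suitable for anything secret.
    pub struct XorShift32 {
        pub state: u32,
    }

    impl Rng for XorShift32 {
        fn next_u32(&mut self) -> u32 {
            self.state ^= self.state << 13;
            self.state ^= self.state >> 17;
            self.state ^= self.state << 5;
            self.state
        }
    }
}
```

Because nothing is re-exported at the crate root, a caller has to write either `secure::ReaderRng` or `weak::XorShift32`, so the choice between the two families is visible at every use site.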

bhickey · 2015-10-25T04:17:44Z

@arthurprs There's no reason the interface would need to change. If we forked the crate today Rng would go on being the common trait.

@DanielKeep rand doesn't make any promises about reproducibility. Changes to support code can and will break anyone who wants to do repeatable simulation.

sorear · 2015-10-27T03:58:58Z

The last few times I've done randomness-intensive simulations I've used AES-CTR. With hardware acceleration (laptop with AES-NI) it's more than fast enough, and the peace of mind is great (and of course it's reproducible). So "simulations" versus "cryptography" may not be nuanced enough.

robojeb · 2015-10-27T13:18:14Z

@sorear I don't know if that is really a good enough reason to say that we might not need a fast simulation-focused PRNG. When optimizing a game for diverse hardware I would rather have a 10-instruction decent PRNG than hope that whatever machine one of my users is on has the acceleration needed.

Also, can you confirm that it is reproducible across all hardware-accelerated instances? And is that hardware implementation consistent with the fallback software instance?

I agree that the discussion is nuanced, but it seems to me that some people just need a "This is going to be stupidly fast in all situations" PRNG even if that means sacrificing cryptographic security.

sorear · 2015-10-28T04:12:43Z

@robojeb To be clear, I'm not proposing to use cryptographic key-material RNGs for simulations. Those are not reproducible by design. However, stream ciphers do produce reproducible output, and are consistent between implementations (they have to be, because if the Intel AES-CTR implementation and the software reference implementation didn't produce exactly the same keystream bytes, your encrypted data wouldn't round-trip).

robojeb · 2015-10-28T04:42:16Z

@sorear that is a great point that totally slipped my mind.

I think my main issue in this discussion is discoverability. Don't get me wrong I love the fact that cryptographically good number generators are easy to access. That is great for the rust community and they should be used when needed.

I think, though, that it is equally useful to the community to promote fast PRNGs, with adequate documentation to teach people when they are the right option. Now, I don't know if that means putting them in rand, in some offshoot, or in "officially branded and recognized" rand sister crates (like my crate, pcg_rand). But I think that, however we do it, it is important for them to be recognized officially, even if with a warning about potential dragons.

robojeb · 2015-10-28T04:48:53Z

Another thought I just had: could good generators be tagged with a marker trait like RandSecure? That way cryptographic applications can add it as a trait bound, and we can have RNGs of all types while users can't mess up in many critical areas.
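A minimal sketch of how such a marker trait could work, assuming a simplified `Rng` trait. Apart from the `RandSecure` name taken from the comment above, everything here is hypothetical, and `StubSecureRng` is only a placeholder that is NOT actually secure; it exists so the trait bound has something to accept.

```rust
/// Simplified generator trait for the purpose of this sketch.
pub trait Rng {
    fn next_u64(&mut self) -> u64;
}

/// Marker trait: no methods, it only records that an implementation has
/// been vetted for cryptographic use.
pub trait RandSecure: Rng {}

/// Fast xorshift64 generator (shifts 13, 7, 17). Deliberately has no
/// `RandSecure` impl.
pub struct WeakXorShift(pub u64);

impl Rng for WeakXorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// PLACEHOLDER ONLY: stands in for a vetted generator such as an OS-backed
/// one. It just hands back a stored value and is not secure in any sense.
pub struct StubSecureRng {
    pub next: u64,
}

impl Rng for StubSecureRng {
    fn next_u64(&mut self) -> u64 {
        let out = self.next;
        self.next = self.next.wrapping_add(1);
        out
    }
}

impl RandSecure for StubSecureRng {}

/// Security-sensitive code asks for the marker in its bound, so passing
/// a `WeakXorShift` here is rejected at compile time.
pub fn generate_key<R: RandSecure>(rng: &mut R) -> [u8; 8] {
    rng.next_u64().to_be_bytes()
}
```

The design trades a little ceremony (one empty impl per vetted generator) for a compile-time guarantee at the call sites that care.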

sorear · 2015-10-28T05:29:36Z

I was going to argue that a SIMD ChaCha20 implementation would be much faster than our current implementation, and write SIMD versions of ChaCha20 and AES (AESNI) to include in your benchmarking tables above. However, it turns out that I'm way out of my depth for writing SIMD code in Rust right now, so I'll just offer http://bench.cr.yp.to/results-stream.html instead. I regard "weak reproducible RNGs" as a very niche application because the strong reproducible RNGs (stream ciphers) well implemented are so fast on recent mid-range hardware, but I don't have anything else to back that up with so I'll stop now.

graydon · 2015-10-28T13:44:14Z

Just stopping by to remind readers of the points made above re: not having a CSPRNG in userspace. It's not terribly important -- cryptographically -- if something's fast if it's inadequately seeded or uses an algorithm that gets broken. The OS should provide the secure variant (linux even has a faster syscall for it)

(And the insecure variant, if it exists at all, should be marked as such very clearly)

sneves · 2015-11-05T15:58:12Z

@graydon It's not clear to me what exactly you are arguing for or against. It seems to me you are not arguing against userspace stream ciphers (I don't like the term CSPRNG here, since it kind of implies forward and backward security are a goal), but instead you are arguing against the overall API.

That is, you are arguing against any generator being able to be seeded. Instead, that trait should be gone and essentially only OsRng should be used. I cannot agree with this, since reproducible generators are a valid (and common) use case. Nobody is arguing that the ISAAC or ChaCha generators should be used to generate keys; in fact, at the moment they are default-seeded by OsRng, so they really are no better than the latter for this task.

But OsRng is entirely unsuitable for the use cases where one wants reproducible behavior (e.g., Monte Carlo simulations, some computer algebra systems give you a seed on startup for reproducible bug reports, ...). The question then becomes one of choice: do we want (say) Xorshift generators for this case, or a cryptographically strong stream generator? The former pass, by design, some but not all statistical tests. The latter generators conjecturally pass every efficiently-computable statistical test, and they're unpredictable as a bonus. If anything, I'd say the latter should be the ones that stay for fast and reproducible generation---but maybe the API could be organized in a clearer way to highlight the separate use cases of OsRng and SeedableRng.
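To make the reproducibility use case concrete, here is a small Monte Carlo sketch: with a seedable generator, the same seed always yields bit-for-bit the same estimate, which an OS-backed source cannot offer. The generator is a placeholder xorshift64 (shifts 13, 7, 17) chosen only to keep the example short; the names `SeededRng` and `estimate_pi` are illustrative, and a stream-cipher-based generator could be dropped in behind the same interface.

```rust
/// Placeholder seedable generator (xorshift64 with shifts 13, 7, 17).
pub struct SeededRng(u64);

impl SeededRng {
    pub fn from_seed(seed: u64) -> Self {
        assert!(seed != 0, "seed must be nonzero");
        SeededRng(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits of the output.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Estimate pi by sampling points in the unit square and counting how many
/// fall inside the quarter circle. Fully determined by `seed` and `samples`.
pub fn estimate_pi(seed: u64, samples: u32) -> f64 {
    let mut rng = SeededRng::from_seed(seed);
    let mut inside = 0u32;
    for _ in 0..samples {
        let (x, y) = (rng.next_f64(), rng.next_f64());
        if x * x + y * y < 1.0 {
            inside += 1;
        }
    }
    4.0 * f64::from(inside) / f64::from(samples)
}
```

Running `estimate_pi(2015, 10_000)` twice gives identical results, so a bug report that includes the seed can be replayed exactly.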

graydon · 2015-11-11T21:36:57Z

I'm arguing that since chacha is slower than pcg and xor, simulation-focused users won't want it. And that it's unwise to leave it lying around in case users who want cryptographic randomness pick it up in favour of osrng. I realize there are two classes of users here with different needs. I'm arguing it's best to address them separately.

sorear · 2015-11-11T21:45:43Z

@graydon @bhickey moved this thread to private email a few days ago

sneves · 2015-11-11T21:49:32Z

And I am arguing that simulation-focused users are best served with a generator that is guaranteed to actually be statistically indistinguishable from random. Maybe that's not ChaCha20; maybe it's AES-CTR, maybe it's ChaCha8, maybe it is something else (note that the performance of ChaCha on x86 is lacking, due to the absence of SIMD capabilities in Rust at the time of implementation). But regular generators, such as LCGs, Xorshift, and so on, have repeatedly been shown to be far from random, despite happening to pass some static (and rarely growing) set of standard statistical tests. I believe the distinction between use cases is best done through API, not primitive names.

graydon · 2015-11-11T22:19:31Z

Be careful to differentiate LCGs from PCG. The latter have good statistical properties and are fast.
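A short sketch of the difference, under stated assumptions: the multiplier and increment are the 64-bit constants shown in the PR's pcg.rs snippet, and the output permutation is XSH-RR as used in the PCG reference's minimal 32-bit generator (the PR itself used the RXS M XS variant, which is not reproduced here). A raw power-of-two LCG exposes its state directly, so its lowest bit simply alternates; PCG keeps the same state transition but scrambles the state before returning it.

```rust
const MULTIPLIER: u64 = 6364136223846793005;
const INCREMENT: u64 = 1442695040888963407;

/// Plain 64-bit LCG state transition (modulus 2^64 via wrapping arithmetic).
pub fn lcg_step(state: u64) -> u64 {
    state.wrapping_mul(MULTIPLIER).wrapping_add(INCREMENT)
}

/// Lowest bit of each successive raw LCG state. With an odd multiplier and
/// an odd increment this bit flips on every step, so its period is 2.
pub fn lcg_low_bits(seed: u64, n: usize) -> Vec<u64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            state = lcg_step(state);
            state & 1
        })
        .collect()
}

/// PCG XSH-RR output function: xorshift the high bits down, then rotate by
/// an amount taken from the top five bits of the state.
pub fn pcg_xsh_rr(state: u64) -> u32 {
    let xorshifted = (((state >> 18) ^ state) >> 27) as u32;
    let rot = (state >> 59) as u32;
    xorshifted.rotate_right(rot)
}

/// PCG-style outputs: same LCG underneath, permuted before being returned.
pub fn pcg_outputs(seed: u64, n: usize) -> Vec<u32> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            let out = pcg_xsh_rr(state);
            state = lcg_step(state);
            out
        })
        .collect()
}
```

Calling `lcg_low_bits(42, 6)` yields 1, 0, 1, 0, 1, 0, which is the kind of structure the output permutation is meant to hide.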

I ... guess the topic is in "private email" now so unclear whether there's any use writing here.

arthurprs · 2015-12-17T15:16:53Z

Just an update related to the PR algorithm choice.
Every major JavaScript engine implementation is now using xorshift128+ to implement Math.random.

See http://v8project.blogspot.de/2015/12/theres-mathrandom-and-then-theres.html?m=1

bstrie · 2015-12-18T05:19:36Z

JavaScript engines (specifically wrt Math.random) are in a strange position where they care neither about cryptographic security nor about reproducibility. I would expect a blessed rust-lang crate to pick one of those two concerns and do it well. Generators that provide low-quality randomness can live on crates.io (or you can copy/paste a trivial three-line xorshift implementation into your code).

arthurprs · 2015-12-18T10:25:53Z

What I meant is that xorshift128+ is a great choice for the weak generator.

I don't really get why there's so much resistance to this. 99.9% of the time you just want a good distribution. And the default generator already is the secure one.

bstrie · 2015-12-18T20:07:01Z

Ah, I thought you were implying that we should do as JS engines do and make xorshift+ the default generator.

In any case, I don't think that movement on behalf of JS engines addresses the prior objections to this PR wrt whether or not weak generators should have blessed implementations at all. In browser-land, a pure JS PRNG implementation is leaving a lot of performance on the table, which is why a native implementation is desirable. Rust doesn't have the same restrictions, and Cargo makes it trivial to pull in a faster implementation when necessary.

I think we're all running around in circles wrt this entire library at this point. Two people in here have already suggested that they're working on designs for total overhauls, so I'm not sure how we want to proceed (especially considering that I haven't seen public demonstrations of either of these initiatives yet).

alexcrichton · 2016-05-24T17:20:29Z

Sorry for the delay @arthurprs, but I wanted to say thanks again for the PR! The libs team discussed this PR during triage yesterday, and the conclusion was that for now we don't want to include any other RNG implementations in the rand crate directly. It's possible to ergonomically build this out of tree (due to the traits in rand) so there's not necessarily an inherent reason to include this in the rand crate itself.

Unfortunately this crate is lacking a vision of how to proceed forward which makes it difficult to say whether we want this sort of RNG eventually. We hope to start devoting some time to this crate soon, but help is always appreciated!

In the meantime I'm gonna close this, but please feel free to publish this on crates.io!

arthurprs · 2016-05-25T08:23:03Z

Thanks Alex, I'll move the code to a crate.

rust-highfive assigned alexcrichton Jul 17, 2015

alexcrichton reviewed Jul 17, 2015

rust-highfive assigned huonw and unassigned alexcrichton Jul 17, 2015

arthurprs force-pushed the master branch 3 times, most recently from 8a8cdcc to 42d60b9 Compare July 17, 2015 20:14

arthurprs force-pushed the master branch from 42d60b9 to 9c73909 Compare July 17, 2015 20:42

implement xorshift128+

7b530c8

arthurprs force-pushed the master branch from 9c73909 to 7b530c8 Compare July 18, 2015 13:51

nagisa mentioned this pull request Aug 12, 2015

Tracking issue for stabilizing randomness rust-lang/rust#27703

Closed

arthurprs unassigned huonw Sep 21, 2015

ranma42 mentioned this pull request Feb 2, 2016

Seed HashMaps thread-locally, straight from the OS. rust-lang/rust#31356

Closed

alexcrichton closed this May 24, 2016
