Faster and Better weak generators #59

Closed
wants to merge 1 commit into from

arthurprs
Contributor

I took a couple of days to research some fast pseudo-random number generators and I thought it'd be nice to contribute to the crate.

This PR contains two commits, one adding xorshift128+ and the other PCG RXS M XS (32- and 64-bit variants). I don't think there's precedent for adding both, so I think we should discuss how to proceed.

Algorithm      Size       Quality   x86 u32 (MB/s)   x86 u64 (MB/s)   x64 u32 (MB/s)   x64 u64 (MB/s)
PCG32          4 Bytes    Fair                1365             1616             1646             1677
PCG64          8 Bytes    Great                510             1016             1646             3493
Xorshift128    16 Bytes   Fair                1486             1895             2094             2279
Xorshift128+   16 Bytes   Great                911             1656             2000             4278

x64 benchmarks (my laptop)

test algorithms::chacha_u32                        ... bench:       2,477 ns/iter (+/- 191) = 161 MB/s
test algorithms::chacha_u64                        ... bench:       4,783 ns/iter (+/- 353) = 167 MB/s
test algorithms::isaac64_u32                       ... bench:         617 ns/iter (+/- 40) = 648 MB/s
test algorithms::isaac64_u64                       ... bench:         639 ns/iter (+/- 16) = 1251 MB/s
test algorithms::isaac_u32                         ... bench:         847 ns/iter (+/- 30) = 472 MB/s
test algorithms::isaac_u64                         ... bench:       1,579 ns/iter (+/- 130) = 506 MB/s
test algorithms::pcg32_u32                         ... bench:         243 ns/iter (+/- 10) = 1646 MB/s
test algorithms::pcg32_u64                         ... bench:         477 ns/iter (+/- 23) = 1677 MB/s
test algorithms::pcg64_u32                         ... bench:         243 ns/iter (+/- 14) = 1646 MB/s
test algorithms::pcg64_u64                         ... bench:         229 ns/iter (+/- 16) = 3493 MB/s
test algorithms::xorshift_u32                      ... bench:         191 ns/iter (+/- 11) = 2094 MB/s
test algorithms::xorshift_u64                      ... bench:         351 ns/iter (+/- 18) = 2279 MB/s
test algorithms::xorshiftp_u32                     ... bench:         200 ns/iter (+/- 11) = 2000 MB/s
test algorithms::xorshiftp_u64                     ... bench:         187 ns/iter (+/- 7) = 4278 MB/s

x86 benchmarks (DO)

test algorithms::chacha_u32                        ... bench:       2,644 ns/iter (+/- 21) = 151 MB/s
test algorithms::chacha_u64                        ... bench:       4,964 ns/iter (+/- 28) = 161 MB/s
test algorithms::isaac64_u32                       ... bench:       1,038 ns/iter (+/- 8) = 385 MB/s
test algorithms::isaac64_u64                       ... bench:       1,059 ns/iter (+/- 6) = 755 MB/s
test algorithms::isaac_u32                         ... bench:       1,107 ns/iter (+/- 12) = 361 MB/s
test algorithms::isaac_u64                         ... bench:       2,004 ns/iter (+/- 13) = 399 MB/s
test algorithms::pcg32_u32                         ... bench:         293 ns/iter (+/- 4) = 1365 MB/s
test algorithms::pcg32_u64                         ... bench:         495 ns/iter (+/- 6) = 1616 MB/s
test algorithms::pcg64_u32                         ... bench:         783 ns/iter (+/- 15) = 510 MB/s
test algorithms::pcg64_u64                         ... bench:         787 ns/iter (+/- 9) = 1016 MB/s
test algorithms::xorshift_u32                      ... bench:         269 ns/iter (+/- 4) = 1486 MB/s
test algorithms::xorshift_u64                      ... bench:         422 ns/iter (+/- 6) = 1895 MB/s
test algorithms::xorshiftp_u32                     ... bench:         439 ns/iter (+/- 7) = 911 MB/s
test algorithms::xorshiftp_u64                     ... bench:         483 ns/iter (+/- 5) = 1656 MB/s
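
The MB/s column can be recovered from the ns/iter column once the number of bytes produced per iteration is known. The figures above are consistent with each iteration generating 100 values (400 bytes for u32, 800 bytes for u64); that count is inferred from the numbers and is not stated in the PR. A minimal sketch of the arithmetic, with a hypothetical helper name:

```rust
/// Throughput in MB/s (10^6 bytes per second), truncated to an integer.
/// `bytes_per_iter` is an assumption inferred from the reported figures
/// (100 values per iteration), not something the PR states.
fn throughput_mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    // 1 byte/ns = 10^9 bytes/s = 1000 MB/s, hence the factor of 1000.
    bytes_per_iter * 1000 / ns_per_iter
}
```

For example, `throughput_mb_per_s(400, 243)` corresponds to the x64 `pcg32_u32` row and `throughput_mb_per_s(800, 187)` to the x64 `xorshiftp_u64` row.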

PCG is neat: it's very compact and provides good quality, even in the 32-bit variant!
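
For readers unfamiliar with PCG RXS M XS, here is a minimal sketch of the 64-bit variant. It is not the code from this PR. The multiplier and increment are the two constants visible in the review snippet later in this thread; the struct name `Pcg64Sketch`, the trivial seeding, and the output-permutation constants (recalled from the PCG reference C implementation, so worth checking against it) are assumptions.

```rust
/// Sketch of a PCG RXS M XS generator with 64 bits of state and 64-bit output.
/// Hypothetical type, not part of the rand crate.
pub struct Pcg64Sketch {
    state: u64,
}

impl Pcg64Sketch {
    // LCG constants as they appear in the PR's review snippet.
    const MULTIPLIER: u64 = 6364136223846793005;
    const INCREMENT: u64 = 1442695040888963407;

    /// Simplistic seeding (assumption): use the seed directly as the state.
    pub fn new(seed: u64) -> Self {
        Pcg64Sketch { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        let old = self.state;
        // Advance the underlying linear congruential generator.
        self.state = old
            .wrapping_mul(Self::MULTIPLIER)
            .wrapping_add(Self::INCREMENT);
        // RXS: xorshift by an amount chosen by the top 5 bits of the state.
        // M: multiply by an odd constant. XS: a final fixed xorshift.
        let word = ((old >> ((old >> 59) + 5)) ^ old).wrapping_mul(12605985483714917081);
        (word >> 43) ^ word
    }
}
```

The whole generator is 8 bytes of state plus a handful of arithmetic operations, which is where the compactness comes from.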

Although my preference would be to add the xorshift128+ variant and leave the xorshift128 there (probably worth a rename though).

In addition, weak_rng should return the fastest algorithm available for the architecture. As 64-bit operations kill performance on 32-bit architectures, this can be as simple as Xorshift64 for 32-bit and Xorshift128+ for 64-bit.

Finally, users won't need to look elsewhere for a crazy fast random implementation.

Thoughts?

@rust-highfive

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@nagisa
Contributor

nagisa commented Jul 17, 2015

Hi!

Xorshift+ sounds like a sensible addition. As far as PCG is concerned, I’d implement it in an external crate (there’s a few on cargo already).

In addition weak_rng should return the fastest algorithm available for the platform.

The “for the platform” part is a pretty hard thing to do: we don’t want to start benchmarking all the xorshifts at runtime, nor do we want to make weak_rng return XorShift on some platforms and XorShift+ on others. Therefore we have to rely on eyeballing and pick an algorithm that seems to be fast(est) on as many platforms as possible (our current pick being XorShift).

Changing the rng returned by weak_rng, sadly, is also a somewhat breaking change (it breaks code such as let rng: XorShift = weak_rng()), and a small performance increase on some workloads does not sound convincing enough to make the change. What does sound convincing, though, is that XorShift+ passes BigCrush (as per Wikipedia), while our current choice does not.

@arthurprs
Contributor Author

@nagisa by "fastest algorithm available for the platform" I meant that 64-bit platforms should default to Xorshift128+, as it's better overall, both quality- and speed-wise.

I agree that it'll break some code (not much though), but I'd argue that it's both worthwhile and expected as per semver.

#[inline]
fn next_u64(&mut self) -> u64 {
const MULTIPLIER: w64 = w(6364136223846793005);
const INCREMENT: w64 = w(1442695040888963407);
Contributor

This indentation (and a good amount of indentation below) is a little off.

@alexcrichton
Contributor

r? @huonw, you may have more opinions on this than I

The “for the platform” part is a pretty hard thing to do: we don’t want to start benchmarking all the xorshifts at runtime, nor do we want to make weak_rng return XorShift on some platforms and XorShift+ on others. Therefore we have to rely on eyeballing and pick an algorithm that seems to be fast(est) on as many platforms as possible (our current pick being XorShift).

While I agree we don't want to determine the fastest one at runtime, I do think that we should change the return type of weak_rng to WeakRng to hide the implementation. That way we can swap it out on various platforms and also change it in the future (like we want to right now) without breaking code.
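
A rough sketch of that idea, assuming a wrapper named `WeakRng` around a private type alias chosen at compile time. Everything here is hypothetical: the two inner generators are simplified stand-ins with arbitrary fixed seeds, and the choice per pointer width is only one possible policy.

```rust
/// Stand-in for the existing 32-bit xorshift128 generator (hypothetical).
struct XorShift128 { x: u32, y: u32, z: u32, w: u32 }

impl XorShift128 {
    fn new() -> Self { XorShift128 { x: 1, y: 2, z: 3, w: 4 } } // arbitrary non-zero state
    fn next_u32(&mut self) -> u32 {
        let t = self.x ^ (self.x << 11);
        self.x = self.y;
        self.y = self.z;
        self.z = self.w;
        self.w = self.w ^ (self.w >> 19) ^ (t ^ (t >> 8));
        self.w
    }
    fn next_u64(&mut self) -> u64 {
        ((self.next_u32() as u64) << 32) | self.next_u32() as u64
    }
}

/// Stand-in for a xorshift128+ generator (hypothetical, shifts 23/17/26 assumed).
struct XorShift128Plus { s0: u64, s1: u64 }

impl XorShift128Plus {
    fn new() -> Self { XorShift128Plus { s0: 1, s1: 2 } } // arbitrary non-zero state
    fn next_u64(&mut self) -> u64 {
        let mut s1 = self.s0;
        let s0 = self.s1;
        self.s0 = s0;
        s1 ^= s1 << 23;
        self.s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
        self.s1.wrapping_add(s0)
    }
}

// The algorithm is picked per target at compile time, never at runtime.
#[cfg(target_pointer_width = "64")]
type Inner = XorShift128Plus;
#[cfg(not(target_pointer_width = "64"))]
type Inner = XorShift128;

/// Opaque wrapper: callers can name `WeakRng`, but not the algorithm inside it.
pub struct WeakRng(Inner);

impl WeakRng {
    pub fn next_u64(&mut self) -> u64 { self.0.next_u64() }
}

pub fn weak_rng() -> WeakRng { WeakRng(Inner::new()) }
```

With this shape, code written as `let rng: WeakRng = weak_rng()` keeps compiling when the inner algorithm changes, which is the property the current `XorShift` return type lacks.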

For now I'd avoid changing weak_rng as this crate will likely have a number of other breaking changes coming soon, and it probably isn't worth the breakage just yet.

@rust-highfive rust-highfive assigned huonw and unassigned alexcrichton Jul 17, 2015
@huonw
Contributor

huonw commented Jul 17, 2015

leave the xorshift64 there

Erm, maybe we're using different notation, but I thought the generator implemented here was xorshift128.

@arthurprs
Contributor Author

@huonw yeah, I was referring to the algorithm name of the XorShiftRng. We should consider renaming both to XorShift64Rng and XorShift128PRng though.

@huonw
Contributor

huonw commented Jul 17, 2015

Hm, everything I can remember reading calls our thing xorshift128, e.g. https://en.wikipedia.org/wiki/Xorshift , with xorshift64 referring to a different variant with 64-bits of internal state. (I'm trying to clarify to make sure we're all on the same page.)

@arthurprs arthurprs force-pushed the master branch 3 times, most recently from 8a8cdcc to 42d60b9 Compare July 17, 2015 20:14
@arthurprs
Contributor Author

Updated the PR:

  • removed the PCG implementation
  • avoided breaking changes to weak_rng (will submit those separately)
  • added tests for panicking when given an all-zero seed
  • renamed structs
  • added more doc lines for structs
  • what we have right now is xorshift128, not xorshift64... sorry everyone
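
A rough sketch of what an xorshift128+ generator with the all-zero-seed check could look like. This is not the PR's code: the name `XorShift128PlusSketch` and the `from_seed` signature are hypothetical, and the shift constants (23, 17, 26) are those of the commonly published 2014 variant, assumed here.

```rust
/// Sketch of xorshift128+: 128 bits of state, 64-bit output.
/// Hypothetical type, not the struct added in this PR.
pub struct XorShift128PlusSketch {
    s0: u64,
    s1: u64,
}

impl XorShift128PlusSketch {
    /// Panics on an all-zero seed: xorshift maps the zero state to itself,
    /// so such a generator would only ever output zeros.
    pub fn from_seed(seed: [u64; 2]) -> Self {
        assert!(seed != [0, 0], "xorshift128+ seeded with all zeros");
        XorShift128PlusSketch { s0: seed[0], s1: seed[1] }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut s1 = self.s0;
        let s0 = self.s1;
        self.s0 = s0;
        s1 ^= s1 << 23;
        self.s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26);
        // The "+" in the name: the output is the sum of the two state words.
        self.s1.wrapping_add(s0)
    }
}
```

A test for the panic would typically be a `#[test]` function marked `#[should_panic]` that calls `from_seed([0, 0])`.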

@huonw yaaa, somehow 64 stuck in my head. Thank you.

@DanielKeep

@nagisa There's one by EricIO and one by codahale, which look to be minimal ports of Pcg. I just started a full port of the entire C++ reference implementation, mostly as an exercise in porting over C++ code using template mixins.

I aim to also port the crazy "extended" generators that allow for generators with arbitrarily large periods... y'know, because it's there.

Any input in relation to this crate is welcome.

@sfackler
Contributor

@alexcrichton With respect to "saving up" breaking changes, I've gotten into the habit of having a "breaks" branch that PRs with breaking changes merge into. I rebase it to keep it up to date with master, and then merge it in when I'm ready to push a new major release. It's nice since you can still accept breaking PRs instead of leaving them in limbo for an unknown period of time.

@arthurprs
Contributor Author

Just to make it clearer, I removed all breaking changes on this.
So what's in is the new xorshift128+ implementation, tests for xorshift variants and code/benchmarks cleanup.

We can integrate the Weak abstraction (to hide the implementation) in a subsequent pr/version.

@arthurprs
Contributor Author

Hello again, any updates on this? I believe this is a good addition as it is.

It'd be especially useful if we hid the implementation of weak_rng() in a WeakRng struct; that way we could put the fastest implementation behind it. But this is a breaking change and we must make a decision.

Anyway, I believe this should be merged to encourage further improvements down the road (like the previously mentioned WeakRng struct).

@zackw

zackw commented Sep 15, 2015

I would like to stick my oar in: Please do NOT provide any "weak" algorithms. Once they are in, they can't be got rid of because RNGs need to be reproducible even decades later; but bad randomness is an endless well of subtle bugs, often security-critical bugs.

I would hesitate even to provide ISAAC.

@pcwalton
Contributor

@alexcrichton I think maybe you should weigh in here?

@pcwalton
Contributor

For what it's worth, I don't care at all about whether we provide any weak RNGs, as long as they're marked as such, but I do care a lot that we don't encourage userspace CSPRNGs.

@zackw

zackw commented Sep 15, 2015

@pcwalton Why do you want to discourage the use of userspace CSPRNGs? I ask largely because people have been making noises about adding ChaCha-based arc4random* to glibc, and I'm not aware of any reason why that would be a bad idea. (No end of implementation headaches, yes, but.)

@graydon
Contributor

graydon commented Sep 15, 2015

I think the idea with avoiding userspace CSPRNGs is that you are less likely to keep them adequately (re)seeded, their state hidden from side channels, and their implementation up to date with contemporary assumptions about what constitutes "secure" as you are if you just always call the kernel's variant. I mostly agree with this sentiment and, as the person who added ISAAC in the first place, wouldn't necessarily miss it if it were removed and the default set to OsRng.

It does kinda break my heart to have a weak RNG in a stdlib at any level, certainly if it winds up turning into the default simply to push back on FUD from a bunch of lousy benchmarks. It feels like a bad trade to me; better to leave it in an external crate called WeakRNGsOnlyForInsecurePurposes. But I guess it's up to the community to decide. That's my $0.02.

@alexcrichton
Contributor

I agree with @pcwalton that I don't mind having these so long as they're documented as not secure at all. From time to time I've enjoyed having a "not fancy" RNG, so I think there's real use in having them. That being said, I'll still defer to @huonw as he's got a WIP design for a revamped rand crate.

@bhickey

bhickey commented Oct 24, 2015

After chatting with some other RNG enthusiasts I'd advocate for removing non-crypto PRNGs from rand. Ideally I'd fork rand into rand-crypt and rand-sim, but I think this is unlikely to gain traction. There are very different design requirements for simulation and crypto, and we may be ill served by cramming them together. For games and simulation, reproducibility is very important, while security demands easily verified code. Sequestering simulation-quality PRNGs to another crate simplifies verification (e.g. if you're using rand-sim, you're provably insecure) and reduces conflict between competing goals. @bstrie @robojeb

@bstrie
Contributor

bstrie commented Oct 25, 2015

I'm with @bhickey here. A simple implementation of XorShift is all of three lines of code, and even if you mess up writing it, it's basically irrelevant because you have no expectation of quality randomness from XorShift in the first place. (Not to cast aspersions on the implementation here, which looks much more complete and featureful, but I'd still prefer it live as a third-party crate.)
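
For reference, the kind of "three lines" meant here might look like the 32-bit xorshift below. It is an illustration using Marsaglia's (13, 17, 5) shift triple, not the crate's XorShiftRng (which this thread identifies as xorshift128); the function name is made up.

```rust
/// One step of a 32-bit xorshift generator (shift triple 13, 17, 5).
/// `state` must be non-zero: a zero state stays zero forever.
fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}
```

The caller owns the state, e.g. `let mut s = 1u32; let r = xorshift32(&mut s);`, and is responsible for seeding it with something non-zero.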

I also think that having separate interfaces for simulation-worthy RNGs and crypto-worthy RNGs is worth investigating in third-party crates for now.

@arthurprs
Contributor Author

I believe there's very little benefit in splitting the crate. This way we can have the same interfaces and extras like we have today. I'm all in for keeping secure PRNGs as the default though.

@DanielKeep

Couldn't this split also be achieved by having two top-level modules within the crate: secure and weak? You would probably want to remove top-level aliases (other than a "I don't care, just gimme random values" that defaults to something secure), which means any attempt to use a generator involves explicitly choosing between secure and weak generators.

@bhickey

bhickey commented Oct 25, 2015

@arthurprs There's no reason the interface would need to change. If we forked the crate today Rng would go on being the common trait.

@DanielKeep rand doesn't make any promises about reproducibility. Changes to support code can and will break anyone who wants to do repeatable simulation.

@sorear

sorear commented Oct 27, 2015

The last few times I've done randomness-intensive simulations I've used AES-CTR. With hardware acceleration (laptop with AES-NI) it's more than fast enough, and the peace of mind is great (and of course it's reproducible). So "simulations" versus "cryptography" may not be nuanced enough.

@robojeb

robojeb commented Oct 27, 2015

@sorear I don't know if that is really a good enough reason to say that we might not need a fast simulation-focused PRNG. When optimizing a game for diverse hardware I would rather have a 10-instruction decent PRNG than hope that whatever machine one of my users is on has the acceleration needed.

Also, can you confirm that it is reproducible across all hardware-accelerated instances? And is that hardware implementation consistent with the fallback software instance?

I agree that the discussion is nuanced, but it seems to me that some people just need a "This is going to be stupidly fast in all situations" PRNG even if that means sacrificing cryptographic security.

@sorear

sorear commented Oct 28, 2015

@robojeb To be clear, I'm not proposing to use cryptographic key-material RNGs for simulations. Those are not reproducible by design. However, stream ciphers do produce reproducible output, and are consistent between implementations (they have to be, because if the Intel AES-CTR implementation and the software reference implementation didn't produce exactly the same keystream bytes, your encrypted data wouldn't round-trip).

@robojeb

robojeb commented Oct 28, 2015

@sorear that is a great point that totally slipped my mind.

I think my main issue in this discussion is discoverability. Don't get me wrong I love the fact that cryptographically good number generators are easy to access. That is great for the rust community and they should be used when needed.

I think, though, that it is equally useful to the community to promote fast PRNGs with adequate documentation to teach people when they are the right option. Now, I don't know if that means putting them in rand or in some offshoot or "officially branded and recognized" rand sister crates (like my crate, pcg_rand). But I think that however we do it, it is important for them to be recognized officially, even if with warnings about potential dragons.

@robojeb

robojeb commented Oct 28, 2015

Another thought I just had. Could good generators be tagged with a marker trait like RandSecure that way cryptographic applications can add that as a trait bound. This way we can have all the RNGs of all types but users can't mess up in many critical areas.
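
A minimal sketch of how such a marker trait could work. `RandSecure` is the name suggested above; everything else (the cut-down `Rng` trait, `OsBacked`, `WeakXorShift`, `generate_key`) is hypothetical and not the rand crate's API. The OS-backed generator reads `/dev/urandom`, so this sketch assumes a Unix-like system.

```rust
use std::fs::File;
use std::io::Read;

/// Cut-down stand-in for the crate's `Rng` trait (assumption: one method only).
pub trait Rng {
    fn next_u32(&mut self) -> u32;
}

/// Marker trait: implemented only for generators suitable for cryptographic use.
pub trait RandSecure: Rng {}

/// OS-backed generator (Unix-only assumption: reads /dev/urandom).
pub struct OsBacked(File);

impl OsBacked {
    pub fn new() -> std::io::Result<Self> {
        File::open("/dev/urandom").map(OsBacked)
    }
}

impl Rng for OsBacked {
    fn next_u32(&mut self) -> u32 {
        let mut buf = [0u8; 4];
        self.0.read_exact(&mut buf).expect("failed to read /dev/urandom");
        u32::from_le_bytes(buf)
    }
}

impl RandSecure for OsBacked {}

/// Weak generator: implements `Rng` but deliberately not `RandSecure`.
pub struct WeakXorShift(u32);

impl Rng for WeakXorShift {
    fn next_u32(&mut self) -> u32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 17;
        self.0 ^= self.0 << 5;
        self.0
    }
}

/// Security-sensitive code asks for the marker as a trait bound.
pub fn generate_key<R: RandSecure>(rng: &mut R) -> [u32; 4] {
    [rng.next_u32(), rng.next_u32(), rng.next_u32(), rng.next_u32()]
}
```

Passing a `WeakXorShift` to `generate_key` is rejected at compile time because the `RandSecure` bound is not satisfied, while any function bounded only by `Rng` accepts both generators.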

@sorear

sorear commented Oct 28, 2015

I was going to argue that a SIMD ChaCha20 implementation would be much faster than our current implementation, and write SIMD versions of ChaCha20 and AES (AESNI) to include in your benchmarking tables above. However, it turns out that I'm way out of my depth for writing SIMD code in Rust right now, so I'll just offer http://bench.cr.yp.to/results-stream.html instead. I regard "weak reproducible RNGs" as a very niche application because the strong reproducible RNGs (stream ciphers) well implemented are so fast on recent mid-range hardware, but I don't have anything else to back that up with so I'll stop now.

@graydon
Contributor

graydon commented Oct 28, 2015

Just stopping by to remind readers of the points made above re: not having a CSPRNG in userspace. It's not terribly important, cryptographically, if something's fast if it's inadequately seeded or uses an algorithm that gets broken. The OS should provide the secure variant (Linux even has a faster syscall for it).

(And the insecure variant, if it exists at all, should be marked as such very clearly)

@sneves
Contributor

sneves commented Nov 5, 2015

@graydon It's not clear to me what exactly you are arguing for or against. It seems to me you are not arguing against userspace stream ciphers (I don't like the term CSPRNG here, since it kind of implies forward and backward security are a goal), but instead you are arguing against the overall API.

That is, you are arguing against any generator being able to be seeded. Instead, that trait should be gone and essentially only OsRng should be used. I cannot agree with this, since reproducible generators are a valid (and common) use case. Nobody is arguing that the ISAAC or ChaCha generators should be used to generate keys; in fact, at the moment they are default-seeded by OsRng, so they really are no better than the latter for this task.

But OsRng is entirely unsuitable for the use cases where one wants reproducible behavior (e.g., Monte Carlo simulations, some computer algebra systems give you a seed on startup for reproducible bug reports, ...). The question then becomes one of choice: do we want (say) Xorshift generators for this case, or a cryptographically strong stream generator? The former pass, by design, some but not all statistical tests. The latter generators conjecturally pass every efficiently-computable statistical test, and they're unpredictable as a bonus. If anything, I'd say the latter should be the ones that stay for fast and reproducible generation---but maybe the API could be organized in a clearer way to highlight the separate use cases of OsRng and SeedableRng.

@graydon
Contributor

graydon commented Nov 11, 2015

I'm arguing that since chacha is slower than pcg and xor, simulation-focused users won't want it. And that it's unwise to leave it lying around in case users who want cryptographic randomness pick it up in favour of osrng. I realize there are two classes of users here with different needs. I'm arguing it's best to address them separately.

@sorear

sorear commented Nov 11, 2015

@graydon @bhickey moved this thread to private email a few days ago

@sneves
Contributor

sneves commented Nov 11, 2015

And I am arguing that simulation-focused users are best served with a generator that is guaranteed to actually be statistically indistinguishable from random. Maybe that's not ChaCha20; maybe it's AES-CTR, maybe it's ChaCha8, maybe it is something else (note that the performance of ChaCha on x86 is lacking, due to the absence of SIMD capabilities in Rust at the time of implementation). But regular generators, such as LCGs, Xorshift, and so on, have repeatedly been shown to be far from random, despite happening to pass some static (and rarely growing) set of standard statistical tests. I believe the distinction between use cases is best done through API, not primitive names.

@graydon
Contributor

graydon commented Nov 11, 2015

Be careful to differentiate LCGs from PCG. The latter have good statistical properties and are fast.

I ... guess the topic is in "private email" now so unclear whether there's any use writing here.

@arthurprs
Contributor Author

Just an update related to the PR algorithm choice.
Every major Javascript engine implementation is now using xorshift128+ to implement Math.random.

See http://v8project.blogspot.de/2015/12/theres-mathrandom-and-then-theres.html?m=1

@bstrie
Contributor

bstrie commented Dec 18, 2015

Javascript engines (specifically wrt Math.random) are in a strange position where they care neither about cryptographic security nor about reproducibility. I would expect a blessed rust-lang crate to pick one of those two concerns and do it well. Generators that provide low-quality randomness can live on crates.io (or you can copy/paste a trivial three-line xorshift implementation into your code).

@arthurprs
Contributor Author

What I meant is that xorshift128+ is a great choice for the weak generator.

I don't really get why there is so much resistance to this. 99.9% of the time you just want a good distribution. And the default generator already is the secure one.

@bstrie
Contributor

bstrie commented Dec 18, 2015

Ah, I thought you were implying that we should do as JS engines do and make xorshift+ the default generator.

In any case, I don't think that movement on behalf of JS engines addresses the prior objections to this PR wrt whether or not weak generators should have blessed implementations at all. In browser-land, a pure JS PRNG implementation is leaving a lot of performance on the table, which is why a native implementation is desirable. Rust doesn't have the same restrictions, and Cargo makes it trivial to pull in a faster implementation when necessary.

I think we're all running around in circles wrt this entire library at this point. Two people in here have already suggested that they're working on designs for total overhauls, so I'm not sure how we want to proceed (especially considering that I haven't seen public demonstrations of either of these initiatives yet).

@alexcrichton
Contributor

Sorry for the delay @arthurprs, but I wanted to say thanks again for the PR! The libs team discussed this PR during triage yesterday, and the conclusion was that for now we don't want to include any other RNG implementations in the rand crate directly. It's possible to ergonomically build this out of tree (due to the traits in rand) so there's not necessarily an inherent reason to include this in the rand crate itself.

Unfortunately this crate is lacking a vision of how to proceed forward which makes it difficult to say whether we want this sort of RNG eventually. We hope to start devoting some time to this crate soon, but help is always appreciated!

In the meantime I'm gonna close this, but please feel free to publish this on crates.io!

@arthurprs
Contributor Author

Thanks Alex, I'll move the code to a crate.
