Future of thread_rng
#463
Comments
Using feature flags can affect dependencies, right? This might break assumptions of dependencies, so this seems dangerous. I would prefer to use different types instead and require the dependencies to be generic in the type of thread RNG. Instead of providing several variants of I think |
Yes, feature flags can affect dependencies, which is why I suggested what I did above. Using a custom version of
This is why I would not recommend using such a feature flag on servers. Interesting idea trying to use such an attack in multiplayer games, but there is no benefit for synchronous-step models (common in RTS) or attacking the server in server-centric models (common for FPS), and even if it were achievable vs other players the consequences are not high. Going back to my proposal, I prefer
|
I tried to argue that dependencies should make it possible for the user to choose the RNG, which wouldn't have this problem. If the thread RNG is not exposed via the API, it is an implementation detail and I think it should not be affected by feature flags, because it can have unintended consequences. |
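A minimal sketch of what "let the user choose the RNG" could look like for a dependency. The trait `RandomSource` and the generator `SplitMix64` are hypothetical stand-ins defined here so the example needs only the standard library; in a crate built on Rand the bound would presumably be one of Rand's own traits instead.

```rust
/// Hypothetical stand-in for an RNG trait (not Rand's actual API).
pub trait RandomSource {
    fn next_u64(&mut self) -> u64;
}

/// Small, fast, non-cryptographic generator (SplitMix64); illustration only.
pub struct SplitMix64(pub u64);

impl RandomSource for SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// A "library" function that is generic over the generator, so the caller
/// (not a feature flag) decides which RNG backs it.
pub fn shuffle<T, R: RandomSource>(items: &mut [T], rng: &mut R) {
    // Fisher-Yates shuffle; the modulo has a slight bias, acceptable in a sketch.
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

With this shape, the thread RNG never appears in the dependency's API, so changing it cannot silently alter the dependency's guarantees.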
Related to this, there are security trade-offs to decide:
Given independent |
@vks I don't get what you mean; you're saying use |
I agree that having a Based on that I'd be tempted to provide very few security guarantees for I don't know enough about CSPRNGs to have opinions about what guarantees that On the topic of feature flags: Having feature flags which change what guarantees A better approach would be to encourage libraries which use rand to include feature flags which make them use different |
While I partly agree with you @sicking it is important to consider uses like in I only see two solutions for this:
|
A few comments on the first post, but I haven't thought everything through yet...
I am really not that concerned with the memory usage. While 4kb is a lot, it is also really not that much for any system that the standard library is available on. The init time is something to worry about a bit more. On my PC it takes 5265 ns to initialize For the situation with many worker threads, isn't it better to use the scheme I ended up with of splitting an RNG using a wrapper such as Performance with
Performance with
Performance with
Retrieving the RNG from TLS greatly dominates the cost here.
Yes, that is a big argument. It is the call site that determines whether On the other hand, we already reserve the right to change the algorithm of
Renaming Still, having a thread-local variant using a fast RNG does not offer much advantage in the common case unless the RNG is cached.
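To illustrate the point about caching, here is a rough sketch that contrasts one thread-local lookup per value with a single lookup for a whole batch. The names `LOCAL_RNG`, `sum_per_call`, `sum_cached` and `reseed` are made up for this example, and the generator is a tiny xorshift64 chosen only for brevity; it is not what `thread_rng` uses.

```rust
use std::cell::RefCell;

/// Tiny non-cryptographic generator (xorshift64); illustration only.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

thread_local! {
    // Hypothetical per-thread generator with a fixed non-zero seed.
    static LOCAL_RNG: RefCell<XorShift64> = RefCell::new(XorShift64(0x2545_F491_4F6C_DD1D));
}

/// One TLS lookup (lazy-init check plus RefCell borrow) per generated value.
pub fn sum_per_call(n: usize) -> u64 {
    let mut total = 0u64;
    for _ in 0..n {
        total = total.wrapping_add(LOCAL_RNG.with(|r| r.borrow_mut().next_u64()));
    }
    total
}

/// One TLS lookup for the whole batch; the borrowed generator is reused.
pub fn sum_cached(n: usize) -> u64 {
    LOCAL_RNG.with(|r| {
        let mut rng = r.borrow_mut();
        let mut total = 0u64;
        for _ in 0..n {
            total = total.wrapping_add(rng.next_u64());
        }
        total
    })
}

/// Resets the thread-local state so both variants can be compared.
pub fn reseed(seed: u64) {
    LOCAL_RNG.with(|r| r.borrow_mut().0 = seed);
}
```

Both functions draw the same values from the same state; only the number of TLS accesses differs, which is the overhead under discussion. Actual timings depend on platform and compiler and would need to be measured.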
The performance of RDRAND, for comparison (1 value per iteration, instead of 1000 in the previous benchmarks):
RDRAND is 5-10× slower than the current
Would you really recommend |
But what is the As I said in the first post, ultimately such usage is about trust and risk, but the |
I feel like there are two very common use cases which I think would be great to have very easy-to-access APIs for:
I was thinking 1 was Maybe what I'm asking for for 2 is more dhardy#60 plus an empty And to be clear, I think there are many uses of RNGs which do not fall into either of these categories. For these I think having APIs which are explicit about which RNG algorithm is used and what the source of the seed is would be the way to go. That way developers can choose whatever algorithm provides the tradeoffs that match their requirements. |
I don't know if However the point is that if 2 is not fine for Though in reality I suspect that specifically |
Category 2 (simply fast) is actually pretty easy. Our current But when it comes to seeding a hash algorithm for |
I don't think this is well-defined. You can usually make RNGs faster by increasing the size of the state, because it allows for more instruction parallelism. Also, in which regard should it be fast? Initialization? Generating |
I think the dependencies should use use |
This works for generating random bytes faster, but not necessarily for getting random data into other code faster. As noted in the doc, it's necessary to benchmark your code to find the fastest RNG, so it's pointless trying to find "the fastest RNG" for |
A question: In the latest release (rand 0.5.0), |
It requires |
There are not many RNGs for which this is true, but does help a bit for Xorshift and Xoroshiro. It does come at the cost of using more registers, so it may work better in benchmarks than in real code. |
I was also thinking of vectorization.
|
Is this really about the overhead of TLS or about checking that the RNG was initialised? Because any call to @pitdicker your More important though is that any approach using splitting won't work with the current API, since new threads would need a "master" to initialise from. I don't know if
This is partially intentional to force users to choose between
Not in its current form, no. With extra protections (also using RDRAND or with forward secrecy), perhaps. But I don't think either of these can be added without a big performance decrease, so if we did add something like this it would be distinct from
I don't think this has any extra requirements here. @pitdicker has been experimenting with using After reviewing this again I only really see these options:
Any further thoughts? I like the idea of the latter but it does make Rand more complex. |
The details of how thread-local storage works are still a bit fuzzy for me, and the implementation in the standard library is spread over quite a few files. Checking if it is initialized is one part. There may also be some indirection, because it has to cross a crate boundary. And at some point (on Unix) it uses
I ended up with a better scheme using splitting. But I only mentioned it in the context of many worker threads using a fast RNG.
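The wrapper referred to here is not shown on this page, so the following is only one simple reading of "splitting": a master generator hands each worker thread its own child generator, so workers never touch shared or thread-local state. `SplitMix64`, `split` and `run_workers` are hypothetical names for this sketch.

```rust
use std::thread;

/// Small non-cryptographic generator (SplitMix64); illustration only.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Derives one child generator per worker by seeding it from the master.
pub fn split(master: &mut SplitMix64, workers: usize) -> Vec<SplitMix64> {
    (0..workers).map(|_| SplitMix64(master.next_u64())).collect()
}

/// Spawns `workers` threads; each owns its child generator and XORs together
/// `draws` values from it. Results are returned in worker order.
pub fn run_workers(seed: u64, workers: usize, draws: usize) -> Vec<u64> {
    let mut master = SplitMix64(seed);
    let handles: Vec<_> = split(&mut master, workers)
        .into_iter()
        .map(|mut rng| {
            thread::spawn(move || {
                let mut acc = 0u64;
                for _ in 0..draws {
                    acc ^= rng.next_u64();
                }
                acc
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Note that the master must be reachable when each worker is created, which is exactly the difficulty raised earlier about new threads needing a "master" to initialise from. Seeding children from a small generator's output also gives no formal guarantee of independent streams.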
The main problem is that some sort of I am afraid it can be confusing/disappointing for users. If you don't know how things work, the current But to use When you get to that point, isn't it just as easy to seed a small PRNG of your own choosing, possibly from Not sure what I want to say really 😄. I'm just not sure if we can do something meaningful to offer a fast PRNG in combination with TLS.
I am not against this, but also don't really see the problem of using more memory. Do you really think there are situations where we have TLS, but don't have abundant memory?
We are really not in such a bad state in my opinion (although the |
Having a single, strong, fast Is there any reason not to make the choice of RNG backing the |
I would definitely recommend having a hardware accelerated AES-based RNG on platforms with hardware AES, gated on runtime feature detection. These can be implemented using a simple abstraction: the AES encryption round function alone e.g. There are any number of options for a cipher-based CSPRNG to choose from which are all fully parallelizable/vectorizable. A general theme among these is AES-CTR with a periodic rekeying mechanism (see also RDRAND). Here's a specific, recent example of such an RNG: https://blog.cr.yp.to/20170723-random.html Regarding RDRAND specifically, I think it's perfectly fine for the case of e.g. seeding SipHash to mitigate hashDoS, but probably not the best option for some sort of general-purpose I would definitely recommend ChaCha as a pure software option which should work everywhere, with ChaCha20 as a paranoid default or it could arguably be reduced to ChaCha12 (the best known attack on ChaCha only works up to 7 rounds and 20-rounds are a paranoid safety margin). ChaCha can be trivially accelerated using e.g. AVX2-style instructions or other vector unit primitives which are ubiquitously available (add, rotate, xor) and has a small code size. I'm certainly not wild about things like HC-128 or ISAAC. The former saw some analysis via ESTREAM, but ChaCha20 (and its predecessor Salsa20) have seen considerably more as ChaCha20 is an MTI cipher for TLS. It definitely sounds like there's a bit of a cipher zoo going on, and I'd strongly suggest reducing the number of ciphers unless there are very strong technical arguments for doing otherwise. |
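For reference, the "add, rotate, xor" structure mentioned above is visible in the ChaCha quarter round. The sketch below shows only the round function and a round-count parameter (10 double rounds for ChaCha20, 6 for ChaCha12, 4 for ChaCha8); it is not a complete generator, since setting up the constants, key, counter and nonce and adding the input state back to the output are omitted. The function names are chosen for this example.

```rust
/// ChaCha quarter round on four words of the 16-word state:
/// nothing but 32-bit wrapping addition, xor and rotation.
pub fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(7);
}

/// Applies `double_rounds` column-plus-diagonal rounds to the state.
pub fn permute(s: &mut [u32; 16], double_rounds: usize) {
    for _ in 0..double_rounds {
        // Column rounds.
        quarter_round(s, 0, 4, 8, 12);
        quarter_round(s, 1, 5, 9, 13);
        quarter_round(s, 2, 6, 10, 14);
        quarter_round(s, 3, 7, 11, 15);
        // Diagonal rounds.
        quarter_round(s, 0, 5, 10, 15);
        quarter_round(s, 1, 6, 11, 12);
        quarter_round(s, 2, 7, 8, 13);
        quarter_round(s, 3, 4, 9, 14);
    }
}
```

The four quarter rounds within each half of a double round touch disjoint words, which is what makes the vectorised implementations discussed here possible.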
FWIW, I ported a C implementation of DJB's suggestion to Rust: https://github.com/vks/aesrng |
ChaCha20 (and even ChaCha8) is a lot slower than HC-128, which is why we went with that option. We considered HC-128 acceptable due to the ESTREAM recommendation and decided to remove ISAAC from Rand proper (hasn't happened yet but will). Making use of hardware AES in I don't plan to work on this myself but PRs are welcome. |
That's a bit surprising to hear, considering chacha8 and chacha12 are consistently faster than hc128 across multiple architectures on SUPERCOP (with chacha20 often beating it out as well): https://bench.cr.yp.to/results-stream.html (that said, ChaCha8 is pretty much zero margin for error, and I wouldn't recommend it as it has no safety margin. ChaCha12 is a happy medium between that and ChaCha20's paranoia) |
Maybe our implementation needs to be optimized/vectorized. |
If ChaCha can be optimised to compete that would be great. It uses a lot less memory and initialisation time (I guess this is why those benches show HC-128 as terrible on short sequences). |
I have wondered about those benchmarks before. Some of those benchmarks of HC-128 are even off by two orders of magnitude! Maybe it is also counting initialization time, combined with a terrible implementation? And using an implementation of ChaCha that is much faster than anything there is in Rust at the moment? |
I also tried the |
There isn't a particularly good implementation of ChaCha in Rust right now that I know of (which is why I plan on writing one soon). |
I wrote an explicitly vectorized ChaCha4 implementation for my |
I guess I missed the previous discussion of Randen, but it looks like a very nice option for platforms where hardware AES is available:
|
Yes [you did]: #462 Edit to clarify: Randen looks like a good RNG, but I'd want to see third-party cryptographic review before promoting it here. |
@dhardy I think this can be closed, now that we use a SIMD-optimized implementation of ChaCha? |
I guess. We still haven't examined Randen or forward secrecy, but these are separate topics. |
Status: proposal to allow hardware-dependent generators and replace HC128 with a faster variant of ChaCha (start reading here).
This topic comes up quite a bit from various angles; I think it's time to get some ideas down about the future of thread_rng (regarding the 0.6 release or later).

I see the following potential uses for thread_rng:

Also, I think it's worth mentioning where we do not expect thread_rng to be used:

- thread_rng may not be the fastest option

And an important note on security: we should aim to provide a secure source of random data, but ultimately it is up to users to decide how much they trust our implementation and what their risks are. thread_rng does not have the simplest code to review and is currently young and subject to further change. Also we may or may not implement forward secrecy (backtracking resistance), and for ultimate security solutions using no local state may be preferred.

Our current implementation of thread_rng tries to satisfy the above with a fast, secure PRNG, but at the cost of high memory usage and initialisation time per thread. For applications with a low number of long-running threads this is reasonable, but for many worker threads may not be ideal.

There are two ways we can let users influence the implementation:

- thread_rng or call a different function)

Feature flags allow configuration on a per-application basis, e.g.
The last two options sound very risky to me: should we ask distributors and end-users to reason about the security of whole applications? It is quite possible that the people building applications (even developers) will not know about all uses of thread_rng requiring secure randomness.

This brings me to ask, is having only a single user-facing function ideal? What if instead:

- strong_rng replaces thread_rng as a source of cryptographically secure randomness
- weak_rng is added as a fast source of randomness; depending on feature flags this could just wrap strong_rng or could be independent

An advantage of the above is that feature-flags could allow replacing the current implementation (HC-128; 4176 bytes) with two smaller back ends (e.g. ChaCha + PCG; 136 + 16 bytes), while only compromising the speed of the secure generator.
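A rough sketch of the shape of this proposal, using only the standard library. Everything here is an assumption made for illustration: the function names `strong_rng_u64` and `weak_rng_u64`, the constant `WEAK_WRAPS_STRONG` (standing in for a feature-flag check), and above all the generator, which is a non-cryptographic placeholder used for both back ends. A real `strong_rng` would need an actual CSPRNG and proper OS entropy.

```rust
use std::cell::RefCell;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Stand-in for a build-time feature flag; hypothetical.
const WEAK_WRAPS_STRONG: bool = false;

/// PLACEHOLDER generator (SplitMix64). Not cryptographically secure;
/// it only marks where the two back ends would plug in.
struct Placeholder(u64);

impl Placeholder {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Std-only seed: hashes nothing with the randomly keyed default hasher.
/// A real implementation would read from the OS entropy source instead.
fn placeholder_seed() -> u64 {
    RandomState::new().build_hasher().finish()
}

thread_local! {
    // Two independent per-thread states, one per entry point.
    static STRONG: RefCell<Placeholder> = RefCell::new(Placeholder(placeholder_seed()));
    static WEAK: RefCell<Placeholder> = RefCell::new(Placeholder(placeholder_seed()));
}

/// Entry point intended for security-sensitive callers.
pub fn strong_rng_u64() -> u64 {
    STRONG.with(|r| r.borrow_mut().next_u64())
}

/// Entry point intended for speed; may simply delegate to the strong one.
pub fn weak_rng_u64() -> u64 {
    if WEAK_WRAPS_STRONG {
        strong_rng_u64()
    } else {
        WEAK.with(|r| r.borrow_mut().next_u64())
    }
}
```

The point of the two entry points is that the call site states which guarantee it needs, so a build-time switch can only ever weaken `weak_rng_u64`, never the secure path.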
Another advantage is that we could add forward-secrecy to strong_rng with less concern for performance implications.

But first, why bother when generators like Randen and RDRAND claim to satisfy all requirements anyway? This is a good question to which I only have vague answers: Randen is still new and unproven and may have portability issues; RDRAND is not fully trusted; and may not be the fastest option.

Second, what about users like HashMap where weaknesses are often not exploitable (depending on application design and usage) and in the worst case only allow DOS attacks (slow algorithms)? Another good question. One possible answer is that these use-cases should use weak_rng but by default this would be secure anyway; we provide feature flags to change that but discourage usage on servers. It might seem tempting to add a third function, but, frankly, this kind of thing is probably the main use case for weak_rng anyway.

Another, very different, option is that we keep thread_rng looking like it is but remove CryptoRng support and recommend it not be used for crypto keys. Then we can add a feature flag changing its implementation to an insecure generator with less concern. This may be a good option, but goes against our recent changes (switching to HC-128 and implementing CryptoRng).

BTW, let's not bikeshed thread_rng vs ThreadRng::new or other syntax here.