Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: new rand-core crate, rand adaptations #2152

Closed
wants to merge 23 commits into from

Conversation

dhardy
Copy link
Contributor

@dhardy dhardy commented Sep 15, 2017

This RFC seeks to introduce a rand_core trait, and make a few changes to rand for compatibility.

Rendered link

Parent RFC — for reference on the development of this RFC

Still undecided in this RFC:

  • Semantic meaning of CryptoRng
  • Error type
  • SeedableRng stream support
  • Trait for seek support [can wait until later]
  • Add next_u128?

@manuels
Copy link

manuels commented Sep 15, 2017

I want to suggest to rename the Rng trait to NonCryptoRng, so you'd have CryptoRng and NonCryptoRng.
I have seen to too many people out there using int rand(void) for cryptographic purposes and making it right in Rust really comes with low cost.

@Lokathor
Copy link
Contributor

I know that some Rngs can seek, but how many of them? The seeking trait seems like an easy thing to add later if we can't come to a quick conclusion. Individual rngs are also allowed to provide methods beyond just the traits as well, until then.

Can the Error type be an associated type? That way each rng can pick their own enum of how they wish to fail. That would only work with design 2 or design 3, but it seems good if either of those are picked.

And for default methods, if we use options 2 or 3, then next_u64 can default to two next_u32 calls. We would not need a default for try_fill, but we could also have that default to filling the bytes with next_u64 calls I suppose.

@dhardy
Copy link
Contributor Author

dhardy commented Sep 15, 2017

@manuels I know it sounds cavalier, but I'm not so concerned, since (a) the documentation should recommend use of OsRng which is the best option, (b) the "convenient" generator, thread_rng, may not be a recommended crypto RNG but it will still be something for which there is no known practical attack, and (c) in the worst-case when a very insecure generator is used against all recommendations, it should still be seeded from OsRng, so there will still be as much entropy as the internal state can hold. That said, CryptoRng will be right next to Rng, so should be plenty visible enough? It's not like C where you're supposed to use a completely different part of the library.

@Lokathor "Can the Error type be an associated type?" Technically I guess yes, except for uses not on Rngs. But practically, it would be a huge pain to work with (multiple error types, much more explicit speification needed).

@saethlin
Copy link
Member

I think the comment by @manuels is in line with Rust's tendency to make unsafe names longer, and dangerous-sounding. It's hard to miss that you've done something less safe.

@burdges
Copy link

burdges commented Sep 16, 2017

We figured out that Rngs never have many errors @Lokathor except perhaps for OsRng but even there they denote system miss-configuration and a &'static str suffices. Imagine if say JoeBSD handles the system PRNG poorly by providing only an entropy pool that quickly gets exhausted then that's a bug in JoeBSD but if they won't fix it then OsRng should wrap the system PRNG on JoeBSD inside a CSPRNG that periodically reseeds to prevent exhaustion. It's similar if you've some fancy collaborative randomness daemon running.

I noticed the existing inherent method for seek on ChaChaRng also sets the nonce. How should one set the nonce on ChaChaRng if we use SeekableRng? Just another inherent method? Also, is another SeekableRng planned? If not, then maybe it's more important to expose inherent methods that do ChaChaRng well.

@newpavlov
Copy link
Contributor

newpavlov commented Sep 18, 2017

I feel the best design would be something like this. (your designs cover it only partially) The main advantage that we naturally solve CryptoRng error problem with zero friction for no_std crates. (also OsRng will be able to pass io::Error) Additionally type system will know if RNG is indeed infallible, so users will be able to write R: CryptoRng<Error=!> if deemed necessary.

But the big disadvantage is that it relies on two currently unstable features. As written in the RFC we can replace never type with blank enum, but I don't think we should stabilize rand with this design without stabilization of those both features. Other small issue is that implementing fallible RNG will be a bit inconvenient and the best approach would probably to use private RNG method for filing bytes which will be used for fill and try_fill implementations. But I think this issue is quite insignificant. Also if I recall correctly there were concerns that associated type will make it harder to use CryptoRng, but I couldn't come up with examples which will support this opinion.

Regarding default implementations, I started to think that it will be better to include default implementation for fill in terms of next_u64 and leave both next_u32 and next_u64 as required methods, but I guess it's a matter of preference.

About next_u128: if I am not mistaken we'll have to keep the respective feature flag even after u128 stabilization or somehow convey that this feature flag is unstable. Is it worth the trouble? Shouldn't we simply wait u128 stabilization? Hopefully we'll get it relatively soon. (AFAIK only this issue is blocking it)

Question about SeedFromRng, it's not quite clear from the RFC what kind of Result are you using there, best to clarify it.

Also please clarify the naming: do you suggest to use rand-core, rand-os, etc. or rand_core, rand_os, etc.? (in the beginning of RFC you use rand-core, but in other places rand_core) I think for crate names we should use hyphen, as AFAIK it is more common compared to the underscore.

@pitdicker
Copy link
Contributor

I noticed that the i128-support feature flag is not very ergonomic for other crates that implement an RNG. Every one of them now has to add a similar feature flag in cargo.toml, at the top level file, and in the implementations. next_u128 should at least provide a default implementation. This seems sensible to me anyway, as an implementation that is more optimal than combining two u64's (e.g. one that operates natively on u128's) should be rare.

@pitdicker
Copy link
Contributor

My thoughts about the SeedableRng trait:
What use cases are we serving with reseed? Simple deterministic RNGs used in games etc. that should produce a specific sequence? Then it seem not very useful to mel, as than it should be just as fast and ergonomic to use from_seed.

If we are talking about stronger RNGs, or even cryptographic ones, there is sometimes the suggestion to reseed the RNG. That may just be overly cautious, but that is not for me to decide. In that case it would be best if this function was more similar to from_rng, taking an other RNG (OsRng) as a source.

I think there is nothing lost if reseed is just removed.

@dhardy
Copy link
Contributor Author

dhardy commented Sep 19, 2017

next_u128: agree, lets remove and (possibly) add after stabilisation with default impl

Removing reseed: good argument

- vs _ in crate names: no idea. I was trying to follow this change. They should have just banned - from the beginning IMO (to line up with in-lang identifiers).

Associated error type (@newpavlov): seems excessively complicated to me. Also "whether an RNG is fallible" seems pretty unimportant, given that error rates will be extremely low. Finally, you wrote a generic function, get_key, to solve a simple problem... what happens when you have multiple sources of error to handle (e.g. creating and using a PRNG seeded from OsRng)? Having a uniform error type makes this much simpler.

@newpavlov
Copy link
Contributor

newpavlov commented Sep 19, 2017

@dhardy
OsRng and external RNGs will simply use io::Error (so e.g. in NewSeeded you will just use io::Result for new method), if you fix RNG types you fix error types as well, so working with them will be no different from the usual Rust code. It could make generic code a bit more verbose, but I think it's better to encode information into type system if we can and allow crate authors to choose, than use plain &'static str, or even complicated constructs described in the RFC. Also I very much doubt application which use two fallible RNGs will be common enough to even consider such use cases. Could you please provide examples in which associated error type will introduce unwanted complications?

As for names, I think rand_derive was initially published with underscore, so it's not a renaming of the crate, but a source fix to the correct name. We have example of crates around tokio and they all use hyphen.

@burdges
Copy link

burdges commented Sep 19, 2017

Initially I'd imagined SeedableRng::reseed produced a new internal state by combining the current internal state with external state, but that's often false and thus confusing.

I kinda wondered if SeedableRng::reseed is required because (a) if R : Rng then &mut R : Rng so (b) ReseedingRng must operate on &mut R, but no this strange Reseeder<R> trait avoids that and just does *rng = assignments anyways.

I suppose SeedableRng::reseed might improve constructs similar to ThreadRng but currently I do not see how.

@pitdicker
Copy link
Contributor

pitdicker commented Sep 19, 2017

Support for seeding from an u64 (SeedableRng<64>) seems like a good idea to me.

Seeding with an u64 should result in a RNG with a chance of 2^64 of being different from one other generator. If we look at the birthday problem, there is a 50% chance of getting the same generator a second time after 2^32 random initializations. This drops quickly to only a couple of percents for 2^30. These numbers are very large, you would need about 8gb to just store the state of all the generators!. So initializing with an u64 seems good enough for almost any purpose to me.

Initializing from an u32 would give a change of ~1% to get the same generator after ~2^14 (not checked the numbers here). So after about 15.000 initializations. If we want to support SeedableRng as a generally useful way to create a new RNG, and not only as a way to get a reproducible RNG for games and simulators etc., u32 would be to small.

Initializing with an u64 instead of as many bits as a RNGs state would also allow one to spare the OsRng a bit, or some other generator used for initialization. If the source generator returns really random numbers, I have seen the suggestion to treat the numbers returned by it as precious. But current OS generators don't fall in this category (unlike something like [www.random.org]), so this is not a strong point.

A disadvantage is that it makes it more difficult to write a correct initialization routine, but I think this is for most generators a solved problem.

This are all not very important arguments. The arguments in the RFC to include it for reproducibility are best.

Seeding by slices:
I don't agree that seeding a part of the state with 0's is perfectly valid. Xorshift for example would be significantly weakened by it. And while ISAAC has a complex initialization routine, I would not trust one initialized with a (large) number of zeros. I would suggest a panic if the slice is smaller than the required amount.

I do like the proposals for an associated type or fn seed_len() -> u32. Of byte-slices I am not really a fan. They are a bit slower to generate, and need to be transmuted / casted to be consumed. I think all generators store their state using 32 or 64-bit integers.

I don't really see a use case for implementing SeedableRng for multiple types, that is not served by seeding from an u64 and supplying a slice of the full seed size.

So my preferred trait would look like:

pub trait SeedableRng<Seed>: Rng {
    fn seed_from_u64(&mut self, seed: u64);
    fn seed(&mut self, Seed);
    fn seed_len() -> u32;
}

@pitdicker
Copy link
Contributor

@burdges I don't fully understand what you are saying about reseeding, can you explain it some more?
And do you happen no know an RNG algorithm that can add entropy to its state?

On the other hand, if there is such an RNG, it can just as well expose such a function without it being part of the SeedableRng treat. Because I don't think it is all that common. But I have not looked all that hard at the different CSPRNG's yet.

@dhardy
Copy link
Contributor Author

dhardy commented Sep 19, 2017

@pitdicker "adding entropy" is pretty easy: just XOR the current state with new bits. Obviously there is a maximum amount of entropy the state can hold, so any definition of "add entropy" for a fixed size type must allow wastage. The point is simply that if somehow the "new entropy" is not fully trusted but the state was previously unknown, the new state is still unknown.

About your SeedableRng trait, in that case why not use an associated type? I don't see any advantage to your version if allowing only a single seed type, nor do I see a use for seed_len if the type is known:

pub trait SeedableRng: Rng {
    type Seed;
    fn from_seed(&mut self, Self::Seed);
    fn seed_from_u64(&mut self, seed: u64);
}

@newpavlov take a look at this. It already has to deal with two different sources of errors (OsRng and Self: SeedFromRng). How's that work with your associated error types?

@pitdicker
Copy link
Contributor

@dhardy you are completely right :-)

@pitdicker
Copy link
Contributor

pitdicker commented Sep 19, 2017

Let me say first: It is great to see some traits proposed for new functionality like seeking and streams!

The biggest advantage they offer is the ability to create new RNGs of the same type, with no overlap, and without the need to get a new seed (from something like the OS). I think this is something very useful.

The link in the RFC to the PCG blog gives some great background info. Creating a new RNG from an existing one of the same type is something that I think is always possible.
For Xorshift, PCG and ChaCha it can be done by seeking to a point far enough away in the stream.
For ISAAC running a hash over its state should scramble it so much that it is basically guaranteed to be at a whole different point in its stream. Given that its period is huge, this should be fine.
PCG offers special support for multiple generators with the same seed, by changing the constants used internally.

I would not change the traits to much to accommodate for the design of PCG.

I don't know if this is possible, but I would implement it as part of the NewSeeded trait:

impl<XorshiftRng> NewSeeded for XorshiftRng {
    fn new() -> Result<Self> {
        // some jump function, next stream counter, hash etc.
    }
}

I was reasonably enthusiastic about seeking, until looking into it some more.
Seeking by a variable amount is super difficult for simple generators like Xorshift, Mersenne Twister etc. This paper describes the best method. A fixed amount is easier, because the jump constant can be generated by some computer algebra package.

I have not looked into it to deeply, but ChaCha seems to support jumping from an absolute position, and PCG only from the current position (unless you store the initial state somewhere).

One other thing that could be a problem: the amount it is possible to seek, is the same as the size of the period of the RNG, and maybe even larger (wrapping around). What to do with RNGs with periods like 128-bits, 256, 1024 or even larger? On the other hand, jumps that large seem only useful for the PCG party tricks... For any normal use jumping by an u64 seems sufficient (just like a period of 2^64 seems sufficient for a single stream of an RNG).

Next I tried to come up with some use cases. I don't have any real ones yet. It is something I have not seen other random libraries in other programming languages expose.

You already question if a trait for seek support should be included. I would say, lets wait until we have a few implementations to see what the possibilities of different RNGs are.

@dhardy
Copy link
Contributor Author

dhardy commented Sep 19, 2017

@pitdicker the only use-case I can personally see for seeking is to get a different stream (I think this is basically what set_counter is there for). And party tricks...

As for multiple streams, there are a bunch of ways to make them, as you say. I came up with a use-case for multiple streams a long time back: threading in agent-based sims where reproducibility is important. Given hundreds to millions of agents each undergoing stochastic processes and each needing a CPU-intensive update each step of the simulation, it is desirable to distribute the updates between CPU cores by giving each agent (or group of agents) its own PRNG; this could work several different ways (new agents deriving from a parent agent or new agents generated in-order by a master generator) and still be reproducible.

@pitdicker
Copy link
Contributor

I agree, a big plus for multiple streams. Seeking by arbitrary amounts not so much yet. What do you think about using NewSeeded for streams?

@dhardy
Copy link
Contributor Author

dhardy commented Sep 19, 2017

I'm not sure; your impl<XorShiftRng> example isn't correct Rust code (only unbound parameters go there).

I guess you mean to use specialisation to provide another way of constructing each generator from itself. This would require the NewSeeded trait to be imported in crates defining new RNGs; I specifically designed it not to require that. Also RNG implementations might forget to define this, thus getting the default impl — though often that would be okay I guess.

@pitdicker
Copy link
Contributor

Then a seperate trait would be better, something like this?

pub trait MultipleStreamsRng: Rng {
    fn new_stream(rng: &mut R) -> Self;
}

@burdges
Copy link

burdges commented Sep 19, 2017

We should probably work out how ThreadRng works under the constraints that (a) the machinery used should be generic enough to exposed, and (b) it should run reasonably quickly, like chcha speed.

At first blush, we seemingly need this Reseeder mess in https://doc.rust-lang.org/rand/src/rand/lib.rs.html#904 because you cannot just pass OsRng as a type parameter. You need some marker type instead because you want the ReseedingRng machinery to work off another user space PRNG as well. If we cannot ditch Reseeder then maybe the SeedableRng::reseed function sounds less valuable, but..

We could make this all resemble hashers more closely and pump some life back into reseed.

pub trait BuildRng { // Renamed Reseeder 
    type Rng: Rng;
    fn buld_rng(&mut self) -> Self::Rng;
}

pub struct OsRngBuilder;  // Renamed and exposed ThreadRngReseeder
impl BuildRng for OsRngBuilder {
    type Rng = OsRng;
    fn buld_rng() { OsRng::new() }
}
impl<'a,R: Rng> BuildRng for &'a mut R {  // Permit Rngs to be used to build directly
    type Rng = &'a mut R;
    fn buld_rng(&mut self) { &mut *self }
}

We could maybe give SeedableRng control over when reseeding happened using a closure :

pub trait SeedableRng: Rng {
    type Seed: Rand;
    fn from_seed(Self::Seed) -> Self;
    fn from_rng<R: Rng>(rng: &mut R) -> Self { Self::from_seed(rng.gen()) }
    fn from_crypto_rng<R: CryptoRng>(rng: &mut R) -> Result<Self,R::Error> { Self::from_seed(rng.try_gen()?) }

    type Boost: Rand = u64; 
    fn boost_entropy<F: FnOnce() -> Boost>(&mut self, entropy_booster: F) -> bool {
        panic!("boost entropy unimplemented")
    }
}

ChaChaRng boosts if its counter exceeds some constant. PRNGs that do not use a counter cannot know when to boost, but if any exist then some wrapper could give them a counter. We could split SeedableRng into two separate traits to give compile time errors instead of this panic. Also, I'm using the name boost_entropy because the word reseed is flying around in several contexts, but maybe offer_reseed would be the final name or something. We need a nice prefix like do_ for "maybe do it if you want to". In any case, we can now define ReseedingRng without worrying much about the details of either Rng involved :

pub struct ReseedingRng<R, S> where 
    R: SeedableRng, 
    S: OsRngBuilder,
    // These last two are unneeded since we put them above 
    // <R as SeedableRng>::Seed: Rand,
    // <R as SeedableRng>::Boost: Rand,
{
    rng: R,
    src: S,
}
impl<R, S> ReseedingRng<R,S> {
    pub fn new(src) -> ReseedingRng<R,S> {
        let rng = R::from_rng(&mut src.buld_rng()) );
        ReseedingRng { rng, src }
    }
    pub fn boost_entropy(&mut self) {
        self.rng.boost_entropy(|| { self.src.build_rng().gen() })
    }
}
impl<R, S> Rng for ReseedingRng<R, S> where ... { ... }

type ThreadRngInner = ReseedingRng<ChaChaRng, OsRngBuilder>; 
pub struct ThreadRng {
    rng: Rc<RefCell<ThreadRngInner>>,
}

I donno if using a closure for flow control like that will always be as efficient as folks want, but presumably yes once it's wrapped and monomorphized inside ReseedingRng.

@manuels
Copy link

manuels commented Sep 19, 2017

@pitdicker wrote:

The link in the RFC to the PCG blog gives some great background info. Creating a new RNG from an existing one of the same type is something that I think is always possible.
For Xorshift, PCG and ChaCha it can be done by seeking to a point far enough away in the stream.

I disagree! When dealing with a CSPRNG you should reseed the new RNG! If you just seek, the two RNG are correlated and somebody might be able (maybe by just reading enough information from the first RNG) to get information about the state of the other RNG.

Additionally, I have just heard arguments against renaming Rng to Insecure/Simple/NonCryptoRng but not a single argument why you should keep the name Rng. Sorry, but saying we will put the distinction between Rng and CryptoRng in the docs is not enough from my point of view.
What should be the benefit of calling the insecure RNG just Rng? It is just sparing you from typing 9 letters (NonCrypto).

Just put

import rng::NonCryptoRng as Rng;

in your rust file and everybody knows with what kind of RNG we are dealing with.
I simply do not see any benefit from calling the non-crypto RNG Rng. It is just a potential for security flaws.

@nagisa
Copy link
Member

nagisa commented Oct 21, 2017

The point of both my examples is that (a vocal minority of) people are increasingly interested in ability to handle errors, even if they are unlikely to happen (or only happen in uncommon configurations).

I’m re-reading the updated RFC right now.

@dhardy
Copy link
Contributor Author

dhardy commented Oct 21, 2017

Uh, I'm updating the RFC right now...

The issues shows some of the latest stuff, as well as this PR (first comment is out of date).

Copy link
Member

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the new error kind enum.

*For now*, re-export all the above in the `rand` crate. Also add two things to
`rand`:

* the `NewSeeded` trait and its implementation for `SeedFromRng`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is slightly confusing. You don’t implement trait for a trait. Was this sentence supposed to read as one of the following?

  • a trait NewSeeded: SeedFromRng;
  • NewSeeded, a supertrait of SeedFromRng.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then

the NewSeeded trait and its (blanket) implementation for SeedFromRng implementors

would be less ambiguous.


* How end-users should use `rand` (for now, the assumption is that all
necessary items will be re-exported through `rand` to minimise breakage;
this may change but will be the subject of a follow-up RFC, not this one)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest deciding in this RFC whether we’re interested in keeping the reexport or would rather make the reexports deprecated somehow. There’s two sides to this. First, if we are interested in deprecation, it should land together with the split so hopefully we don’t have to go through the big breaking churn twice. Second, this would draw the guideline for the external RNG implementations. If those do “stuff” wrong, without being aware of our intentions for the rand/rand-core/etc, it will make churn ever so slightly worse.

I personally think re-exporting rand-core traits from rand (and crates implementing RNGs in general) is a significant usability boost – you wouldn’t need to extern crate two crates and be able to do extern crate nicerng::{NiceRng, Rng} for any RNG implementor you happen to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I agree with you, pretty well, and created this issue for discussion.

let mut r = OsRng::new()?;
Self::from_rng(&mut r)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such implementation essentially makes it impossible to implement NewSeeded by anybody else for any other seedable RNG.

That is, it is impossible to write something along the lines of

struct DifferentlySeededRng(MyRng);
impl SeedFromRng for DifferentlySeededRng {...}
impl NewSeeded for DifferentlySeededRng {
    fn new() -> Result<Self> { ... }
}

due to coherence, which makes this trait hardly useful IMO. If we’re adding a trait+impl like this, we should aim to make fn new a default fn new as soon as specialisation is stable… but I personally think something like this:

impl OsRng {
    fn seed_new<R: SeedFromRng>() -> Result<R>;
}

is potentially way better and is a pattern that’s extensible to any number of seed-capable RNGs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively a plain function could exist in rand without any of that unnecessary trait stuff.

Copy link
Contributor Author

@dhardy dhardy Oct 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trait I suggested is there to allow this:

use rand::NewSeeded;
let rng = SomePRNG::new();

I suppose let rng = OsRng::seed_new::<SomePRNG>(); isn't a terrible alternative.

To be honest, I don't see why anyone would need to write an alternative implementation for NewSeeded though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up comments in this issue please.

@jethrogb
Copy link
Contributor

cc @briansmith

@aturon
Copy link
Member

aturon commented Feb 1, 2018

@dhardy I wanted to check in on this and the related RFC. I haven't tracked the rand work closely-- are these RFCs still relevant?

@dhardy
Copy link
Contributor Author

dhardy commented Feb 1, 2018

@aturon no, we're not really working from the RFCs now. Some of the plans are the same, some have changed; we're getting feedback via PRs now which has the advantage of more detail but perhaps the disadvantage of less audience.

Since rand is outside the core library anyway I'm not sure these RFCs are really in the right place anyway. At some point there may be discussion of whether std will use rand instead of its internal copy, but that's a different issue (unless it has ramifications on how rand should be structured, in which case please weigh in).

@aturon
Copy link
Member

aturon commented Feb 1, 2018

@dhardy OK, that's pretty much as I had assumed, and is totally fine. It's wonderful to see rand making as much progress as it has!

For the time being, I'm going to close out these RFCs. Those interested in rand's development can follow along on its repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.