
Reseeding perf #76

Merged: 4 commits, Jan 1, 2018
Conversation

@pitdicker

I wanted to try inverting the counter in ReseedingRng as discussed in #59, but it turned out the performance was pretty bad to begin with.

Benchmarks before:

test reseeding_xorshift_bytes ... bench:     559,635 ns/iter (+/- 5,160) = 1829 MB/s
test reseeding_xorshift_u32   ... bench:       4,265 ns/iter (+/- 27) = 937 MB/s
test reseeding_xorshift_u64   ... bench:       4,893 ns/iter (+/- 30) = 1634 MB/s

After:

test reseeding_xorshift_bytes ... bench:     562,940 ns/iter (+/- 980) = 1819 MB/s
test reseeding_xorshift_u32   ... bench:       2,323 ns/iter (+/- 9) = 1721 MB/s
test reseeding_xorshift_u64   ... bench:       2,918 ns/iter (+/- 3) = 2741 MB/s

And plain Xorshift for comparison:

test gen_bytes_xorshift       ... bench:     555,592 ns/iter (+/- 10,734) = 1843 MB/s
test gen_u32_xorshift         ... bench:       1,372 ns/iter (+/- 12) = 2915 MB/s
test gen_u64_xorshift         ... bench:       2,643 ns/iter (+/- 28) = 3026 MB/s

I don't like several parts of the current design of ReseedingRng, but that is for another issue.

@dhardy (Owner) left a comment

It seems you never cease to find something to optimise!

Looks good other than the one thing. I suppose swapping the check, generate logic allows some degree of parallelism. Not sure what you mean about not liking the way it works though?

src/reseeding.rs Outdated
if e.kind.should_wait() {
// Delay reseeding
self.bytes_until_reseed = self.threshold >> 8;
break;
@dhardy (Owner)

This sets bytes_until_reseed twice. Should it return instead?

I admit that the logic of this function is weird but I guess something like this is useful.
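The suggested change can be illustrated with a toy model of the control flow (the struct, enum, and shift amount here are all hypothetical, not the actual reseeding.rs code): returning early on a transient error keeps the counter reset at the end of the function from overwriting the delayed value.

```rust
// Toy model of the control flow under discussion; names are illustrative.
pub struct ToyReseeding {
    pub bytes_until_reseed: u64,
    pub threshold: u64,
}

pub enum ReseedOutcome {
    Done,
    ShouldWait, // transient failure: try again soon
}

impl ToyReseeding {
    pub fn reseed(&mut self, outcome: ReseedOutcome) {
        if let ReseedOutcome::ShouldWait = outcome {
            // Delay reseeding: retry after 1/256th of the usual threshold.
            self.bytes_until_reseed = self.threshold >> 8;
            return; // `break` would fall through to the reset below
        }
        self.bytes_until_reseed = self.threshold;
    }
}
```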

@pitdicker (Author)

Hee, you're right.

@pitdicker (Author)

Not sure what you mean about not liking the way it works though?

Some parts seem not very ergonomic. For example the new function takes an already existing RNG. Why does it need you to initialize the wrapped rng separately? If it knows how to reseed an rng, surely it can also initialize one?

The from_reseeder function does not make much sense to me. The main difference with new is that it comes with a default threshold. The DEFAULT_RESEEDING_THRESHOLD is very small and basically always wrong. The threshold value depends on the paranoia of the application/library, and on the wrapped RNG, so I don't think having a default is a good idea. Also it takes a fixed seed, instead of using the SeedFromRng trait.

And I didn't like the Reseeder trait. Why can't ReseedingRng just use some other Rng, without requiring a custom wrapper? Those were my thoughts yesterday though; I may be coming around on the last issue...

@pitdicker (Author)

I have been trying out a wild idea today to reduce the overhead of ThreadRng, which uses ReseedingRng. Because that seems like the commonly used interface, I don't really want it to look bad :-). In rand in the nursery its performance is only 50% of the RNG it wraps. After sprinkling some #[inline]s and removing some wrappers it gets to 80% of the RNG.

It seems sensible to assume that if you want to reseed an RNG, it will probably be a cryptographic RNG, not a simple one. The variants I have seen so far all generate blocks of results, not one result at a time. One way to reduce the overhead of ReseedingRng is to do the reseed 'bookkeeping' only when a new block of results is generated, not on every next_* call.

As a test I have added an RngCore trait:

pub trait RngCore: Sized {
    type Results: AsRef<[u32]>;

    fn init(seed: &[u32]) -> Self;
    fn generate(&mut self, results: &mut Self::Results);
    fn results_empty() -> Self::Results;
}

It just exposes the core algorithm of an RNG, and does not include the buffering etc. necessary to implement the Rng trait. It does not really make the implementation of HC-128 less clean, and should also not be too hard to implement for ISAAC and ChaCha.

Because ReseedingRng now has access to the RNG's algorithm, it can use just that part and implement its own buffering and bookkeeping. This should (not completely true for next_u32 yet) bring the overhead of ReseedingRng down to almost 0%, and bring the performance of ThreadRng within 90% of the wrapped RNG.

Do you think this a direction worth pursuing?

This is my current super ugly, many things comment out, WIP pitdicker@25dfbdd
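The per-block bookkeeping idea can be sketched roughly like this. Everything below (the trait shape, the counter handling, the toy counting core) is illustrative, not the actual WIP code: the point is only that the reseed counter is touched once per block, on the refill path, rather than on every next_u32.

```rust
// Hypothetical core trait: exposes only the block-generation step.
pub trait BlockRngCore {
    type Results: AsRef<[u32]> + Default;
    fn generate(&mut self, results: &mut Self::Results);
}

// Reseeding wrapper that owns the buffer, so all reseed bookkeeping
// happens on the slow path (once per block), not on every next_u32.
pub struct ReseedingBlockRng<R: BlockRngCore> {
    core: R,
    results: R::Results,
    index: usize,
    bytes_until_reseed: i64,
    threshold: i64,
}

impl<R: BlockRngCore> ReseedingBlockRng<R> {
    pub fn new(core: R, threshold: i64) -> Self {
        ReseedingBlockRng {
            core,
            results: R::Results::default(),
            index: usize::MAX, // forces a generate() on the first call
            bytes_until_reseed: threshold,
            threshold,
        }
    }

    pub fn next_u32(&mut self) -> u32 {
        let len = self.results.as_ref().len();
        if self.index >= len {
            // Slow path: account for the whole block at once.
            self.bytes_until_reseed -= (len * 4) as i64;
            if self.bytes_until_reseed <= 0 {
                // A real implementation would reseed `self.core` here.
                self.bytes_until_reseed = self.threshold;
            }
            self.core.generate(&mut self.results);
            self.index = 0;
        }
        let value = self.results.as_ref()[self.index];
        self.index += 1;
        value
    }
}

// Toy core for demonstration: fills the block with consecutive integers.
pub struct CountingCore { state: u32 }
impl BlockRngCore for CountingCore {
    type Results = [u32; 16];
    fn generate(&mut self, results: &mut [u32; 16]) {
        for r in results.iter_mut() {
            self.state = self.state.wrapping_add(1);
            *r = self.state;
        }
    }
}
```

The fast path is then just a bounds check, an index, and an increment; the counter arithmetic and the reseed decision are amortised over the whole block.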

@dhardy (Owner) commented Dec 18, 2017

This trait requires the implementation to use u32 internally though, right? Can that be made a parameter? Also, I don't much like using a &[u32] seed; maybe you can do something like:

pub trait RngCore<T>: SeedableRng {
    type Results: AsRef<[T]>;

    fn init(seed: &<Self as SeedableRng>::Seed) -> Self;
}

Not quite sure what I think right now; this would make ReseedingRng only usable for certain classes of RNGs, right? But maybe that's an advantage allowing reseed to combine both current state (or output buffer) with a fresh seed instead of simply replace with a new seed. You're right, the wrapper is not very useful for fast, weak PRNGs.

@pitdicker (Author)

I was already a little proud I got the extra abstraction working, including the AsRef slice trick to be generic over arrays. But what you write is much better!

But maybe that's an advantage allowing reseed to combine both current state (or output buffer) with a fresh seed instead of simply replace with a new seed.

It is more that it combines filling a new output buffer with the bookkeeping for when to reseed the RNG. Checking and managing counters can take almost as much time as filling the output buffer with fresh values, so every little bit of work we can avoid on each step helps.

Not quite sure what I think right now; this would make ReseedingRng only usable for certain classes of RNGs, right?

I think I named it ReseedingBlockRng? I think nothing prevents using the current ReseedingRng with all kinds of RNGs. But this should be a faster alternative for what are basically CryptoRngs. And if reseeding other kinds of RNGs does not make much sense, we could remove ReseedingRng.

What do you think about introducing an RngCore trait? In a way I find it a bit ugly, because the main motivation is just to increase the performance of a reseeding wrapper. But because of ThreadRng this may be worthwhile.

On the other hand, IsaacRng, ChaChaRng and Hc128Rng now all have an impl block with the core algorithm, and this trait would make that a bit more formal. Maybe it even becomes possible to share all the extra code for impl Rng between these implementations.

@dhardy (Owner) commented Dec 18, 2017

Maybe the trait should be named BlockRng instead? But I think this means CSPRNGs would have to implement:

  • Rng
  • CryptoRng
  • BlockRng
  • SeedableRng
  • SeedFromRng

Is that a few too many traits? I'm wondering whether Rng should be implemented automatically for every BlockRng (perhaps via trait extension with default impls). Actually, if init is provided, then SeedableRng and SeedFromRng could be implemented automatically too? I'm not totally sure this will work, but it may simplify CSPRNG implementations.

@pitdicker (Author)

Maybe there are tricks with traits I don't know, but I couldn't see a way to do this automatically. Rng implementations would have a struct that includes the struct from RngCore, an output buffer, and a counter/index. But a macro could certainly do it.
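A macro along these lines could stamp out the wrapping boilerplate. This is purely illustrative (no such macro exists in rand); it just shows the struct shape described above: core plus output buffer plus index, with a next_u32 that refills on demand.

```rust
// Illustrative macro: generates a wrapper struct holding the core RNG,
// an output buffer, and an index, plus a next_u32 that refills on demand.
macro_rules! buffered_rng {
    ($name:ident, $core:ty, $len:expr) => {
        pub struct $name {
            pub core: $core,
            pub results: [u32; $len],
            pub index: usize,
        }

        impl $name {
            pub fn next_u32(&mut self) -> u32 {
                if self.index >= $len {
                    self.core.generate(&mut self.results);
                    self.index = 0;
                }
                let value = self.results[self.index];
                self.index += 1;
                value
            }
        }
    };
}

// Example "core" providing the generate method the macro expects.
pub struct DummyCore { state: u32 }
impl DummyCore {
    fn generate(&mut self, results: &mut [u32; 4]) {
        for r in results.iter_mut() {
            self.state = self.state.wrapping_add(1);
            *r = self.state;
        }
    }
}

buffered_rng!(DummyRng, DummyCore, 4);
```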

@pitdicker (Author)

It turns out AsRef<[T]> is not implemented for [u32; 256], so that doesn't work as a bound. @dhardy do you happen to have an idea on how to have a generic Results type, but still be able to do something useful like index into it or getting the length?

@pitdicker (Author)

Found a terrible workaround: use a newtype with AsRef and Deref implementations:

#[derive(Copy, Clone)]
pub struct IsaacArray([u32; RAND_SIZE]);
impl ::core::convert::AsRef<[u32]> for IsaacArray {
    fn as_ref(&self) -> &[u32] {
        &self.0[..]
    }
}
impl ::core::ops::Deref for IsaacArray {
    type Target = [u32; RAND_SIZE];
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
impl ::core::ops::DerefMut for IsaacArray {
    fn deref_mut(&mut self) -> &mut [u32; RAND_SIZE] {
        &mut self.0
    }
}

@dhardy (Owner) commented Dec 21, 2017

Sorry, no, but I guess this is another thing that will be fixed by constant generics eventually.

@pitdicker (Author)

@dhardy I now have this mostly working, and am cleaning up the changes. A lot of code has to move around... Implementing RNGs with the BlockRng trait looks pretty clean now, here for example HC-128. The overhead of ReseedingBlockRng with this method is not really measurable, as hoped.

As trait I now have:

pub trait BlockRng<T>: Sized {
    type Results: AsRef<[T]> + Default;

    fn generate(&mut self, results: &mut Self::Results);
}

But there is one place I am stuck with the traits: https://github.com/pitdicker/rand/blob/blockrng_part2/rand_core/src/lib.rs#L168. I want to add a BlockRngWrapper to implement Rng for a BlockRng. And it needs a separate implementation for those that return [u32] and [u64].

The implementation for [u64] is commented out at the moment, otherwise I get the error:

error[E0119]: conflicting implementations of trait `Rng` for type `BlockRngWrapper<_, _>`

Do you know how to fix this?

@dhardy (Owner) commented Dec 27, 2017

Merry Christmas @pitdicker. I had a look, and I think the problem is that it would be possible for some R to implement both BlockRng<u32> and BlockRng<u64> since essentially they are separate traits. Perhaps it would be useful to prevent a type implementing both traits, but I'm not sure if that's possible.

It might be possible to use specialisation somehow by making BlockRng<u32> more general than BlockRng<u64> (i.e. the latter extends the former).

Alternatively you could just use two separate BlockRngWrapper traits.

BTW I think the names would be better like this:

  • BlockRng → BlockRngCore
  • ChaCha → ChaChaCore
  • BlockRngWrapper → BlockRng
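The "two separate wrappers" alternative can be sketched as follows. All names are illustrative, and the wrappers use inherent methods instead of the Rng trait to stay self-contained; the point is that the coherence conflict disappears because each impl is attached to its own concrete wrapper type.

```rust
// Two word-size-specific core traits; a type may implement either (or
// even both) without the wrappers' impls ever conflicting.
pub trait BlockRng32 {
    fn generate(&mut self, results: &mut [u32; 4]);
}
pub trait BlockRng64 {
    fn generate(&mut self, results: &mut [u64; 4]);
}

pub struct Wrapper32<R: BlockRng32> { pub core: R }
pub struct Wrapper64<R: BlockRng64> { pub core: R }

// Each impl targets a distinct wrapper type, so there is no E0119
// "conflicting implementations" error as with one generic blanket impl.
impl<R: BlockRng32> Wrapper32<R> {
    pub fn next_u32(&mut self) -> u32 {
        let mut buf = [0u32; 4];
        self.core.generate(&mut buf);
        buf[0]
    }
}
impl<R: BlockRng64> Wrapper64<R> {
    pub fn next_u64(&mut self) -> u64 {
        let mut buf = [0u64; 4];
        self.core.generate(&mut buf);
        buf[0]
    }
}

// Minimal demo core.
pub struct FixedCore;
impl BlockRng32 for FixedCore {
    fn generate(&mut self, results: &mut [u32; 4]) { results[0] = 42; }
}
```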

@pitdicker (Author)

Thank you! And I like the names you listed better.

essentially they are separate traits

Ah, that explains it. Back to the drawing board then. The trait system must be logical; I should really get a handle on it, but I don't know of a good resource...

My hope was to end up with something like:
ReseedingBlockRng > BlockRng > ReseedingCore > (ChaChaCore, Isaac64Core, etc.)
Two traits would help with implementing RNGs, but not with the reseeding mechanism?

I think these changes are only worth it if the end result is reasonably clean, and I am starting to give up.
On the other hand, the not-quite-perfect code for ReseedingBlockRng I have now is a win for anything except ISAAC-64.

@dhardy (Owner) commented Dec 28, 2017

@pitdicker the logic you want is essentially this:

pub trait A {}  // BlockRng<u32>
pub trait B {}  // BlockRng<u64>
pub trait T {}  // BlockRngWrapper

impl<X: A> T for X {}
impl<X: B> T for X {}

The compiler won't allow the second impl because if an X were to implement both, that type would have two impls for T. I don't think it's possible to tell the compiler no type can implement both A and B. Alternatively it would be nice to say impl<X: B> T for X where not X: A {}, but I don't think that's possible either.

There's a workaround: specialization allows multiple implementations, so long as one is more specific than the other. But that's not stable yet. Example.

BTW feel free to bring this up on https://internals.rust-lang.org/ but I doubt there will be any rapid progress on it. You may also like N. Matsakis's blog, but it's a bit off-topic here.

Edit: found something related

@dhardy (Owner) commented Dec 31, 2017

So @pitdicker should I merge this PR while the BlockRng thing is left on the side for now? If so you could open a tracking issue for that.

@pitdicker (Author)

Yes, I think merging this is a good idea.

I am not really sure how to proceed with the BlockRng idea. What do you think about implementing it for 32-bit RNGs for now, with the possibility open to extend it to 64-bit once specialisation is ready?

I somewhere messed up this branch but will fix it in a moment.

* Move the check whether it is time to reseed out of `try_reseed_if_necessary`,
  and make sure that function does not get inlined.
* Invert the counter direction. This way we can compare against 0 instead of
  `self.threshold`.
* Doing the reseed check after generating a value turns out to be a bit faster.
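The inverted-counter change can be illustrated with a toy countdown (illustrative names only): by counting down, the hot-path comparison is against the constant 0 rather than a value loaded from the struct.

```rust
// Toy countdown counter: counting down lets the hot path compare
// against the constant 0 instead of loading `self.threshold`.
pub struct Countdown {
    bytes_until_reseed: i64,
    threshold: i64,
}

impl Countdown {
    pub fn new(threshold: i64) -> Self {
        Countdown { bytes_until_reseed: threshold, threshold }
    }

    /// Subtracts `bytes` and reports whether it is time to reseed.
    pub fn consume(&mut self, bytes: i64) -> bool {
        self.bytes_until_reseed -= bytes;
        if self.bytes_until_reseed <= 0 {
            self.bytes_until_reseed = self.threshold;
            return true;
        }
        false
    }
}
```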
@dhardy (Owner) commented Dec 31, 2017

So for now 64-bit block RNGs would not use BlockRng? How then will ReseedingRng work on StdRng? I guess if we switch StdRng to HC-128 first that doesn't matter so much.

@pitdicker (Author)

That is the idea 😄. And otherwise it can still work, and be faster, with the ReseedingRng from this PR.

Travis seems very busy today...

@dhardy (Owner) commented Dec 31, 2017

Yes it is. I wonder if it's something to do with the new year?

@pitdicker (Author)

Oh wow, maybe... But it's ready after all.

@dhardy (Owner) left a comment

After another look, I think it would be worth changing the benchmarks. It will probably reduce the apparent impact of your improvement, but is more realistic and pertinent, especially when considering BlockRng later.

src/reseeding.rs Outdated
@@ -44,13 +43,14 @@ impl<R: Rng, Rsdr: Reseeder<R>> ReseedingRng<R, Rsdr> {
/// # Arguments
///
/// * `rng`: the random number generator to use.
/// * `generation_threshold`: the number of bytes of entropy at which to reseed the RNG.
/// * `threshold`: the amount of generated bytes after which to reseed the RNG.
@dhardy (Owner)

I believe "the number of [generated] bytes" is correct English; amount is typically used for "uncountable" things (e.g. water, money, food). But I'm not really fussed (I know I accepted something similar recently anyway).

@pitdicker (Author)

Please keep correcting my English. It is not my first language, and it is better if the language in the documentation is correct.


#[bench]
fn reseeding_xorshift_bytes(b: &mut Bencher) {
let mut rng = ReseedingRng::new(XorShiftRng::new().unwrap(),
@dhardy (Owner)

Wouldn't it make more sense to benchmark the PRNG we're interested in (ISAAC or HC128)? Especially given that your BlockRng idea integrates tighter with the PRNG algorithm.

@pitdicker (Author) commented Dec 31, 2017

With HC-128 the overhead of reseeding is much larger:

test gen_bytes_hc128       ... bench:     445,483 ns/iter (+/- 16,674) = 2298 MB/s
test gen_u32_hc128         ... bench:       2,809 ns/iter (+/- 194) = 1423 MB/s
test gen_u64_hc128         ... bench:       4,254 ns/iter (+/- 360) = 1880 MB/s
test init_hc128            ... bench:       4,539 ns/iter (+/- 412)
test reseeding_hc128_bytes ... bench:     451,584 ns/iter (+/- 25,463) = 2267 MB/s
test reseeding_hc128_u32   ... bench:       3,690 ns/iter (+/- 125) = 1084 MB/s
test reseeding_hc128_u64   ... bench:       5,907 ns/iter (+/- 157) = 1354 MB/s

And before this PR:

test reseeding_hc128_bytes ... bench:     449,635 ns/iter (+/- 5,755) = 2277 MB/s
test reseeding_hc128_u32   ... bench:       6,418 ns/iter (+/- 84) = 623 MB/s
test reseeding_hc128_u64   ... bench:       7,693 ns/iter (+/- 124) = 1039 MB/s

This makes sense, because the overhead of checking and indexing in the results array makes up 20~40%. With ReseedingRng that percentage gets doubled because it also does checks for reseeding.

You are right, this benchmark is testing something nonsensical. Reseeding Xorshift?! I don't think it matters much, because both show the overhead of ReseedingRng, but I'll change it.

@dhardy dhardy merged commit d5d9c75 into dhardy:master Jan 1, 2018
@pitdicker pitdicker deleted the reseeding_perf branch January 1, 2018 11:43