-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reseeding perf #76
Reseeding perf #76
Conversation
906cf65
to
08ecec4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you never cease to find something to optimise!
Looks good other than the one thing. I suppose swapping the check, generate logic allows some degree of parallelism. Not sure what you mean about not liking the way it works though?
src/reseeding.rs
Outdated
if e.kind.should_wait() { | ||
// Delay reseeding | ||
self.bytes_until_reseed = self.threshold >> 8; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sets bytes_until_reseed
twice. Should it return instead?
I admit that the logic of this function is weird but I guess something like this is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hee, you're right.
Some parts seem not very ergonomic. For example the The And I didn't like the |
I have been trying out a wild idea today to reduce the overhead of Is seems sensible to assume that if you want to reseed an RNG, it will probably be a cryptographic RNG, not some simple one. The variants I have seen so far all generate blocks of results, not one result at a time. One way to reduce the overhead of As a test I have added an pub trait RngCore: Sized {
type Results: AsRef<[u32]>;
fn init(seed: &[u32]) -> Self;
fn generate(&mut self, results: &mut Self::Results);
fn results_empty() -> Self::Results;
} It just exposes the core algorithm form an RNG, and does not include the buffering etc necessary to implement the Because Do you think this a direction worth pursuing? This is my current super ugly, many things comment out, WIP pitdicker@25dfbdd |
This trait requires the implementation to use pub trait<T> RngCore: SeedableRng {
type Results: AsRef<[T]>;
fn init(seed: &<Self as SeedableRng>::Seed) -> Self; Not quite sure what I think right now; this would make |
I was already a little proud I got the extra abstraction working, including the Asref slice trick to be generic over arrays. But what you write is much better!
It is more that it combines the step to fill a new output buffer with the bookkeeping when to reseed the rng. Checking and managing counters can take almost as much time as filling the output buffer with fresh values, so every little thing we have to do less each step helps.
I think I named it What do you think about introducing an On the other hand |
Maybe the trait should be named
Is that a few too many traits? I'm wondering whether |
Maybe there are tricks with traits I don't know, but I couldn't see a way to do this automatically. |
It turns out |
Found a terrible workaround: use a newtype with #[derive(Copy, Clone)]
pub struct IsaacArray([u32; RAND_SIZE]);
impl ::core::convert::AsRef<[u32]> for IsaacArray {
fn as_ref(&self) -> &[u32] {
&self.0[..]
}
}
impl ::core::ops::Deref for IsaacArray {
type Target = [u32; RAND_SIZE];
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl ::core::ops::DerefMut for IsaacArray {
fn deref_mut(&mut self) -> &mut [u32; RAND_SIZE] {
&mut self.0
}
} |
Sorry, no, but I guess this is another thing that will be fixed by constant generics eventually. |
@dhardy I now have this mostly working, and am cleaning up the changes. A lot of code has to move around... Implementing RNG's with the As trait I now have: pub trait BlockRng<T>: Sized {
type Results: AsRef<[T]> + Default;
fn generate(&mut self, results: &mut Self::Results);
} But there is one place I am stuck with the traits: https://github.com/pitdicker/rand/blob/blockrng_part2/rand_core/src/lib.rs#L168. I want to add a The implementation for
Do you know how to fix this? |
Merry Christmas @pitdicker. I had a look, and I think the problem is that it would be possible for some It might be possible to use specialisation somehow by making Alternatively you could just use two separate BTW I think the names would be better like this:
|
Thank you! And I like the names you listed better.
Ah, that explains it. Back to the drawing board then. The trait system must be logical, I should really get a handle on it, but don't know of a good resource... My hope was to end up with something like: I think these changes are only worth it if the end results is reasonably clean, and am starting to give up. |
@pitdicker the logic you want is essentially this: pub trait A {} // BlockRng<u32>
pub trait B {} // BlockRng<u64>
pub trait T {} // BlockRngWrapper
impl<X: A> T for X {}
impl<X: B> T for X {} The compiler won't allow the second impl because if an There's a workaround: specialization allows multiple implementations, so long as one is more specific than the other. But that's not stable yet. Example. BTW feel free to bring this up on https://internals.rust-lang.org/ but I doubt there will be any rapid progress on it. You may also like N. Matsakis's blog, but it's a bit off-topic here. Edit: found something related |
So @pitdicker should I merge this PR while the |
Yes, I think merging this is a good idea. I am not really sure how to proceed with the I somewhere messed up this branch but will fix it in a moment. |
* Move the check if it is time to reseed out of the `try_reseed_if_necessary` and make sure that function does not get inlined. * Invert the counter direction. This way we can compare against 0 instead of `self.threshold` * Doing the reseed check after generating a value turns out to be a bit faster.`
ea3951c
to
07717bc
Compare
So for now 64-bit block RNGs would not use |
That is the idea 😄. And else it can still work and be faster with the Travis seems very busy today... |
Yes it is. I wonder if it's something to do with the new year? |
O wow, maybe... But ready after all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After another look, I think it would be worth changing the benchmarks. It will probably reduce the apparent impact of your improvement, but is more realistic and pertinent, especially when considering BlockRng
later.
src/reseeding.rs
Outdated
@@ -44,13 +43,14 @@ impl<R: Rng, Rsdr: Reseeder<R>> ReseedingRng<R, Rsdr> { | |||
/// # Arguments | |||
/// | |||
/// * `rng`: the random number generator to use. | |||
/// * `generation_threshold`: the number of bytes of entropy at which to reseed the RNG. | |||
/// * `threshold`: the amount of generated bytes after which to reseed the RNG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe "the number of [generated] bytes" is correct English; amount is typically used for "uncountable" things (e.g. water, money, food). But I'm not really fussed (I know I accepted something similar recently anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep correcting my English, It is not my first language and it is better if the language in the documentation is correct.
benches/generators.rs
Outdated
|
||
#[bench] | ||
fn reseeding_xorshift_bytes(b: &mut Bencher) { | ||
let mut rng = ReseedingRng::new(XorShiftRng::new().unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to benchmark the PRNG we're interested in (ISAAC or HC128)? Especially given that your BlockRng
idea integrates tighter with the PRNG algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With HC-128 the overhead of reseeding is much larger:
test gen_bytes_hc128 ... bench: 445,483 ns/iter (+/- 16,674) = 2298 MB/s
test gen_u32_hc128 ... bench: 2,809 ns/iter (+/- 194) = 1423 MB/s
test gen_u64_hc128 ... bench: 4,254 ns/iter (+/- 360) = 1880 MB/s
test init_hc128 ... bench: 4,539 ns/iter (+/- 412)
test reseeding_hc128_bytes ... bench: 451,584 ns/iter (+/- 25,463) = 2267 MB/s
test reseeding_hc128_u32 ... bench: 3,690 ns/iter (+/- 125) = 1084 MB/s
test reseeding_hc128_u64 ... bench: 5,907 ns/iter (+/- 157) = 1354 MB/s
And before this PR:
test reseeding_hc128_bytes ... bench: 449,635 ns/iter (+/- 5,755) = 2277 MB/s
test reseeding_hc128_u32 ... bench: 6,418 ns/iter (+/- 84) = 623 MB/s
test reseeding_hc128_u64 ... bench: 7,693 ns/iter (+/- 124) = 1039 MB/s
This makes sense, because the overhead of checking and indexing in the results array makes up 20~40%. With ReseedingRng
that percentage gets doubled because it also does checks for reseeding.
You are right, this benchmark is testing something nonsensical. Reseeding Xorshift ?!. I don't think it matters much because they both show the overhead of ReseedingRng
, but I'll change it.
I wanted to try inverting the counter in
ReseedingRng
as discussed in #59, but it turned out the performance was pretty bad to begin with.Benchmarks before:
After:
And plain Xorshift for comparison:
I don't like several parts of the current design of
ReseedingRng
, but that is for another issue.