Avoid low bits #68
Please don't look at it yet, I did things stupidly...
src/distributions/range.rs
```rust
// least significant bits. Because many RNG's of
// lower quality are weak in their least significant
// bits, we use division.
return self.low.wrapping_add((v / range) as $ty);
```
Maybe you meant `let n = (zone + 1) / range; return v / n` (avoiding the overflow)? That means an extra division somewhere in `new_*`. Since `n = (unsigned_max - range + 1) / range + 1` you may be able to compute both remainder and quotient in a single op with something like `div` (I thought there was a Rust equivalent but can't find it). Even then there's an extra copy and addition IIUC, so I expect performance will suffer.
So I'm not convinced this is worth it...
Yes, I think I was sleeping while doing the math 🙁.
To avoid two divisions I was thinking along these lines in `new_*`:

```rust
let range = (high as $u_large)
    .wrapping_sub(low as $u_large)
    .wrapping_add(1);
let divisor = unsigned_max / range;
let zone = range * divisor;
```
Two subtractions are traded for one multiply, and one modulus for a division. But I still have to think about it more carefully.
Asleep, I see :D
You need `(unsigned_max + 1) / range`, which of course overflows, so you need this:

```rust
let divisor = (unsigned_max - range + 1) / range + 1;
let zone = range * divisor; // may wrap to zero
```
Simultaneous division and remainder would probably still be faster (but maybe more CPU-dependent).
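To make the overflow-safe arithmetic concrete, here is a minimal sketch with `u32` standing in for `$u_large`; the function name `zone_and_divisor` is illustrative, not rand's API:

```rust
// Illustrative sketch of the overflow-safe zone computation discussed
// above, with u32 standing in for $u_large.
fn zone_and_divisor(range: u32) -> (u32, u32) {
    // (unsigned_max + 1) / range would overflow, so rewrite it as
    // (unsigned_max - range + 1) / range + 1.
    let unsigned_max = u32::MAX;
    let divisor = (unsigned_max - range + 1) / range + 1;
    // zone may wrap to zero when range * divisor == 2^32.
    let zone = range.wrapping_mul(divisor);
    (zone, divisor)
}

fn main() {
    let (zone, divisor) = zone_and_divisor(10);
    assert_eq!(divisor, 429_496_729); // == 2^32 / 10
    assert_eq!(zone, 4_294_967_290); // 10 * divisor; fits in u32 here
    // Any accepted sample v < zone maps to v / divisor in 0..10.
    assert_eq!((zone - 1) / divisor, 9);
    println!("ok");
}
```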
There is a Rust equivalent, on `num::Integer` and as an unstable intrinsic.
I have used them together before; LLVM is able to optimize the two operations into one when they are close together.
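For illustration, the "two operations close together" pattern can be written as plainly as this (a sketch, not code from this PR):

```rust
// A plain quotient + remainder pair: when both expressions share the
// same operands, LLVM typically fuses them into a single hardware
// division instruction, so no explicit div_rem intrinsic is needed.
fn div_rem(a: u32, b: u32) -> (u32, u32) {
    (a / b, a % b)
}

fn main() {
    assert_eq!(div_rem(17, 5), (3, 2));
    println!("ok");
}
```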
```diff
@@ -176,7 +178,7 @@ fn ziggurat<R: Rng+?Sized, P, Z>(
     } else {
         bits.closed_open01_fixed()
     };
-    let i = (bits & 0xff) as usize;
+    let i = ((bits >> 3) & 0xff) as usize;
```
Were there three unused bits? I thought you used them all?
There are 11 spare bits, and we only need 8.
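The changed line can be checked in isolation; a tiny sketch of the index extraction (with `ziggurat_index` as a hypothetical name for the inline expression):

```rust
// Table index for the ziggurat layer: skip the 3 lowest bits, then
// take 8 of the spare bits, so the RNG's weakest bits never pick the
// layer.
fn ziggurat_index(bits: u64) -> usize {
    ((bits >> 3) & 0xff) as usize
}

fn main() {
    assert_eq!(ziggurat_index(0b111), 0); // the low 3 bits are ignored
    assert_eq!(ziggurat_index(0xff << 3), 0xff); // the next 8 bits select the layer
    println!("ok");
}
```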
Force-pushed from a8641a2 to 3f22ddd.
I have replaced the modulus in … Benchmarks (see the diff columns for the results with the time Xorshift takes removed):
x86: …
x86_64: …
Like using a division, a widening multiply depends more on the most significant bits than the least significant ones. I have changed the tests for …
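A sketch of what such a widening-multiply mapping looks like (my reconstruction from the description above, not necessarily the exact code in this PR):

```rust
// Map a full-width random v into 0..range via the high half of a
// widening multiply. Like division, this weights the most significant
// bits of v most heavily, so weak low bits barely matter.
fn widening_map(v: u32, range: u32) -> u32 {
    ((v as u64 * range as u64) >> 32) as u32
}

fn main() {
    assert_eq!(widening_map(0, 10), 0);
    assert_eq!(widening_map(u32::MAX, 10), 9);
    println!("ok");
}
```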
Great work, but I'm still wondering: do we want this? Is it a choice between using an RNG like Xorshift with these changes, or using another RNG which doesn't appear to have these weaknesses in the low bits? You should be able to answer this question better than anyone, based on the research you've been doing lately, I guess.
Even when we pick an RNG that is good and does not need these changes, there would still be RNGs in other crates that would benefit from them. The change to the ziggurat layer makes it a little bit slower, 3~4%. I wonder if some indexing tricks can recover that... So two of the three are definitely faster, and it helps with weaker RNGs. Seems win-win to me 😄
Force-pushed from 625f86a to 68edc9f.
The numbers you posted are ns/iter? Ah, ok. 😆 👍
Can you also benchmark constructing and sampling from a range? Single-use ranges are quite common, e.g. the code in …
Results:
Before: …
After: …
So there is some improvement, but they are both terrible... Maybe it is worth thinking about a single-use range?
I think it is possible to get within 70% of the normal version. If we don't search for the optimal zone with a modulus, but pick one with bitshifts, that should be a win. On average, 75% of the RNG's results should still be acceptable. And with everything in one function it can maybe optimise a bit better.
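The bitshift idea could look roughly like this (a hypothetical sketch; `next` stands in for the RNG, and the function name is mine):

```rust
// Single-use range: instead of computing an exact zone with a modulus,
// mask down to the next power of two and reject out-of-range values.
// Acceptance is between 50% and 100% per attempt, ~75% on average
// across ranges.
fn sample_single_once<F: FnMut() -> u32>(low: u32, high: u32, mut next: F) -> u32 {
    let range = high - low; // assumes low < high and range <= 2^31
    let mask = range.next_power_of_two() - 1;
    loop {
        let v = next() & mask;
        if v < range {
            return low + v;
        }
    }
}

fn main() {
    // Deterministic stand-in "RNG" for demonstration: yields 14, then 3.
    let mut vals = [14u32, 3].into_iter();
    let x = sample_single_once(10, 20, || vals.next().unwrap());
    assert_eq!(x, 13); // 14 & 15 == 14 is rejected (>= 10); 3 is accepted
    println!("ok");
}
```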
Okay, shall I go ahead and merge? It looks like the CI failure is something unrelated I messed up (the seeding-minimal branch, I think).
I can't figure out how to shuffle the traits to get something like … Feel free to merge this, though.
I think you want:

```rust
pub fn sample_single<X: SampleRange, R: Rng+?Sized>(low: X, high: X, rng: &mut R) -> X {
    X::T::sample_single(low, high, rng)
}
```
Thank you, that could have taken me hours!
No problem! It was a bit of a weird error, the sort of thing that makes you wonder why rustc couldn't figure it out.
This implements #52 (comment).
Many small RNGs are of lower quality in their least significant bits: for example, LCGs with a power-of-two modulus, MCGs, Xorshift+, and Xoroshiro+.
If it costs us nothing to avoid the least significant bits, it seems sensible to me to do so, even if none of these RNGs becomes part of rand itself.
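As a small illustration of the weakness for the power-of-two-modulus LCG case (constants are the common PCG/Knuth ones; the demo itself is mine, not from this PR):

```rust
// In an LCG modulo 2^64 with an odd multiplier and increment, the
// lowest output bit has period 2, the two lowest bits period 4, and
// so on, regardless of the seed.
fn lcg_step(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state
}

fn main() {
    let mut s = 42u64; // even seed, so the first output's low bit is 1
    let low_bits: Vec<u64> = (0..8).map(|_| lcg_step(&mut s) & 1).collect();
    // The lowest bit simply alternates:
    assert_eq!(low_bits, vec![1, 0, 1, 0, 1, 0, 1, 0]);
    println!("ok");
}
```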