-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random for Int64 #41412
Random for Int64 #41412
Conversation
Created method in System.Random for generating longs and floats. Fix dotnet#26741
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
It should make the algorithm more stable. Fix dotnet#26741
3026e5e
to
5f0f238
Compare
unchecked | ||
{ | ||
part = (short) InternalSample(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has something similar to the effect of a modulus operation, but I haven't done the math to tell whether it will evenly generate all values in the range (due to InternalSample()
not returning `int.MaxValue). Possibly suspect.
result |= (long) part; | ||
result <<= 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suspect. part
can be negative (when InternalSample()
returns a sufficiently large result), so when the cast happens this sets bits that were probably not intended (and doesn't set bits that probably were). Additionally, due to the fact that part
can be negative, this means the overall result can be negative as well.
Note that if part
was restricted to only be positive, bits 16, 32, or 48 would never be set, which would be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case better?
int i = InternalSample();
long result = 0;
result = result | (long) InternalSample();
result = result | (1u << 31 & (i << 2));
result = result | (1u & i); // upd: (i >> 2)
result <<= 32;
result = result | (long) InternalSample();
result = result | (1u << 31 & (i << 3));
result = result | (1u & (i >> 3));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems overly complicated, this would read better:
long result = 0;
result |= ((long) InternalSample()) << 32;
result |= ((long) InternalSample()) << 1;
result |= ((long) InternalSample()) & 1L;
(Not tested, and I'm not sure of the statistical implications of this)
Point 2 here (the one about 0
not being returned from Next()
) seem somewhat off, given that of course there aren't going to be many values in the range they indicate (there's more numbers available above that range than below it - it's not half the range, as the writer seems to expect, but 1/2^32 of it, or so).
You can't use (as they suggest there):
long result = NextDouble() * long.MaxValue;
... because doubles only have 2^53 unique values in the range returned, and there are some integral values that a double can't represent (some values above 2^53-ish), which means some of them will be getting skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result |= ((long) InternalSample() >> 1) & 1L;
It should be shifted due to unreachable int.MaxValue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do 1-30 bits of InternalSample()
have equal probability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be shifted due to unreachable
int.MaxValue
You're right. (I woke up this morning realizing this)
However, it turns out that won't be sufficient, because that leaves bits 32 and 1 with a distribution problem as well. It needs to be modified to:
long result = 0;
result |= ((long) InternalSample()) << 32;
// InternalSample() returns _numbers_ evenly distributed in the range [0, int.MaxValue),
// but this results in the 0th bit having a slight decrease in probability,
// because there is one fewer odd value than even values.
result |= ((long) InternalSample()) << 2;
result |= ((long) InternalSample() >> 1) & 3L;
Do 1-31 bits of InternalSample() have equal probability?
Assuming bits are indexed [0, 31]:
- bit 0 has the aforementioned problem
- bits 1-30 should have equal probability, or the numbers won't be evenly distributed (.... I think....?)
- bit 31 will never be set
... this is, however, assuming that InternalSample()
actually returns a value in the given range uniformly (that is, ignoring bugs). That I won't comment on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found, that number 0x0111_1111_1111_1111_1111_1111_1111_1110
accurs once in range [0; int.MaxValue) with deleted first and last bits. Other numbers do twice.
last | mid | first | |
---|---|---|---|
0 | 111..111 | 0 | value in range [0; int.MaxValue) |
0 | 111..111 | 1 | int.MaxValue |
1 | 111..111 | 0 | -2 |
1 | 111..111 | 1 | -1 |
It means that InternalSample()
is a bad choice to get random bits if distribution of InternalSample()
is uniform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found, that number
0x0111_1111_1111_1111_1111_1111_1111_1110
accurs once in range [0; int.MaxValue) with deleted first and last bits. Other numbers do twice.
Argh, I think you're right. The result of this is that any individual bit has a slightly smaller chance of being set than it does not being set (that is, it's more likely for the bit to not be set).
This means our options are:
- Acknowledge this issue, and accept it.
- Still do this via bit-twiddling, but find a better source of randomness.
- Do this mathematically instead of via bit twiddling. I think that might be possible, but haven't worked out the operations that would be needed. I don't think you'd need to call
InternalSample()
any additional times.
Anybody else want to comment on the direction that should be taken here?
(This is another time I want to Do Not Spindle, Fold, orand Mutilate somebodyover API choices. Like, argh)
==============================================================================*/ | ||
public virtual long NextInt64() | ||
{ | ||
return FullLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See problems with FullLong()
, but the current implementation does not obey the documented contract.
return 0; | ||
|
||
long fullLong = (long) FullLong(); | ||
return (long.MaxValue & fullLong) % maxValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This distorts results, for two reasons:
- The bitwise-and, possibly in an attempt to make the range positive, subtly distorts the number of results, due to there being one more negative number than there are positive (
or two, ifbecauselong.MaxValue
is never returned). IfFullLong()
returned the correct range, the bitwise-and should be unnecessary. - The modulus operator distorts the results, because usually
maxValue
will not be an integral factor of the range of the random. See this for a discussion, and the potential fix if modulus will continue to be used.
(comment edited because I was wrong about number distribution)
throw new ArgumentOutOfRangeException(nameof(minValue), SR.Format(SR.Argument_MinMaxValue, nameof(minValue), nameof(maxValue))); | ||
} | ||
|
||
long range = maxValue - minValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silent overflow. Compare with int NextInt(int, int)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in a new commit.
Max and min are both positive or both negative. Thay cant create overflow.
long range1 = range / 2; | ||
long range2 = range - range1; | ||
|
||
return NextInt64(range1 + 1) + NextInt64(range2) + minValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is attempting to dodge additional overflow issues, but I'm unsure whether this will properly distribute values over the entire range. Possibly suspect.
==============================================================================*/ | ||
public virtual float NextSingle() | ||
{ | ||
return (float) Sample(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion from double
to float
is going to round unrepresentable double
values to representable float
ones - and it's not clear that the distribution of that is going to be even. Is there a reference that can be added as a comment as to whether this is safe?
|
Re, the current build failures. When you add a new public API, you also need to update the reference source. Details on how to do that are here. https://github.com/dotnet/runtime/blob/2a4284b9f3eb2ad95fdf324f0249d372abff96df/docs/coding-guidelines/updating-ref-source.md If the ref source updated successfully, you should see the ref source modified locally. It should add your new public members somewhere here: runtime/src/libraries/System.Runtime/ref/System.Runtime.cs Lines 3126 to 3137 in afa4b2f
Updating the reference source will probably change some other things that you didn't intend (new or removed attributes on unrelated changes) - that's okay, just undo those changes and keep the changes that changed the ref source for System.Random. |
@Clockwork-Muse, @vcsjones, thanks! |
d1ee749
to
1bf5b92
Compare
Fix dotnet#26741 Signed-off-by: Vorotynsky Maxim <vorotynsky.maxim@gmail.com>
1bf5b92
to
3088738
Compare
dae4918
to
0441e69
Compare
@vorotynsky, thanks for your efforts here. Are you still working on this? |
No, I had a problems in building tests and gave up. And I don't know how to fix it. |
I created methods to generate random floats and longs.
Can you help me to find tests for
System.Random
?Fixes #26741