Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random for Int64 #41412

Closed
wants to merge 4 commits into from
Closed

Conversation

vorotynsky
Copy link

I created methods to generate random floats and longs.

Can you help me to find tests for System.Random?

Fixes #26741

Created method in System.Random for generating longs and floats.

Fix dotnet#26741
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@dnfadmin
Copy link

dnfadmin commented Aug 26, 2020

CLA assistant check
All CLA requirements met.

It should make the algorithm more stable.

Fix dotnet#26741
Comment on lines 228 to 231
unchecked
{
part = (short) InternalSample();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has something similar to the effect of a modulus operation, but I haven't done the math to tell whether it will evenly generate all values in the range (due to InternalSample() not returning `int.MaxValue). Possibly suspect.

Comment on lines 233 to 234
result |= (long) part;
result <<= 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suspect. part can be negative (when InternalSample() returns a sufficiently large result), so when the cast happens this sets bits that were probably not intended (and doesn't set bits that probably were). Additionally, due to the fact that part can be negative, this means the overall result can be negative as well.

Note that if part was restricted to only be positive, bits 16, 32, or 48 would never be set, which would be a problem.

Copy link
Author

@vorotynsky vorotynsky Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this case better?

int i = InternalSample();
long result = 0;

result = result | (long) InternalSample();
result = result | (1u << 31 & (i << 2));
result = result | (1u & i); // upd: (i >> 2)

result <<= 32;

result = result | (long) InternalSample();
result = result | (1u << 31 & (i << 3));
result = result | (1u & (i >> 3));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems overly complicated, this would read better:

long result = 0;
result |= ((long) InternalSample()) << 32;
result |= ((long) InternalSample()) << 1;
result |= ((long) InternalSample()) & 1L;

(Not tested, and I'm not sure of the statistical implications of this)

Point 2 here (the one about 0 not being returned from Next()) seem somewhat off, given that of course there aren't going to be many values in the range they indicate (there's more numbers available above that range than below it - it's not half the range, as the writer seems to expect, but 1/2^32 of it, or so).

You can't use (as they suggest there):

long result = NextDouble() * long.MaxValue;

... because doubles only have 2^53 unique values in the range returned, and there are some integral values that a double can't represent (some values above 2^53-ish), which means some of them will be getting skipped.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result |= ((long) InternalSample() >> 1) & 1L;

It should be shifted due to unreachable int.MaxValue.

Copy link
Author

@vorotynsky vorotynsky Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do 1-30 bits of InternalSample() have equal probability?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be shifted due to unreachable int.MaxValue

You're right. (I woke up this morning realizing this)

However, it turns out that won't be sufficient, because that leaves bits 32 and 1 with a distribution problem as well. It needs to be modified to:

long result = 0;
result |= ((long) InternalSample()) << 32;
// InternalSample() returns _numbers_ evenly distributed in the range [0, int.MaxValue),
// but this results in the 0th bit having a slight decrease in probability,
// because there is one fewer odd value than even values.
result |= ((long) InternalSample()) << 2;
result |= ((long) InternalSample() >> 1) & 3L;

Do 1-31 bits of InternalSample() have equal probability?

Assuming bits are indexed [0, 31]:

  • bit 0 has the aforementioned problem
  • bits 1-30 should have equal probability, or the numbers won't be evenly distributed (.... I think....?)
  • bit 31 will never be set

... this is, however, assuming that InternalSample() actually returns a value in the given range uniformly (that is, ignoring bugs). That I won't comment on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found, that number 0x0111_1111_1111_1111_1111_1111_1111_1110 accurs once in range [0; int.MaxValue) with deleted first and last bits. Other numbers do twice.

last mid first
0 111..111 0 value in range [0; int.MaxValue)
0 111..111 1 int.MaxValue
1 111..111 0 -2
1 111..111 1 -1

It means that InternalSample() is a bad choice to get random bits if distribution of InternalSample() is uniform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found, that number 0x0111_1111_1111_1111_1111_1111_1111_1110 accurs once in range [0; int.MaxValue) with deleted first and last bits. Other numbers do twice.

Argh, I think you're right. The result of this is that any individual bit has a slightly smaller chance of being set than it does not being set (that is, it's more likely for the bit to not be set).

This means our options are:

  1. Acknowledge this issue, and accept it.
  2. Still do this via bit-twiddling, but find a better source of randomness.
  3. Do this mathematically instead of via bit twiddling. I think that might be possible, but haven't worked out the operations that would be needed. I don't think you'd need to call InternalSample() any additional times.

Anybody else want to comment on the direction that should be taken here?


(This is another time I want to Do Not Spindle, Fold, orand Mutilate somebodyover API choices. Like, argh)

==============================================================================*/
public virtual long NextInt64()
{
return FullLong();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See problems with FullLong(), but the current implementation does not obey the documented contract.

return 0;

long fullLong = (long) FullLong();
return (long.MaxValue & fullLong) % maxValue;
Copy link
Contributor

@Clockwork-Muse Clockwork-Muse Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This distorts results, for two reasons:

  1. The bitwise-and, possibly in an attempt to make the range positive, subtly distorts the number of results, due to there being one more negative number than there are positive (or two, if because long.MaxValue is never returned). If FullLong() returned the correct range, the bitwise-and should be unnecessary.
  2. The modulus operator distorts the results, because usually maxValue will not be an integral factor of the range of the random. See this for a discussion, and the potential fix if modulus will continue to be used.

(comment edited because I was wrong about number distribution)

throw new ArgumentOutOfRangeException(nameof(minValue), SR.Format(SR.Argument_MinMaxValue, nameof(minValue), nameof(maxValue)));
}

long range = maxValue - minValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent overflow. Compare with int NextInt(int, int).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in a new commit.

Max and min are both positive or both negative. Thay cant create overflow.

Comment on lines 281 to 284
long range1 = range / 2;
long range2 = range - range1;

return NextInt64(range1 + 1) + NextInt64(range2) + minValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is attempting to dodge additional overflow issues, but I'm unsure whether this will properly distribute values over the entire range. Possibly suspect.

==============================================================================*/
public virtual float NextSingle()
{
return (float) Sample();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion from double to float is going to round unrepresentable double values to representable float ones - and it's not clear that the distribution of that is going to be even. Is there a reference that can be added as a comment as to whether this is safe?

@vcsjones
Copy link
Member

Can you help me to find tests for System.Random?

https://github.com/dotnet/runtime/blob/master/src/libraries/System.Runtime.Extensions/tests/System/Random.cs

@vcsjones
Copy link
Member

Re, the current build failures.

When you add a new public API, you also need to update the reference source. Details on how to do that are here. https://github.com/dotnet/runtime/blob/2a4284b9f3eb2ad95fdf324f0249d372abff96df/docs/coding-guidelines/updating-ref-source.md

If the ref source updated successfully, you should see the ref source modified locally. It should add your new public members somewhere here:

public partial class Random
{
public Random() { }
public Random(int Seed) { }
public virtual int Next() { throw null; }
public virtual int Next(int maxValue) { throw null; }
public virtual int Next(int minValue, int maxValue) { throw null; }
public virtual void NextBytes(byte[] buffer) { }
public virtual void NextBytes(System.Span<byte> buffer) { }
public virtual double NextDouble() { throw null; }
protected virtual double Sample() { throw null; }
}

Updating the reference source will probably change some other things that you didn't intend (new or removed attributes on unrelated changes) - that's okay, just undo those changes and keep the changes that changed the ref source for System.Random.

@vorotynsky
Copy link
Author

@Clockwork-Muse, @vcsjones, thanks!
I'll think about math problems.

Fix dotnet#26741

Signed-off-by: Vorotynsky Maxim <vorotynsky.maxim@gmail.com>
@stephentoub
Copy link
Member

@vorotynsky, thanks for your efforts here. Are you still working on this?

@vorotynsky
Copy link
Author

No, I had a problems in building tests and gave up. And I don't know how to fix it.

@vorotynsky vorotynsky closed this Oct 23, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support generating random 64-bit values.
6 participants