feat(random/unstable): basic randomization functions #5626

Merged
kt3k merged 56 commits into denoland:main on Sep 5, 2024

Conversation

lionel-rowe
Contributor

@lionel-rowe lionel-rowe commented Aug 3, 2024

Closes #4848

Currently includes the following:

  • @std/random/between: randomBetween function
  • @std/random/integer-between: randomIntegerBetween function
  • @std/random/sample: enhanced version of (to-be-deprecated?) @std/collections/sample's sample function, with new weights and random options
  • @std/random/seeded: randomSeeded function implementing the PCG32 algorithm
  • @std/random/shuffle: shuffle function implementing Fisher–Yates shuffle
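
For readers unfamiliar with the Fisher–Yates shuffle mentioned in the list above, here is a minimal sketch. It is not the code added in this PR: `shuffleSketch` is a hypothetical name, and the optional `random` parameter is an assumption modelled on the `random` option discussed later in this thread.

```typescript
// Hypothetical helper, for illustration only (not the PR's shuffle()).
function shuffleSketch<T>(
  items: readonly T[],
  random: () => number = Math.random,
): T[] {
  const result = [...items]; // copy, so the input array is left untouched
  for (let i = result.length - 1; i > 0; i--) {
    // Pick j uniformly from 0..i inclusive, assuming random() is in [0, 1).
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}
```

Each element is swapped at most once per iteration, so the sketch runs in linear time and returns a new array rather than mutating its argument.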

@lionel-rowe lionel-rowe requested a review from kt3k as a code owner August 3, 2024 15:17

codecov bot commented Aug 3, 2024

Codecov Report

Attention: Patch coverage is 98.24561% with 3 lines in your changes missing coverage. Please review.

Project coverage is 96.24%. Comparing base (7c0e917) to head (6bb99e4).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
random/sample.ts 91.66% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #5626    +/-   ##
========================================
  Coverage   96.24%   96.24%            
========================================
  Files         485      491     +6     
  Lines       39269    39440   +171     
  Branches     5787     5811    +24     
========================================
+ Hits        37793    37958   +165     
- Misses       1432     1438     +6     
  Partials       44       44            

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@timreichen
Contributor

timreichen commented Aug 3, 2024

  • @std/random/pick: pick and pickWeighted functions

Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:

const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });

Is there a particular reason why this algorithm is added?

I think this would be a cool addition to @std/collections.

@lionel-rowe
Contributor Author

lionel-rowe commented Aug 4, 2024

@timreichen

Is there a particular reason why this algorithm is added?

Only that it was the algorithm suggested in the linked issue, it has a period high enough for any sensible use case, and it seems to give a good distribution of results. One other plus is that arbitrarily chosen positive integer seeds still seem to give good entropy, even if they're apparently "low entropy" numbers like 1 or 12 (i.e. they don't need to be primes or anything fancy). [Edit: Seeds that are adjacent to each other do seem to give very similar first-few results though.] I don't have the requisite expertise to verify its randomness properties beyond "seems to give good results" though.

Edit 2: Upon looking into it further I think something like PCG64 (numpy's default) would be a better choice.
Edit 3: PCG32, as the "recommended for most users" version of PCG.
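
For context, the core PCG32 step (the XSH-RR variant with 64-bit state and 32-bit output) can be sketched with bigints roughly as follows. This follows the publicly documented minimal PCG32 reference design and is not necessarily how the PR implements it; `pcg32Sketch` and its parameter names are hypothetical.

```typescript
const PCG_MULTIPLIER = 6364136223846793005n;
const MASK_64 = (1n << 64n) - 1n;

// Hypothetical sketch: returns a function producing unsigned 32-bit integers.
function pcg32Sketch(initState: bigint, initSeq: bigint): () => number {
  let state = 0n;
  // The increment must be odd; it selects the output stream.
  const inc = ((initSeq << 1n) | 1n) & MASK_64;

  const step = (): number => {
    const old = state;
    // Advance the underlying 64-bit linear congruential generator.
    state = (old * PCG_MULTIPLIER + inc) & MASK_64;
    // Output permutation: xorshift the high bits, then rotate right.
    const xorShifted = Number((((old >> 18n) ^ old) >> 27n) & 0xffffffffn);
    const rot = Number(old >> 59n);
    return ((xorShifted >>> rot) | (xorShifted << ((32 - rot) & 31))) >>> 0;
  };

  // Seeding sequence used by the minimal reference design.
  step();
  state = (state + initState) & MASK_64;
  step();
  return step;
}
```

Dividing each output by 2 ** 32 would give a float in [0, 1), which is one plausible way to obtain a `Math.random`-style function from it.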

@lionel-rowe
Contributor Author

  • @std/random/pick: pick and pickWeighted functions

Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:

const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });

I quite like the idea of making it an option. It does require "unzipping" the values from their weights, but it can still be pretty versatile in terms of usage:

const weighted = new Map([["a", 5], ["b", 3], ["c", 2]]);
const result = sample([...weighted.keys()], { weights: [...weighted.values()] });

I'll keep the collections stuff within the scope of this PR for now as some of it relies on the random stuff (e.g. I'll use SeededPrng for tests), but happy to split it out before merging if that's preferable.
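
To illustrate how a `weights` option like the one discussed above could work, here is a minimal sketch using a running total of weights. The name `weightedSampleSketch`, its positional parameters, and its error messages are assumptions for illustration; the PR's actual `sample` takes an options object and may validate differently.

```typescript
// Hypothetical helper, for illustration only.
function weightedSampleSketch<T>(
  items: readonly T[],
  weights: readonly number[],
  random: () => number = Math.random,
): T | undefined {
  if (items.length !== weights.length) {
    throw new RangeError("items and weights must have the same length");
  }
  if (items.length === 0) return undefined;

  const total = weights.reduce((sum, w) => sum + w, 0);
  if (!(total > 0)) {
    throw new RangeError("Total weight must be greater than 0");
  }

  // Walk the items, subtracting each weight until the threshold is crossed.
  let threshold = random() * total;
  for (let i = 0; i < items.length; i++) {
    threshold -= weights[i] as number;
    if (threshold < 0) return items[i];
  }
  // Fallback in case floating-point rounding leaves the threshold at >= 0.
  return items[items.length - 1];
}
```

With the `Map` from the comment above, the call would be `weightedSampleSketch([...weighted.keys()], [...weighted.values()])`, so "a" is picked about half the time.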

collections/shuffle.ts Outdated Show resolved Hide resolved
@kt3k
Member

kt3k commented Aug 7, 2024

@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency with the rest of std.

collections/sample.ts Outdated Show resolved Hide resolved
Comment on lines 33 to 40
Deno.test("randomBetween() throws if min or max are NaN", () => {
  assertThrows(() => randomBetween(NaN, 1), AssertionError);
  assertThrows(() => randomBetween(1, NaN), AssertionError);
});

Deno.test("randomBetween() throws if max is less than min", () => {
  assertThrows(() => randomBetween(10, 1), AssertionError);
});
Member

Nice checks 👍

@timreichen
Contributor

@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?

@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency with the rest of std.

Prng looks weird to me. Maybe it should be written out as SeededPseudoRandomNumberGenerator?

@kt3k
Member

kt3k commented Aug 7, 2024

@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?

I think we can deprecate collections/sample when we stabilize random/sample.

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency with the rest of std.

Prng looks weird to me. Maybe it should be written out as SeededPseudoRandomNumberGenerator?

Or how about the name like SeededRandom? There is a similarly named npm package https://www.npmjs.com/package/seedrandom

random/_types.ts Outdated
* randomization.
* @default {Math.random}
*/
random: () => number;
Contributor

I think this option is not very web api-ish. I think it would be better to remove it and add some kind of algorithm option if needed, that calls different functions internally. Doing customRandomFunction() * (max - min) + min seems very trivial.

Contributor Author

I think this option is not very web api-ish. I think it would be better to remove it and add some kind of algorithm option if needed, that calls different functions internally. Doing customRandomFunction() * (max - min) + min seems very trivial.

Callbacks seem pretty web API-ish to me, and they're more versatile than specifying the name of an algorithm: with a callback, it's easy to use third-party randomization functions/sources. But perhaps more importantly, specifying an algorithm name would surely make all of the implemented algorithms into non-tree-shakeable dependencies of any function taking such options.
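
As a concrete illustration of the callback approach being debated, here is a minimal sketch of a `randomBetween`-style function that accepts a `random` option. The names `randomBetweenSketch` and `RandomOptionsSketch` are hypothetical, and the use of `RangeError` is an assumption based on the error message quoted later in this thread (the tests at this point in the review expected `AssertionError`).

```typescript
// Hypothetical option shape, modelled on the `random: () => number` field under review.
interface RandomOptionsSketch {
  random?: () => number;
}

function randomBetweenSketch(
  min: number,
  max: number,
  options: RandomOptionsSketch = {},
): number {
  if (Number.isNaN(min) || Number.isNaN(max)) {
    throw new RangeError("min and max must not be NaN");
  }
  if (max < min) {
    throw new RangeError("max must be greater than or equal to min");
  }
  // Fall back to Math.random when no custom source is supplied.
  const random = options.random ?? Math.random;
  return random() * (max - min) + min;
}
```

Because the random source is passed in as a plain function, a seeded generator or a third-party source can be used without the helper importing any particular algorithm, which is the tree-shaking point made above.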

Contributor

@kt3k do we have a guide on how to name option functions? I thought they have a suffix of Fn -> randomFn. Not sure though.

random/seeded.ts Outdated Show resolved Hide resolved
random/seeded.ts Outdated

// For convenience, allowing destructuring and direct usage in callbacks
// equivalently to Math.random()
this.random = this.random.bind(this);
Contributor

I think destructuring can be done without this.

Contributor Author

I think destructuring can be done without this.

Destructuring loses this context of unbound methods (as does passing by reference in a callback).
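
The point about losing `this` can be demonstrated with two toy classes. These are hypothetical stand-ins, not the PR's `SeededPrng`; the only difference between them is the `bind` call in the constructor.

```typescript
// Toy class whose method relies on `this` and is NOT bound.
class CounterUnboundSketch {
  private count = 0;
  next(): number {
    return ++this.count;
  }
}

// Same class, but the method is bound in the constructor,
// mirroring the `this.random = this.random.bind(this)` line under review.
class CounterBoundSketch {
  private count = 0;
  constructor() {
    this.next = this.next.bind(this);
  }
  next(): number {
    return ++this.count;
  }
}

// Calls a destructured method and reports whether it threw a TypeError.
function throwsTypeErrorSketch(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}
```

Destructuring `next` from a `CounterUnboundSketch` instance and calling it throws, because class bodies run in strict mode and `this` is `undefined` in the detached call. The bound variant keeps working when destructured or passed as a callback.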

random/seeded.ts Outdated
* assertEquals(prng.state, [13062938915293834817n, 10846994826184652623n]);
* ```
*/
get state(): Readonly<State> {
Contributor

Does state need to be accessible outside the class? I only see it for testing purposes.

Contributor Author

@lionel-rowe lionel-rowe Aug 7, 2024

I think state should definitely be gettable, as it allows persisting a PRNG's current state for resumption later on. It's arguable whether or not it needs to be settable though, perhaps setting it should only be possible via the constructor or a from static method.

As with seeding, I'm still not too settled on how this should be exposed, maybe it should have explicit state and inc properties (and probably an algorithm property too, even though there's only one currently implemented).

Edit: I've kept it settable for now (in case it's desirable to have a global PRNG that can be mutated) and also added fromState static method to get a new instance from state. The SeededRandomState format is now { algorithm, state, inc }. I don't really love that it's state.state, open to suggestions on better naming (notable that the numpy+randomgen implementation is even worse in this respect as it has state.state.state 🫣)

Also notable is that the state and inc properties of SeededRandomState are both bigints, representing uint64s. It's a little unfortunate that this means they can't directly be passed to JSON.stringify, as JSON.stringify throws on bigints. However, they can still be serialized/deserialized in some other way or stored in Deno KV, IndexedDB, etc.
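
One possible way to work around the JSON limitation is to encode the bigints as decimal strings. The sketch below assumes the `{ algorithm, state, inc }` shape described above; the type and function names are hypothetical and not part of the PR.

```typescript
// Assumed shape, based on the { algorithm, state, inc } format described above.
interface SeededRandomStateSketch {
  algorithm: string;
  state: bigint;
  inc: bigint;
}

// Encode the uint64 bigints as decimal strings so JSON.stringify accepts them.
function serializeStateSketch(value: SeededRandomStateSketch): string {
  return JSON.stringify({
    algorithm: value.algorithm,
    state: value.state.toString(),
    inc: value.inc.toString(),
  });
}

// Reverse the encoding; BigInt() parses the decimal strings back losslessly.
function deserializeStateSketch(json: string): SeededRandomStateSketch {
  const raw = JSON.parse(json) as {
    algorithm: string;
    state: string;
    inc: string;
  };
  return {
    algorithm: raw.algorithm,
    state: BigInt(raw.state),
    inc: BigInt(raw.inc),
  };
}
```

Strings are used rather than numbers because uint64 values can exceed `Number.MAX_SAFE_INTEGER` and would lose precision as JSON numbers.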

Contributor

@iuioiua iuioiua left a comment

Let's try to get this landed soon.

random/integer_between.ts Outdated Show resolved Hide resolved
random/_types.ts Show resolved Hide resolved
Contributor

Sorry, I don't follow.

random/seeded.ts Outdated Show resolved Hide resolved
Contributor

This seems a lot nicer. Right now, we hardcode the PCG32 algorithm. Should this instead be configurable? In other words, what if we allowed for BYO PRNG algorithm? As to not delay this PR, we can explore this in a follow-up issue/PR.

Contributor Author

@lionel-rowe lionel-rowe Sep 3, 2024

Yeah there should be plenty of room to add other options in future. Possibly moving back to a class-based model for the algorithm itself will be desirable (but keep randomSeeded as a HOF). Then roughly, algorithm classes compatible with randomSeeded must implement an interface providing static methods like fromSeed, instance methods like nextUint, and static properties like numSeedBytes and nextUintBitSize.
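
The interface idea described above might look roughly like the following. Only the member names `fromSeed`, `nextUint`, `numSeedBytes`, and `nextUintBitSize` come from the comment; every signature, the interface names, the `ToyLcg32` class (a simple 32-bit linear congruential generator, not PCG32), and `randomSeededWithSketch` are assumptions made for illustration.

```typescript
interface PrngInstanceSketch {
  nextUint(): number;
}

// Describes the static side of an algorithm class.
interface PrngAlgorithmSketch {
  readonly numSeedBytes: number;
  readonly nextUintBitSize: number;
  fromSeed(seed: Uint8Array): PrngInstanceSketch;
}

// Toy algorithm used only to exercise the interface; not suitable for real use.
class ToyLcg32 implements PrngInstanceSketch {
  static readonly numSeedBytes = 4;
  static readonly nextUintBitSize = 32;
  private state: number;

  constructor(state: number) {
    this.state = state >>> 0;
  }

  static fromSeed(seed: Uint8Array): ToyLcg32 {
    if (seed.length !== ToyLcg32.numSeedBytes) {
      throw new RangeError("Seed must be exactly 4 bytes long");
    }
    const view = new DataView(seed.buffer, seed.byteOffset, seed.byteLength);
    return new ToyLcg32(view.getUint32(0));
  }

  nextUint(): number {
    // 32-bit LCG step; Math.imul keeps the multiplication within 32 bits.
    this.state = (Math.imul(this.state, 1664525) + 1013904223) >>> 0;
    return this.state;
  }
}

// Higher-order function: turns any compatible algorithm into a () => number in [0, 1).
function randomSeededWithSketch(
  algorithm: PrngAlgorithmSketch,
  seed: Uint8Array,
): () => number {
  const prng = algorithm.fromSeed(seed);
  return () => prng.nextUint() / 2 ** algorithm.nextUintBitSize;
}
```

Passing the class itself, as in `randomSeededWithSketch(ToyLcg32, new Uint8Array([0, 0, 0, 1]))`, works because TypeScript checks the class's static members against `PrngAlgorithmSketch` structurally.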

random/between.ts Show resolved Hide resolved
random/between.ts Show resolved Hide resolved
Contributor

@iuioiua iuioiua left a comment

One last thing. Can you please go through all exceptions and ensure they follow the contributing guidelines? See https://github.com/denoland/std/blob/main/archive/tar_stream.ts for an example.

random/between.ts Outdated Show resolved Hide resolved
@lionel-rowe
Contributor Author

One last thing. Can you please go through all exceptions and ensure they follow the contributing guidelines? See https://github.com/denoland/std/blob/main/archive/tar_stream.ts for an example.

@iuioiua I think they already do, with the exception that error messages beginning with a variable name don't capitalize the variable name:

throw new RangeError("max must be greater than or equal to min");

How should those be handled? `max`/`min` with backticks would be another option.

random/between.ts Outdated Show resolved Hide resolved
Member

@kt3k kt3k left a comment

random/between.ts Outdated Show resolved Hide resolved
random/between.ts Outdated Show resolved Hide resolved
random/integer_between.ts Outdated Show resolved Hide resolved
random/sample.ts Outdated Show resolved Hide resolved
random/sample.ts Outdated Show resolved Hide resolved
Contributor

@iuioiua iuioiua left a comment

LGTM! Excellent work. Thank you, and thank you to the contributors that chimed in. Let's track any possible points of improvement in a tracking issue for discussions, etc.

@iuioiua iuioiua enabled auto-merge (squash) September 5, 2024 00:27
@iuioiua iuioiua disabled auto-merge September 5, 2024 00:27
@iuioiua iuioiua requested a review from kt3k September 5, 2024 00:27
Member

@kt3k kt3k left a comment

LGTM!

@kt3k kt3k merged commit 149839b into denoland:main Sep 5, 2024
16 checks passed
Successfully merging this pull request may close these issues.

Feature request: @std/random
6 participants