Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uuid v7 #681

Merged
merged 23 commits into from
Jun 3, 2024
Merged

uuid v7 #681

merged 23 commits into from
Jun 3, 2024

Conversation

pmccarren
Copy link
Contributor

@pmccarren pmccarren commented Jan 22, 2023

UUID V7

This PR implements UUID V7, closing ref #580

V7 is implemented, tested and documented and ready for review! No changes to the other versions were made.

Relevant RFC documents:

Benchmarks

V7 appears as fast as V4 when not using native crypto.randomUUID generation.

Results:

Starting. Tests take ~1 minute to run ...
uuid.stringify() x 1,450,855 ops/sec ±2.12% (88 runs sampled)
uuid.parse() x 2,350,677 ops/sec ±1.19% (88 runs sampled)
---

uuid.v1() x 3,585,680 ops/sec ±0.86% (92 runs sampled)
uuid.v1() fill existing array x 9,229,031 ops/sec ±0.50% (93 runs sampled)
uuid.v4() x 13,834,376 ops/sec ±3.34% (84 runs sampled)
uuid.v4() fill existing array x 3,833,619 ops/sec ±1.51% (84 runs sampled)
uuid.v4() without native generation x 2,395,572 ops/sec ±1.27% (91 runs sampled)
uuid.v3() x 268,797 ops/sec ±1.32% (86 runs sampled)
uuid.v5() x 287,899 ops/sec ±1.12% (92 runs sampled)
uuid.v7() x 2,764,344 ops/sec ±0.95% (93 runs sampled)
uuid.v7() fill existing array x 2,801,443 ops/sec ±5.48% (84 runs sampled)
uuid.v7() with defined time x 3,223,107 ops/sec ±1.14% (89 runs sampled)
Fastest is uuid.v4()

TODO:

Copy link
Member

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @pmccarren ccarren. This looks promising!

In addition to the inline comments, can you add a unit test alongside https://github.com/uuidjs/uuid/blob/main/test/unit/v1.test.js, with similar assertions. Specifically, I'd like to see similar tests for sort order, uniqueness, and options handling. (And anything else you think may be relevant.)

The sorting will be of particular interest, since producing UUIDS that sort the same whether sorted by timestamp or lexicographically was pretty much the whole point of version 7. 😄

Note: This should not close #580, as the new RFC adds v6 and v8 formats, too, as well as the MAX_UUID constant. v7 is definitely the majority of what's needed, but we'll want to keep 580 open until we fill in the remaining items.

Lastly, the new RFC is still making it's way through the review process. I don't expect substantive changes at this point, so this is an appropriate time to be adding this. But as I noted in 580, I think it's prudent to go with an experimental version release tag of some sort.

@ctavan: Thoughts on how best to accept this? Should we create a v9-experimental branch that we merge this into rather than master? Then publish experimental releases from that?

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
src/regex.js Outdated Show resolved Hide resolved
src/uuid-bin.js Outdated Show resolved Hide resolved
src/v7.js Outdated Show resolved Hide resolved
@pmccarren
Copy link
Contributor Author

@broofa - glad to lend a hand! I appreciate your time reviewing the implementation.

This PR has been updated with your suggested changes, most notably:

  • reworked buffer management so rnds is not mutated
  • added v7 unit test coverage
  • fixed README generation (via npm run docs)

In regards to sorting, I've been considering a few different approaches and while not presently implemented in this PR, I absolutely agree it should be :) I'll work on implementing this as well the associated unit tests.

src/regex.js Outdated Show resolved Hide resolved
src/v7.js Outdated Show resolved Hide resolved
Copy link
Member

@LinusU LinusU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good 👍

Lastly, the new RFC is still making it's way through the review process.

Thoughts on how best to accept this? Should we create a v9-experimental branch that we merge this into rather than master? Then publish experimental releases from that?

Personally think that we should keep this PR open until the spec has been ratified. If we merge into a v9 branch that would stop us from making a v9 release until that has happened, and that might potentially be a long time?

I think that it would be nice to backport the "version 2 is valid" bugfix right away to the current release though?

@ctavan
Copy link
Member

ctavan commented Jan 25, 2023

Thanks an lot for the contribution!

I initially thought this new feature could land in the v9 release, but given we’re broadening the validation regex it’s a breaking change and we’ll have to bump the major version.

I think it would be nice to somehow release it as a prerelease so that people have a chance to try it and provide feedback, but I think our current release and branching process is not yet ready for multiple active major releases. If anyone’s willing to take a look at this I’m happy to support (we were considering better automation for this for a while… #636).

I agree however, that we should not release this in a public major release until the draft has been accepted.

src/regex.js Outdated Show resolved Hide resolved
src/v7.js Outdated Show resolved Hide resolved
@broofa broofa changed the base branch from main to rfc4122bis January 25, 2023 23:46
@broofa
Copy link
Member

broofa commented Jan 25, 2023

Note: I've just created the rfc4122bis branch off main as a place to land this (and set it as the target for this PR). As @ctavan says, we haven't worked through the deploy & publish pipeline for non-main branches yet, so we'll have to figure that out.

Wish I could give you an ETA for that, but it's not been a priority for either of us. As he says... "Help Wanted".

Co-authored-by: Linus Unnebäck <linus@folkdatorn.se>
@broofa broofa added the bis label Feb 4, 2023
Copy link
Member

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan @LinusU Any other comments? If not, I'd like to get this merged (pending @pmccarren commiting his suggested change for the regex to allow 1-8).

Regarding the plan for the rfc4122bis branch, I've created a project to capture the remaining work, with brief notes on possible implementation details.

Honestly, there's not all that much left... if anyone is feeling inspired. ;-)

Lastly, thoughts on inviting Patrick to join the UUIDJS org? He's done some solid work here that's much appreciated, and seappears to have good street cred. If people could weigh in on that idea (including you, Patrick), we can move forward or not as appropriate. [cc'ing @TrySound here]

@pmccarren
Copy link
Contributor Author

@broofa just merged the 1-8 regex update. I'm close to done with implementing monotonic generation - I'd prefer to merge once that's wrapped. Will be buttoned up in a day or two.

As for UUIDJS Org, I'll let my work speak for itself but will add that if you'll have me I'll be an active participant :)

@pmccarren
Copy link
Contributor Author

@broofa I just finished adding monotonic support, and it's ready for review! Took a bit of finesse but I'm rather happy with how it turned out. In addition to the monotonicity related unit tests, I manually tested sorting preservation during generation of 100M uuids.

couple of sequence counter implementation notes:

  • when additional generations occur inside the same millisecond, we use a dedicated 31 bits as an incrementing sequence counter. Referred to as Fixed-Length Dedicated Counter Bits (Method 1) in the draft RFC.
  • seq is 31 bits and (re)initialized from the random data pool whenever the clock advances/changes. as the seq is initialized randomly, for nominal usage there is a significant amount (74 bits) of randomness in a given v7 uuid.
  • when the sequence counter rolls over we increment the internal date by 1 millisecond and continue. this accepts the tradeoff of minor clock drift for lexicographical sorting in substantial batch id generation workloads.
  • if the internal date ever exceeds 10 seconds beyond system time, both the date and seq are hard reset.
  • lexicographical sorting is preserved up to (2^31)*10000 generations for a provided millisecond.
  • the seq is stored as 31 bits. the 12 high bits and 19 low bits are stored separately in the uuid to preserve sorting while maintaining a large enough counter size

Pending review, I believe this PR is ready to go now!

@pmccarren pmccarren requested a review from broofa February 9, 2023 23:19
@claytongulick
Copy link

Eagerly awaiting this to be released as well.

Thanks for the great library and work on it!

@DevBrent
Copy link

DevBrent commented Oct 2, 2023

I'll play ChatGPT here and summarize the current status:

  1. UUID org has concerns of the sequential "+ 1" nature impacting randomness of UUIDs generated within the same millisecond. My take on this is that this is a legitimate concern because you can theoretically spam an endpoint to get "seed" UUIDs then offset by +1, +2 to dig up other generations on the same millisecond. Despite it being a legitimate concern, many other libraries implement similar logic including https://github.com/mongodb/js-bson in my experience. uuid v7 #681 (comment) suggests a possible opt-out to enable truly random UUIDs within a given millisecond. I would personally make use of that option despite the lack of guarantee about sequential sort within the same millisecond.
  2. @pmccarren has implemented logic to push a UUID into the next millisecond up to an arbitrary 10 seconds before resetting the date (and I imagine the random seed) in order to guarantee the ability to generate (2^31)*10000 of sequential UUIDs in a given millisecond. This was one of the more significant things holding back this merge I gather because it seems there should be a better way to handle this.
  3. @robinpokorny mentioned a potential failure case when date inputs are provided from databases or for instance date pickers without milliseconds included which would be fairly common for such a widely used library like uuid. I'm not sure what a good solution for this would be.

My take would be UUIDs become less useful if you're going to alter the date input and don't strictly stick to modifying the sequence when sequencing. The primary driving factor of this is that a randomly initiated sequence could initialize 1 count below the max sequence and leaving padding atop the sequence reduces the entropy of the sequence.

This Hacker News thread has more discussion on the ULID spec which UUIDv7 was largely based on: https://news.ycombinator.com/item?id=36447837

The spec as written on that page is confusing on that point, but the incrementing-counter-within-the-same-millisecond-behavior only happens if you explicitly specify a "monotonicFactory", https://github.com/ulid/javascript#monotonic-ulids. The default behavior (just using the ulid() function) doesn't do that, it generates a completely random value regardless of the millisecond value.

If it's random value, then generated UUIDs won't always be ascending which defeats purpose as well.

No, that's not really it either. ULIDs have both a time component (in this case accurate to the millisecond) and a random component. Thus the order is always accurate up to the ms, and for the vast majority of applications anything created in the same millisecond can be considered created "at the same time", so it's usually OK if order is undetermined within the same MS.
Note that the UUID v7 spec is largely modeled after the ULID spec. ULIDs came first, and they traded the "standard" UUID format of 8x-4x-4x-4x-12x hex string for the more compact Crockford base32 format, and there are some other minor differences in number of timestamp bits vs. number of random bits, but they are otherwise functionally equivalent.

This discussion seems to imply other implementations ALWAYS re-seed the random bits and increment the counter.

I'm not really sure if I would prioritize sequential over randomness and perhaps this does need to be a preference. Generally, those seeking UUIDv7 for randomness security through obscurity/incalculability should be using another UUID.

Edit 1: Multi-node behavior should be considered as well as single-node sequence generator node behavior because I believe UUIDv7 in this library could be used in both forms. Each has slightly different preferences. For multi-node, all of this discussion goes out the window and it's best to not modify the msecs at all. For single-node, I think someone who prioritizes sort order over all else may want this unusual behavior.

Edit 2: Within each node of a multi-node cluster, I may wish to prefer sequential sort and it might be acceptable to eat into the MS up to 1-3 milliseconds for my usecases but I still feel uncomfortable. I'd rather OPT IN to sacrifice 10% of my sequence entropy than have this behavior though.

Edit 3: Percona notes UUID_SHORT from MySQL allows the sequence to rollover.

README.md Outdated Show resolved Hide resolved
README_js.md Outdated Show resolved Hide resolved
@mbrimmer83
Copy link

Rumor has it uuid v7 is now part of the proposed standard What would it take to get this over the finish line?

@ryan-keswick
Copy link

Any updates on this?

@broofa broofa changed the base branch from rfc4122bis to main June 3, 2024 00:10
@broofa broofa removed the bis label Jun 3, 2024
@broofa broofa mentioned this pull request Jun 3, 2024
9 tasks
@broofa
Copy link
Member

broofa commented Jun 3, 2024

Did another review pass. Pushed a couple minor documentation tweaks to update things now that RFC9562 is official.

@ctavan: In the interests of moving forward with 9562, I'm merging this. If there are questions or concerns, we can address those in followup PRs.

@pmccarren: Thanks for the great work on this. My apologies for letting this stagnate for so long. Now that 9562 is official, my goal is to get this pushed out ASAP.

@broofa broofa merged commit db76a12 into uuidjs:main Jun 3, 2024
5 checks passed
@ctavan
Copy link
Member

ctavan commented Jun 3, 2024

I‘m all for it, thanks for taking care!

But let’s make use of the conventional commit message format in the squash commits so we get proper autogenerated changelogs.

@broofa
Copy link
Member

broofa commented Jun 3, 2024

let’s make use of the conventional commit message format in the squash commits

Ugh... I keep screwing that up. :-(

@ctavan
Copy link
Member

ctavan commented Jun 3, 2024

Does GitHub have an option to run actions (commitlint) before squash merges?

@broofa
Copy link
Member

broofa commented Jun 3, 2024

There's probably a way, but... 🤷

@pmccarren
Copy link
Contributor Author

@broofa and @ctavan I appreciate your time in reviewing and assisting with the PR. I'm excited we're collectively one step closer to uuid7 being widely available :)

If there is a need for assistance, I have time I can dedicate towards experimental deploys. Just let me know if it'll be helpful, and I'm happy to dive in.

@broofa
Copy link
Member

broofa commented Jun 3, 2024

@pmccarren That'd be awesome. On the deploy front, there's two nuts we'd like to crack:

  1. Enforce Conventional Commits™ on every commit that goes onto the main branch. (witness my fumbling the message on this PR)
  2. Automate the deploy process so we're not manually publishing from our laptops.

We haven't tackled the former yet, but if you know a way to do that, we're all for it.

For the latter, I put up #725 a couple years ago before getting distracted by other things. The main goal there was to get away from our current ad-hoc release process that has us creating PRs just to update package.json#version, and running standard-version and npm publish manually on our laptops. Specifically, I wanted to automate the following:

  • Update CHANGELOG with notes for all relevant changes
  • Bump package.json#version based on Conventional Commit message semantics
  • Create git tag for release
  • Run npm publish in a workflow

release-please is (I think) still a good choice for that but I haven't followed development of that project so I'm not sure what bells-and-whistles may have been added. If you can get that working, that'd be great. Or if there's another option you think would work better I'm open to suggestions.

I've sent you an invite to the uuidjs org here. Let me know if there are permissions you need here or on NPM for this work. (Obviously we'll need to be careful about what gets npm-published.)

And thank you for stepping up to help (again).

@sachinjnd
Copy link

@broofa / @pmccarren Now that the changes for uuid v7 are merged, what's the plan for release on npm?

@ctavan
Copy link
Member

ctavan commented Jun 8, 2024

See #760

gisbdzhch pushed a commit to gisktzh/gb3-web_ui that referenced this pull request Jun 19, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [uuid](https://github.com/uuidjs/uuid) | dependencies | major | [`^9.0.0` -> `^10.0.0`](https://renovatebot.com/diffs/npm/uuid/9.0.1/10.0.0) |

---

### Release Notes

<details>
<summary>uuidjs/uuid (uuid)</summary>

### [`v10.0.0`](https://github.com/uuidjs/uuid/blob/HEAD/CHANGELOG.md#1000-2024-06-07)

[Compare Source](uuidjs/uuid@v9.0.1...v10.0.0)

##### ⚠ BREAKING CHANGES

-   update node support (drop node@12, node@14, add node@20) ([#&#8203;750](uuidjs/uuid#750))

##### Features

-   support support rfc9562 MAX uuid (new in RFC9562) ([#&#8203;714](uuidjs/uuid#714)) ([0385cd3](uuidjs/uuid@0385cd3))
-   support rfc9562 v6 uuids ([#&#8203;754](uuidjs/uuid#754)) ([c4ed13e](uuidjs/uuid@c4ed13e))
-   support rfc9562 v7 uuids ([#&#8203;681](uuidjs/uuid#681)) ([db76a12](uuidjs/uuid@db76a12))
-   update node support matrix (only support node 16-20) ([#&#8203;750](uuidjs/uuid#750)) ([883b163](uuidjs/uuid@883b163))
-   support rfc9562 v8 uuids ([#&#8203;759](uuidjs/uuid#759)) ([35a5342](uuidjs/uuid@35a5342))

##### Bug Fixes

-   revert "perf: remove superfluous call to toLowerCase ([#&#8203;677](uuidjs/uuid#677))" ([#&#8203;738](uuidjs/uuid#738)) ([e267b90](uuidjs/uuid@e267b90))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or rename PR to start with "rebase!".

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.