Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UUIDs are not sorted properly in a MSSQL DB at scale #2

Closed
jeffward01 opened this issue Dec 28, 2022 · 11 comments
Closed

UUIDs are not sorted properly in a MSSQL DB at scale #2

jeffward01 opened this issue Dec 28, 2022 · 11 comments

Comments

@jeffward01
Copy link

Context is here Link to code that can be reproduced

Using UUID version 7:

var values = Enumerable.Range(1, 10)
.Select(i => {
Thread.Sleep(TimeSpan.FromMilliseconds(10)); // UUIDNext uses millisecond-precision, let's sleep for a bit
return $"({i}, '{Uuid.NewDatabaseFriendly():D}')";
})
.ToList();

Console.WriteLine($"{string.Join(", ", values)}");

Then check that ORDER BY sorts them as expected:

CREATE TABLE #testing_uuids (
ExpectedPosition int PRIMARY KEY,
UUID uniqueidentifier NOT NULL
);

INSERT INTO #testing_uuids
VALUES (1, '01855978-effc-758f-8fc1-6b755429799c'),
(2, '01855978-f00f-74c2-a6f0-8e8c5445ec9a'),
(3, '01855978-f019-7104-bab4-79f068bc8150');

SELECT *
FROM #testing_uuids
ORDER BY UUID ASC;

Unfortunately, on MSSQL I got 1, 3, 2.

@ByronAP
Copy link

ByronAP commented Dec 28, 2022

this is because sql server stores data using little-endian

@jeffward01
Copy link
Author

Thats very interesting, thanks for this! I did not realize that.

Will this result be problematic? Is there anything you suggest we do with to prevent this, or its not a big deal?

Thanks

@peschkaj
Copy link

It's even worse: https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=102450

Depending on your ORM, you can create an extension to rearrange the UUID bytes on the way in and out, but you'll likely want a CLR function to also make it easy to search via TSQL.

@ByronAP
Copy link

ByronAP commented Dec 28, 2022

It's problematic in the sense that this library doesn't account for the way sql server stores data. What you perceive and would assume would make an ordered index doesn't.

@peschkaj
Copy link

We have used base62 encoding, left padding, and CHAR(22) to work around the limitations of SQL Server. It's important to note that this is a SQL Server issue, not a UUID library issue.

@ByronAP
Copy link

ByronAP commented Dec 28, 2022

Agreed it's not the libraries' fault

@jeffward01
Copy link
Author

We have used base62 encoding, left padding, and CHAR(22) to work around the limitations of SQL Server. It's important to note that this is a SQL Server issue, not a UUID library issue.

How can we be aware to correct this in our own implementations? Such as, many of us are C# developers, not SQL experts, perhaps we know MongoDB, Postgres, MSSQL, and can write some queries, and scripts, but beyond that its a bit of a black box.

What can we do to 'fix' this? Or ensure we use this library property in conjunction with MSSQL?

@ByronAP @peschkaj

@ByronAP
Copy link

ByronAP commented Dec 29, 2022

I think the best way to do this is by using a bigint in the same or similar fashion as twittter snowflake.

Snowflake_ID Wikipedia

Snowflake Repository

Snowflake IdGen Repository

@peschkaj
Copy link

@jeffward01 your options are:

  • Don't use SQL Server (just kidding). You should file a bug, though, that UNIQUEIDENTIFIER doesn't work well for your use case and that we need a sortable UUID type in SQL Server.
  • Reorder the UUID bytes so they sort in SQL Server. You'll likely need functions both in the application as well as some SQL CLR functions to make it easy to query the database directly. (Someone has written this before, but I cannot find it today for some reason.)
  • Write a Base 62 encoder/decoder function and left pad all your encoded UUID strings appropriately. A CHAR(22) key will work well here and is only 6 bytes wider than a UUID. Various ORMs have ways to wire up type conversions for you.
  • Use a flake implementation. I wrote a flake implementation ages ago that you can find on GitHub https://github.com/peschkaj/rustflakes

@jeffward01
Copy link
Author

jeffward01 commented Jan 3, 2023

Thanks for this! you guys are great! I appreciate all the help and direction, I will check out https://github.com/peschkaj/rustflakes and if it does not suit my needs (my needs are super super basic, just a performant UUID that I can use on SqlServer (sadly a requirement) and other databases.

I really appreciate you guys making this less of a 'black box'.

When I get my config's up and going, is there anything I can do to help you guys? I plan on writing a post about this very nuanced area and posting it on reddit or something (I don't have a blog)

@peschkaj - I just read the readme:

When you want time-based ordered IDs generated in many locations and sent to many locations. When you can't generate sequential identifiers yourself (Windows Azure SQL Database, I'm looking at you).

hahaha, thats great

@mareek
Copy link
Owner

mareek commented Jan 9, 2023

Hello guys, Library author here.
So I did some investigation and testing to implement a SQL Server friendly UUID Generator. Fortunately, the new standard introduces UUID Version 8 which basically a "Bring your own algorithm" version that only require the presence of the variant an version bits.

So I've created a new UUID V8 generator named UuidV8SqlServerGenerator (which is basically UuidV7Generator with some byte shuffling) that produces unique UUID that are naturally sorted on SQL Server.
I've also added a parameter to the static method Uuid.NewDatabaseFriendly to specify the database in which the UUID will be stored so that the correct version is used depending on the sorting method of each database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants