Add API to correctly insert GuidV7 to SQLServer (BigEndian), so no index fragmentation occurs #2999
Replies: 36 comments 12 replies
-
see also: dotnet/runtime#103658 (comment)
|
Beta Was this translation helpful? Give feedback.
-
https://feedback.azure.com/d365community/idea/69d13f8a-b9fb-ed11-a81c-000d3adb7ffd |
Beta Was this translation helpful? Give feedback.
-
#2946 is a prereq for this |
Beta Was this translation helpful? Give feedback.
-
It's also worth referencing dotnet/efcore#33579, which covers a similar problem within EF Core. |
Beta Was this translation helpful? Give feedback.
-
I thought maybe add a Parameter to ConnectionString, like "SerializeGuidsBigEndian:true" could work. So also EFCore or any other ORM could benefit. |
Beta Was this translation helpful? Give feedback.
-
@jogibear9988 Should it not be possible to simply read the new Version property? If so, a much better UX I think. |
Beta Was this translation helpful? Give feedback.
-
??? Wich Version? |
Beta Was this translation helpful? Give feedback.
-
If the setting to change how guids are handled is in the connection string then you can't migrate a database from using one type to another without losing the ability to use guids that already exist and that doesn't seem like an acceptable change to me. Can you provide some simple tests for 1) "normal" guids and 2) v7 guids that show how they should be written and read back? I suspect that we need to have dedicated read and write methods because the type does not contain information about whether the bytes should be interpreted in a specific way or not. Consider how you would want to have both old and new guids used as parameter values. How do you specify that the contents are one or the other? If we need to extend only Sql* types like SqlDataReader with a method like GetGuidV7 then that's ok but if we need to push that up to DbDataReader then that requires a framework change and I don't know what the opinion of other providers (postgres, mysql, sqlite) would be about that change. /cc @roji |
Beta Was this translation helpful? Give feedback.
-
Second thought about my suggestion, it will not work. It will, but then the guids will have another value when written with dotnet as when you read them without the connection parameter. I think the only correct solution would be a new Guid Type in Sqlserver, that serializes the same as all other databases do. |
Beta Was this translation helpful? Give feedback.
-
Now that I've got a bit of time to respond properly! I think you're right; a blanket change to the way GUIDs are written to the network wouldn't work. It'd mean that every primary key value would be interpreted differently, and potentially that a developer could run a query from SQL Server Management Studio/Azure Data Studio and see a GUID with one value, but run the same query via M.D.S and see another value. The real problem isn't the way that a TDS library transports GUIDs to the server, it's the way that a client generates GUIDs in the first place. The EF Core issue I linked is designed to address that for one library. A better approach may be to create a utility method similar to |
Beta Was this translation helpful? Give feedback.
-
Guids are just data. SqlClient doesn't generate them at all. We're currently in a situation where anything that is identified as a guid gets handled in a specific way and that works. What we have to add is some way to have a second handler without having a second type and that means the type information must be maintained out of band by the user. The SqlParameter input case is mentioned is indicative of this. If you just put a guid into an object typed variable then there isn't enough information. The same problem happens in the other direction. If you have some rowdata which is typed as uniqueidentifier by sql server there is no way to know if that's old or new format. So whatever happens I don't see any other way to proceed than having special methods to both read and write the new format. We'll also need a wrapper type for object typed location like SqlParameter. |
Beta Was this translation helpful? Give feedback.
-
Depending on what type of data is used in the database column for storage, indexes may or may not be fragmented (uniqueidentifier / binary(16)) |
Beta Was this translation helpful? Give feedback.
-
I'm talking about a client-side operation, not M.D.S. Sorry for the confusion here.
The methods you're describing make complete sense, and if that's definitely the problem to solve then I agree - it's the right way to solve it. I just don't think that SqlClient's conversion of Guids is the core problem. As I read it, the core problem seems to be that if a .NET application generates and inserts many GUIDs in parallel into a table with an index covering a uniqueidentifier column, that index will be fragmented. We'd be able to patch it within SqlClient with the method you've described, but I think the core problem is that the GUIDs being generated aren't currently "SQL Server-friendly." I've suggested implementing |
Beta Was this translation helpful? Give feedback.
-
SQL Server is using However, when it later loads it as What this means is that the issue isn't in how the data is serialized nor is the issue in how the data is compared. The root cause of the problem here is that serialization and comparison do not agree with eachother on the data format. If you changed the serializer to write as This behavior cannot be "fixed" for back-compat reasons, but some new APIs could be exposed that allow users to get the right thing to happen. Similarly, users can rely on the current behavior and workaround it by swapping the endianness of their produced |
Beta Was this translation helpful? Give feedback.
-
But a new ColumnType like Guid2 could fix it, or am I wrong? Also a new Darabase flag, that could change default guid comparison? |
Beta Was this translation helpful? Give feedback.
-
Thanks for the explanation tannergooding. I agree that the right way to sort UUIDs is as 128-bit uints, and I think it brings us full circle to the core problem - SQL Server sorts them differently. The root cause of the fragmentation is that v7 UUIDs are only effectively sorted in order of
We could change the way we read and write a GUID on the transport. There are a few different options for that - if we wanted to be explicit, we could use a Uuidv7 SqlDbType, a SqlDataReader.GetUuidv7 method, etc. If we wanted to make it "just work" then we could detect a GUID with version 7 and handle it differently. Either of these approaches would let us send and receive UUIDv7 GUIDs. One risk which I can see with that approach is how clients would switch from using normal Guids to UUIDv7 GUIDs when other people have a record of the ID to look up, such as the below: Guid id = Guid.Parse(userInput);
SqlCommand cmd = CreateCommandWithUuidv7Parameter(sql: "SELECT TOP 1 Name FROM MyTable WHERE Id = @Id", id);
using SqlDataReader reader = cmd.ExecuteReader();
Assert.IsTrue(reader.Read()); In the simple case (table contains only reshuffled UUIDv7 uniqueidentifiers) this'll be fine. In a real-world case, where the table contains a mixture of UUIDv7 and non-UUIDv7 uniqueidentifiers the results will have changed. While there are workarounds, it feels to me like we'd be opening a pit of failure. Placing the responsibility on the transport layer also opens a slightly weird quirk: a test similar to the below would fail, because the Guid would have its bytes rearranged on the transport but the varchar would not. Guid id = Guid.CreateVersion7();
SqlCommand cmd = CreateCommandWithUuidv7Parameter(sql: "SELECT @Id, CAST(@Id AS varchar(max))", id);
using SqlDataReader reader = cmd.ExecuteReader();
reader.Read();
Assert.AreEqual(id, reader.GetUuidv7(0)); // True
Assert.AreEqual(id.ToString(), reader.GetString(1)); // False My opinion's thus slightly in favour of creating a utility method somewhere - whether we create a SqlGuidHelper.NewSequentialId method in SqlClient, or request a SqlGuid.NewSequentialId method in runtime - which generates a SqlGuid which SQL Server would consider sequential, based on transforming Guid.CreateVersion7 as vanbukin suggested. This leaves the transport of GUIDs untouched, and means that switching from using Guid.NewGuid to the new method becomes trivial. |
Beta Was this translation helpful? Give feedback.
-
We use on our side different Databases, and also Guids could be generated in the Frontend, so this would not help. |
Beta Was this translation helpful? Give feedback.
-
https://devblogs.microsoft.com/oldnewthing/20190426-00/?p=102450 |
Beta Was this translation helpful? Give feedback.
-
|
Beta Was this translation helpful? Give feedback.
-
Ok. So we've been around all the arguments back and forth. Are we back to the point where we can discuss the api instead of taking cheap shots at sql server? Methods can be provided which will accept a little endian guid and convert it to big endian. These would be SqlClient specific. You could use them directly or EFCore could use them on your behalf and provide some sort of attribute [SqlServerBigEndianGuid] for decorating cto properties. GetGuidBE, GetSqlGuidBE? We'll need a wrapper of some sort for SqlParameter, do we just force users to manually create a big endian SqlGuid? |
Beta Was this translation helpful? Give feedback.
-
Thanks @Wraith2. I'll be able to write a concise reply at some point. I'm (tentatively) in favour of something like public static class GuidExtensions
{
// Generates a UUIDv7 and reshuffles its bytes to align with SQL Server's sort algorithm.
public static Guid NewSequentialId() { }
public static SqlGuid NewSqlSequentialId() { }
// Wraps user input, performing the same byte reshuffling. Throws if version7Uuid.Version is not 7.
public static Guid WrapVersion7(Guid version7Uuid) { }
public static SqlGuid WrapSqlVersion7(SqlGuid version7Uuid) { }
} Once that Guid has been created by the helper class, I don't think it actually needs any transformation layers on SqlDataReader or SqlParameter - the return value would just be another Guid, all transformation taken care of. EF Core could then update its SequentialGuidValueGenerator to take advantage of it and close efcore#33579. |
Beta Was this translation helpful? Give feedback.
-
People can do that already. The problem is that for guids that aren't created in that way they'll capable of being fragmented. You shouldn't need to use the database (or driver) as a source guids in order to make sure it works. We should be able to take a guid create by an application or taken from a postgres database and be able to insert it into the database in a specific way. Pushing it up the stack to creation time leaks implementation details around sql server that people shouldn't need to care about. |
Beta Was this translation helpful? Give feedback.
-
I think this is focusing too much on the wrong details as well, notably. Any new API should do the "right thing" with any
That is, the needed fix is functionally similar to providing a custom |
Beta Was this translation helpful? Give feedback.
-
My initial Idea when I opend the Ticket here was that the SQLClient could change the bytes when writeing to SQL server. I didn't think about that then the Guid is wrong when I for example would use it with another connection library to SQLServer or directly in SQL. So I think now, the only Solution is a fix directly in SQLServer, for example a Database Flae to change default Guid Handling, so it is stored/compared in another way |
Beta Was this translation helpful? Give feedback.
-
I'm slightly in favour of leaving the transport layer alone because of a scenario like the below. Guid id = Guid.Parse(userInput);
using var connection = new SqlConnection(connStr);
connection.Open();
using var command = new SqlCommand("SELECT COUNT(1) FROM MyTable WHERE Id = @Id", command);
command.Parameters.Add("@Id", SqlDbType.UniqueIdentifier).Value = id;
var rowCount = (int)command.ExecuteScalar();
Console.WriteLine($"Row count: {rowCount}"); The contents of
All told, it's a pretty trivial example: we accept user input, then print the number of records with a matching ID. Most of the IDs here are not V7 UUIDs, so its index fragmentation will eventually grow. If we now implement some kind of byte-shuffling within SqlClient, our sample code behaviour has changed. Byte-shuffle all GUIDsNow, if the user inputs a GUID of Byte-shuffle all GUIDs if the parameter has a new SqlDbTypeIf the user inputs a GUID of This is an improvement. However, now insert a new UUIDv7 value into If the user provides the value they've been given to the code snippet, the row count will be zero. SqlClient returns incorrect results. Same as above, with changes to the code snippetPerhaps we try to fix this by using a code snippet as so: Guid id = Guid.Parse(userInput);
using var connection = new SqlConnection(connStr);
connection.Open();
using var command = new SqlCommand("SELECT COUNT(1) FROM MyTable WHERE Id = @Id", command);
// This has changed: the SqlDbType is now the same as the SqlDbType passed to the INSERT statement.
command.Parameters.Add("@Id", SqlDbType.Uuidv7).Value = id;
var rowCount = (int)command.ExecuteScalar();
Console.WriteLine($"Row count: {rowCount}"); The user provides the UUIDv7 value given in the previous scenario to the code snippet, the row count is one. Bug appears to be fixed. I don't see these situations as corner cases: it's the result of using an existing database which has a combination of Uuidv7 and non-Uuidv7 GUIDs in a single table. Working scenarioFor this to work, we'd need to change the snippet as so: Guid id = Guid.Parse(userInput);
using var connection = new SqlConnection(connStr);
connection.Open();
using var command = new SqlCommand("SELECT COUNT(1) FROM MyTable WHERE Id IN (@Id_1, @Id_2)", command);
// Note that there are now two parameters: one for the old format, one for the new format.
command.Parameters.Add("@Id_1", SqlDbType.UniqueIdentifier).Value = id;
command.Parameters.Add("@Id_2", SqlDbType.Uuidv7).Value = id;
var rowCount = (int)command.ExecuteScalar();
Console.WriteLine($"Row count: {rowCount}"); This works, but it's not obvious - and I'd personally say that it's overkill when all the user wants is to insert a GUID which doesn't fragment the index covering its column. Separately, while this is unlikely, we no longer have the guarantee of "at-most-one-record" we'd get from a direct equality check to a primary key column. Other variationsWe might want to turn this problem into an edge case by deciding that we'll only byte-shuffle UUIDv7 Guids with a new SqlDbType. This solve the problem for people who only use the runtime, but it still means that it'll affect somebody who already generates their own UUIDv7 Guids. Another way to handle this might be a separate Uuid type, and only byte-shuffle these. I've got no technical objection to this, although it feels a little consistent for the runtime to store a Uuidv7 in a standard Guid but for SqlClient to have a dedicated type. If I'm missing something obvious and this type of case isn't a problem then I've not got any technical objection to writing Guids differently in the transport layer. |
Beta Was this translation helpful? Give feedback.
-
The storage is correct and the loading of the guid into SqlGuid is correct, this all round trips today to any GUID the user authors. It’s really just an issue with the comparer function internally serializing to a stack local area of memory and then comparing bytes in the wrong order |
Beta Was this translation helpful? Give feedback.
-
Appreciate the feedback and participation! Please use discussions for topics like these until the author plans to propose something concrete as a feature that is ready for review by the maintainers. |
Beta Was this translation helpful? Give feedback.
-
Also I think we do not need a special Guid creator function, UUIDNext already provides this |
Beta Was this translation helpful? Give feedback.
-
The entire discussion can be replaced by the function static Guid ReorderGuidV7ToUniqueIdentifierOrder(Guid guid)
{
Span<byte> src = stackalloc byte[16];
Span<byte> dst = stackalloc byte[16];
guid.TryWriteBytes(src, true, out _);
dst[0] = src[12];
dst[1] = src[13];
dst[2] = src[14];
dst[3] = src[15];
dst[4] = src[10];
dst[5] = src[11];
dst[6] = src[8];
dst[7] = src[9];
dst[8] = src[6];
dst[9] = src[7];
dst[10] = src[0];
dst[11] = src[1];
dst[12] = src[2];
dst[13] = src[3];
dst[14] = src[4];
dst[15] = src[5];
return new Guid(dst, false);
} |
Beta Was this translation helpful? Give feedback.
-
It was never a problem to create a Guid wich works in SQLServer, it's also what UUIDNext provides. |
Beta Was this translation helpful? Give feedback.
-
Describe the bug
If GuidsV7 are added to NET9, SQLServer inserts them in a way, so the index still gets fragmented (see issue: dotnet/runtime#103658)
There should be a new setting in the SQLClient to switch GUID serialization to "TryWriteBytes(destination, bigEndian: true)" so also SQLServer can profit from the advantages of the new guid format.
Beta Was this translation helpful? Give feedback.
All reactions