Add type mapping APIs to customize JSON value serialization/deserialization #30677
We could allow JsonReaderOptions and JsonWriterOptions or a custom type in the context options to control the default implementation.
Looking into this a bit more, I don't think it's possible to create a performant abstraction that would allow replacing the implementation without massive amounts of code. Some things to consider:
So here's another proposal:
For the server-side conversion, consider #30729.
Opened #30744 to track that.
Yeah; our core deserialization logic would be the one iterating over the properties, and calling the type mapping API only for the values of those properties based on the names.
Unless I'm mistaken, EF currently always reads the entire row into memory at the DbDataReader level (i.e. we don't specify CommandBehavior.SequentialAccess); one reason for that is that expression trees aren't compatible with async, which would be necessary if we wanted to stream the row. However, it seems that Utf8JsonReader buffers by property anyway. In other words, according to this sample, you call Read() to get the next property; if that returns false, there aren't enough bytes in the buffer and you need to get more, and if it returns true, you're guaranteed to be able to read the next property. Note that there's a GetString() API but no TryGetString - which I understand to mean that string values are buffered, and you never need to deal with streaming the property value. If that's the case (we can verify), we don't need to worry about this at all.
Looks good - though what's the
That's the callback to increase the buffer size.
Utf8JsonReader doesn't buffer at all. If the current token doesn't fit in the passed-in buffer, you need to increase the buffer size to be able to keep the part that it tried to read so far. In other words, when reading from a stream, the buffer must be able to fit the largest token.
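The grow-the-buffer pattern described here looks roughly like the following sketch (illustrative only, not EF code; `stream` is assumed to be some source `Stream`, and the initial buffer size is arbitrary):

```csharp
// Sketch: keep refilling/growing the buffer until Utf8JsonReader
// can read the next token from it.
byte[] buffer = new byte[256];
int bytesInBuffer = stream.Read(buffer, 0, buffer.Length);
var reader = new Utf8JsonReader(buffer.AsSpan(0, bytesInBuffer), isFinalBlock: false, state: default);

while (!reader.Read())
{
    // The next token didn't fit in the buffer.
    int consumed = (int)reader.BytesConsumed;
    int leftover = bytesInBuffer - consumed;

    if (consumed == 0)
    {
        // The whole buffer holds a partial token: double it
        // (Array.Resize preserves the existing bytes).
        Array.Resize(ref buffer, buffer.Length * 2);
    }
    else
    {
        // Shift the unconsumed tail to the start of the buffer.
        Buffer.BlockCopy(buffer, consumed, buffer, 0, leftover);
    }

    int read = stream.Read(buffer, leftover, buffer.Length - leftover);
    bytesInBuffer = leftover + read;
    reader = new Utf8JsonReader(
        buffer.AsSpan(0, bytesInBuffer),
        isFinalBlock: read == 0,
        state: reader.CurrentState);
}
```

This is essentially the pattern the System.Text.Json docs recommend for partial reads: the reader's `CurrentState` is carried over to the new reader so parsing resumes where it left off.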
@AndriySvyryd the question is whether a single token is buffered - the Utf8JsonReader seems to suggest it is... To read a single string property via Utf8JsonReader, I think the user does the following:

```csharp
if (!reader.Read())
{
    // Not enough bytes for the next token!
    // Return to read more bytes from the stream etc.
    return;
}

var value = reader.GetString();
```

GetString() doesn't have a way of telling you that it didn't have enough bytes (e.g. there's no TryGetString). So if I'm getting it right, within a specific property value (which is what our type mapping API handles), we shouldn't need to worry about this. We can set up a quick experiment to prove this. Am I misunderstanding the API?
Right, but Read won't return true unless the next token is completely contained in the buffer. And we'll need to call it for collections of primitive types.
Perhaps a better signature would be
No, this implies that the buffer should fit the entire value, not just the largest token. How about this?

```csharp
ref struct Utf8JsonReaderManager
{
    readonly Stream? _stream;
    byte[]? _buffer;
    ref Utf8JsonReader _currentReader;

    public ref Utf8JsonReader MoveNext();
}

T FromJson(ref Utf8JsonReaderManager manager);
```
I think I understand where we have a disconnect... The APIs I was thinking of here were about the element type mapping, i.e. reading a timestamp, int or string (i.e. a primitive JSON token) - those shouldn't be a problem. But you're right that we need to think about the collection, and also about nested collections. However, if we do want to allow streaming, don't we have a bigger general problem with value converters, regardless of JSON? In other words, since we're implementing serialization/deserialization of the primitive collection as a value converter, we'd need streaming support for value converters, which is something we don't have at the moment... Maybe let's discuss on Monday...
Design decision: the type mapping API should support streaming. We won't do real streaming (since we don't support that on e.g. DbDataReader because of LINQ expression tree limitations). However, if we implement optimized UTF8 Stream access via DbDataReader.GetStream() (should be supported at least with PostgreSQL, SQL Server), we can read small chunks out of that stream and hand those chunks to the type mapping.
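A hedged sketch of what this "pseudo-streaming" could look like: obtain the JSON column as a Stream via DbDataReader.GetStream(), pull small chunks into a buffer, and run Utf8JsonReader over that buffer. The helper name, buffer size, and refill strategy below are illustrative, not the actual EF implementation:

```csharp
using System.Data.Common;
using System.Text.Json;

// Illustrative only: scan a JSON column for its first string token
// without materializing the whole document as a string first.
static string? ReadFirstStringToken(DbDataReader dbReader, int ordinal)
{
    using Stream stream = dbReader.GetStream(ordinal);
    byte[] buffer = new byte[4096];
    int read = stream.Read(buffer, 0, buffer.Length);
    // isFinalBlock is a simplification here; a real implementation would
    // refill the buffer when Read() returns false, as discussed above.
    var json = new Utf8JsonReader(buffer.AsSpan(0, read), isFinalBlock: true, state: default);

    while (json.Read())
    {
        if (json.TokenType == JsonTokenType.String)
        {
            return json.GetString();
        }
    }

    return null;
}
```

The appeal is that only a small buffer is live at any time, rather than the entire serialized document, while the type mapping still sees ordinary Utf8JsonReader tokens.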
Yeah, generally looks good - though if we go in this direction, maybe it's possible to simply expose GetNextReader(), and encapsulate the details of reading more into the buffer and creating the new reader inside the manager. In other words, I use my reader until there's no more data; at that point I go to the manager and ask for the next reader - it reads from the stream into the buffer, creates a new reader over it and returns it to me. If we want to, we should be able to make the manager API shape compatible with real streaming (in case we can make that work one day). That would mean making the manager a regular struct, and having a MoveNextAsync returning
Whether it's worth the extra API shape complexity is a good question... /cc @NinoFloris @vonzshik, this bears a bit of resemblance to our type handling stuff in Npgsql :)
I thought that's what my proposal was. What do you think needs to change in the API shape?
For async we'll need a separate method - |
Ah, I misread your proposal as exposing the fields publicly. All good.
Sure, though at that point it would be a breaking change for all existing type mappings. One more idea regardless: we could consider guaranteeing that Read() has already been called on the reader before the manager was passed to the API, and that it returned true. That would allow simple type mappings which just need a single token to always read it from the reader, without dealing with the Read, getting a new reader, etc. Complex types (like collections) would need to deal with the complexity, but that will be very rare.
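To illustrate the asymmetry being proposed: with that guarantee, a scalar mapping never touches the manager, while a collection mapping handles the refill. Everything below is a hypothetical sketch - `Utf8JsonReaderManager`, `CurrentReader`, and `MoveNext` are the proposed shapes from this thread, not shipped API:

```csharp
using System.Text.Json;

// Scalar mapping: Read() already returned true, so the token is
// positioned and can be consumed directly.
static int FromJsonInt(ref Utf8JsonReaderManager manager)
    => manager.CurrentReader.GetInt32();

// Collection mapping: must cope with the buffer running dry by asking
// the manager for a fresh reader over a refilled buffer.
static List<int> FromJsonIntList(ref Utf8JsonReaderManager manager)
{
    var result = new List<int>();
    ref Utf8JsonReader reader = ref manager.CurrentReader;
    while (true)
    {
        if (!reader.Read())
        {
            reader = ref manager.MoveNext(); // refill buffer, new reader
            continue;
        }
        if (reader.TokenType == JsonTokenType.EndArray)
        {
            return result;
        }
        result.Add(reader.GetInt32());
    }
}
```

The scalar path stays a one-liner, which is the common case; only the rare collection/nested mappings pay the buffer-management complexity.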
Yes, we can always do this in |
Blocked on dotnet#30677 for serializing spatial types as WKT in JSON.
Closes dotnet#30630
Note I moved the server-side conversion part of this to #30727, so this only tracks the serialization/deserialization component.
Part of #30677

There is still work to do here, but this is basically working so I wanted to get it out for review. Remaining work:

- Negative cases/exception messages
- Re-introduce checks for whether or not query supports primitive collections
- Cases where the primitive collection object exists and should be added to
- Custom element type mapping on the property
- Custom element reader/writer
- Custom collection reader/writer
- Use default element type mapping configured in ConfigureConventions
- Update Cosmos primitive collection mapping
- More query tests!
When reading and writing JSON documents, we need to control the precise conversion from a .NET value to a JSON element, to be integrated into the larger JSON document. This is somewhat similar to GenerateNonNullSqlLiteral, but for values inside JSON. In simple cases (int, string), no special logic is needed; but in certain cases, we need to precisely control the representation. For example, a NetTopologySuite Geometry would likely need to be serialized as WKT, which isn't necessarily what ToString returns.
Note the interaction between this issue and #30604, which is about switching to the lower-level Utf8JsonReader/Utf8JsonWriter APIs rather than passing through DOM. If/when we do that, the type mappings would be the ones calling specific APIs e.g. on Utf8JsonWriter.
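With the lower-level writer APIs, such a type mapping could emit WKT directly. A minimal sketch, assuming NetTopologySuite's `Geometry.ToText()` for the WKT conversion; the `ToJson` shape here is illustrative, not the final API:

```csharp
using System.Text.Json;
using NetTopologySuite.Geometries;

// Illustrative: serialize an NTS Geometry as a WKT string inside JSON,
// rather than relying on whatever ToString happens to produce.
static void ToJson(Utf8JsonWriter writer, Geometry value)
    => writer.WriteStringValue(value.ToText()); // WKT, e.g. "POINT (1 2)"
```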
In addition to serialization/deserialization, we also need a hook to perform a SQL conversion from the JSON (text) value to the actual store type; this is needed when extracting a value from a JSON document and e.g. comparing it to a regular database column. For example, the JSON representation of a timestamp is ISO8601 (with a T), but in SQLite a timestamp is typically represented as text without the T (and so datetime() must be invoked). Similarly, to convert a geometry WKT representation to an actual SQL Server geometry type, a function to parse the WKT needs to be invoked.

Moving the server-side conversion part to #30727.