Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object (de)serialization support with Utf8JsonReader\Writer - entry point and options #28325

Closed
steveharter opened this issue Jan 4, 2019 · 59 comments · Fixed by dotnet/corefx#36776
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@steveharter
Copy link
Member

This is the API proposal and feature set for object (de)serialization. It covers the main API entry point (JsonSerializer), the options (JsonSerializerOptions) and the ability to ignore properties during (de)serialization.

Review process update: due to the size of this issue and since new API issues are being added for new features, the overview and forward-looking information has been moved to https://github.com/dotnet/corefx/blob/master/src/System.Text.Json/docs/SerializerProgrammingModel.md. Future API additions will have their own independent API issue created instead of re-using this issue.

Current Status

The reviewed portion of the API is in corefx master and Preview 4. It consists of the JsonSerializer and the JsonSerializerOptions class and provides no extensibility features.

There is a previous prototype at CoreFxLab in the package System.Text.JsonLab.Serialization. It contains non-reviewed APIs including extensibility features like using attributes to define custom value converters, property name policies (like camel-casing), etc. It is build against .NET Core 3.0 Preview 2.

The last status of the API review is shown below.

API

JsonSerializer

This static class is the main entry point.

Let's start with coding examples before the formal API is provided.

Using a simple POCO class:

    public class Person
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public DateTime? BirthDay { get; set; }
    }

To deserialize JSON bytes into a POCO instance:

    ReadOnlySpan<byte> utf8= ...
    Person person = JsonSerializer.Parse<Person>(utf8);

To serialize an object to JSON bytes:

    Person person = ...
    byte[] utf8 = JsonSerializer.ToBytes(person);

The string-based Parse() and ToString() are convenience methods for strings, but slower than using the <byte> flavors because UTF8 must be converted to\from UTF16.

namespace System.Text.Json.Serialization
{
    public static class JsonSerializer
    {
        public static object Parse(ReadOnlySpan<byte> utf8Json, Type returnType, JsonSerializerOptions options = null);
        public static TValue Parse<TValue>(ReadOnlySpan<byte> utf8Json, JsonSerializerOptions options = null);

        public static object Parse(string json, Type returnType, JsonSerializerOptions options = null);
        public static TValue Parse<TValue>(string json, JsonSerializerOptions options = null);

        public static ValueTask<object> ReadAsync(Stream utf8Json, Type returnType, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));
        public static ValueTask<TValue> ReadAsync<TValue>(Stream utf8Json, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));

        public static byte[] ToBytes(object value, Type type, JsonSerializerOptions options = null);
        public static byte[] ToBytes<TValue>(TValue value, JsonSerializerOptions options = null);

        public static string ToString(object value, Type type, JsonSerializerOptions options = null);
        public static string ToString<TValue>(TValue value, JsonSerializerOptions options = null);

        public static Task WriteAsync(object value, Type type, Stream utf8Json, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));
        public static Task WriteAsync<TValue>(TValue value, Stream utf8Json, JsonSerializerOptions options = null, CancellationToken cancellationToken = default(CancellationToken));
    }
}

JsonSerializerOptions

This class contains the options that are used during (de)serialization.

If an instance of JsonSerializerOptions is not specified when calling read\write then a default instance is used which is immutable and private. Having a global\static instance is not a viable feature because of unintended side effects when more than one area of code changes the same settings. Having a instance specified per thread\context mechanism is possible, but will only be added pending feedback. It is expected that ASP.NET and other consumers that have non-default settings maintain their own global, thread or stack variable and pass that in on every call. ASP.NET and others may also want to read a .config file at startup in order to initialize the options instance.

An instance of this class and exposed objects will be immutable once (de)serialization has occurred. This allows the instance to be shared globally with the same settings without the worry of side effects. The immutability is also desired with a future code-generation feature. Due to the immutable nature and fine-grain control over options through Attributes, it is expected that the instance is shared across users and applications. We may provide a Clone() method pending feedback.

For performance, when a JsonSerializerOptions instance is used, it should be cached or re-used especially when run-time attributes are added because when that occurs, caches are held by the instance instead of being global.

namespace System.Text.Json.Serialization
{
    /// <summary>
    /// Provides options to be used with <see cref="JsonSerializer"/>.
    /// </summary>
    public class JsonSerializerOptions
    {
        public JsonSerializerOptions();

        /// <summary>
        /// Defines whether an extra comma at the end of a list of JSON values in an object or array
        /// are allowed (and ignored) within the JSON payload being read.
        /// By default, it's set to false, and the reader will throw a <exception cref="JsonReaderException"/> if it encounters a trailing comma.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool AllowTrailingCommas {get; set;}

        /// <summary>
        /// The default buffer size in bytes used when creating temporary buffers.
        /// </summary>
        /// <remarks>The default size is 16K.</remarks>
        /// <exception cref="System.ArgumentException">Thrown when the buffer size is less than 1.</exception>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public int DefaultBufferSize { get; set; }

        /// <summary>
        /// Determines whether null values are ignored during serialization and deserialization.
        /// The default value is false.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool IgnoreNullValues { get; set; }

        /// <summary>
        /// Determines whether readonly properties are ignored during serialization and deserialization.
        /// A property is readonly if it contains a public getter but not a public setter.
        /// The default value is false.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool IgnoreReadOnlyProperties { get; set; }

        /// <summary>
        /// Gets or sets the maximum depth allowed when reading or writing JSON, with the default (i.e. 0) indicating a max depth of 64.
        /// Reading past this depth will throw a <exception cref="JsonReaderException"/>.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public int MaxDepth { get; set; }

        /// <summary>
        /// Defines how the <see cref="Utf8JsonReader"/> should handle comments when reading through the JSON.
        /// By default the reader will throw a <exception cref="JsonReaderException"/> if it encounters a comment.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public JsonCommentHandling ReadCommentHandling { get; set; }

        /// <summary>
        /// Defines whether the <see cref="Utf8JsonWriter"/> should pretty print the JSON which includes:
        /// indenting nested JSON tokens, adding new lines, and adding white space between property names and values.
        /// By default, the JSON is written without any extra white space.
        /// </summary>
        /// <exception cref="InvalidOperationException">
        /// Thrown if this property is set after serialization or deserialization has occurred.
        /// </exception>
        public bool WriteIndented { get; set; }

        // several more options here for other features not covered here
    }

    /// <summary>
    /// The base class of serialization attributes.
    /// </summary>
    public abstract class JsonAttribute : Attribute
    {
    }

    /// <summary>
    /// Prevents a property from being serialized or deserialized.
    /// </summary>
    [AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
    public sealed class JsonIgnoreAttribute : JsonAttribute
    {
        public JsonIgnoreAttribute();
    }
}
@steveharter steveharter self-assigned this Jan 4, 2019
@Tornhoof
Copy link
Contributor

Tornhoof commented Jan 5, 2019

Thank you for describing your API and performance concepts.
API

Performance

No objects created on the heap during (de)serialization after warm-up, with the exception of the POCO class itself and related properties during serialization.

This probably won't be true if you want to support things like ReadOnlyCollection, I probably would simply say "related properties and types".

Optimized hashtable lookup for property names using byte[] as the key.

This will be slow, I recommend talking to @rynowak about his work in ASP.NET Core Routing and the IL Matcher (the code is pretty much what you need, technically matching a route and matching a property name is quite similar), this is by far the fastest approach, in both the Routing and all the experiments I did for SpanJson https://github.com/Tornhoof/SpanJson/wiki/Technology-and-Internals#3-comparison-against-integers (which is pretty much identical to the approach in Routing).
I expect that you could probably reuse 75%+ of the Routing code (an estimation based on how long it took to port my Expression Tree Code from SpanJson to do the Routing API, which was trivial).

@davidfowl
Copy link
Member

Where’s the async API?

@ahsonkhan
Copy link
Member

ahsonkhan commented Jan 7, 2019

Where’s the async API?

In what scenario would you require async (de)serialize APIs? Can you please provide a sample usage?

@davidfowl
Copy link
Member

I'm going to be a bit obtuse here. If we're not making these APIs async then we need to have a long discussion again about ASP.NET Core usage of these APIs.

Input:

https://github.com/aspnet/AspNetCore/blob/master/src/Mvc/src/Microsoft.AspNetCore.Mvc.Formatters.Json/JsonInputFormatter.cs#L318

Output:

https://github.com/aspnet/AspNetCore/blob/ca7c48c520f2baae032e0429f19552abed77e009/src/Mvc/src/Microsoft.AspNetCore.Mvc.Formatters.Json/JsonOutputFormatter.cs#L147-L152

Just to throw out some more examples here's the HttpClient JSON formatter that's used today in most applications:

https://github.com/aspnet/AspNetWebStack/blob/39d3064baf13181f5718ec5d85ba644b47d0704b/src/System.Net.Http.Formatting/Formatting/BaseJsonMediaTypeFormatter.cs#L189

To mitigate sync reading, today we buffer the entire Stream into memory or disk (depending on the size):

https://github.com/aspnet/AspNetCore/blob/ca7c48c520f2baae032e0429f19552abed77e009/src/Mvc/src/Microsoft.AspNetCore.Mvc.Formatters.Json/JsonInputFormatter.cs#L240-L249

Which is unfortunate and we'd really love to avoid that. The plan was to have to get that benefit from the new JSON reader/writer/serializer. We don't have a good solution for output today. You end up doing synchronous IO once you hit a buffer threshold which ends up doing sync over async and leans towards causing thread pool starvation.

See dotnet/aspnetcore#6397 for more details on output. ASP.NET Core is going to disable all sync IO on the request and response by default in 3.0.

@Tornhoof
Copy link
Contributor

Tornhoof commented Jan 7, 2019

For the ASP.NET Core formatter you additionally need to support a bag of json properties, which is either serialized into the parent object or deserialized into the bag for RFC 7807 Error Details. See aspnet/Mvc#8529 for more details, that one uses JSON.NET's https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_JsonExtensionDataAttribute.htm

@ahsonkhan
Copy link
Member

cc @KrzysztofCwalina, @jkotas, @glennc, @terrajobst as an FYI (note: this is still WIP)

@rynowak
Copy link
Member

rynowak commented Jan 9, 2019

My feedback in a rough stream-of-consciousness format.

On JsonConverter, we will need the ability to specify the type dynamically as an argument - in addition to the generic overloads. Without this, it will really cause a lot of folks to use a workaround to call the generic version with MakeGenericMethod. It's OK to box in this case.

JsonConverterSettings seems like it has a mashup of settings that generally global to an operation, and don't pertain to specific object or type (indent size for example). Is it expected that I can specify a global indent size, but then override the indent size for a specific type?

I'm really intrigued by how JsonConverterSettings tries to maintain a loose coupling with the specific types of settings/options we provide support for. This is similar to a number of things we do in MVC and it scales horizontally pretty nicely.

I think it would be really neat to see a recipe guide or some side-by-side examples: "this is how you'd set the json property name with an attribute vs with code".

Another natural thing that users will want from JsonConverterSettings is a copy constructor and a way to merge them.

The JsonClassMaterializer seems like a god-object. This is where you control all object creation and property get/set operations for all objects and all types. If the intention is to use these as a replacement for JSON.Net's JsonConverter then I think it's far off. This seems like something that would be intentionally hard to replace. Maybe that's OK depending on what other things we do.


Async is a higher-level bugaboo (I think that's a SFW replacement for what I was thinking 😆). I still think C# and .NET do async better than any other language out there, but we're still struggling to pull together an end-to-end here for the web that actually avoids synchronous I/O at all levels.

From my perspective, I'm still excited to see all of the work that's here. I think we're solving a very critical problem and these are all steps in the right direction.

@steveharter
Copy link
Member Author

steveharter commented Jan 9, 2019

@Tornhoof

I'd like to see non-generic FromJson methods, e.g. for the ASP.NET Core TextFormatter classes, assuming this will not change that much

Yes that makes sense. The API is still being defined, and this issue wasn't meant to be complete at this point.

What about global settings in JsonConverterSettings, i.e. NoNulls, Int/String Enums etc?

Currently those simple settings are in JsonPropertyAttribute which can be "added" at run-time. If too verbose we may just place those right on JsonConverterSettings.

Performance

No objects created on the heap during (de)serialization after warm-up, with the exception of the POCO class itself and related properties during serialization.

This probably won't be true if you want to support things like ReadOnlyCollection, I probably would simply say "related properties and types".

Yes, of course.

Optimized hashtable lookup for property names using byte[] as the key.

This will be slow, I recommend talking to @rynowak about his work in ASP.NET Core Routing and the IL Matcher (the code is pretty much what you need, technically matching a route and matching a property name is quite similar), this is by far the fastest approach, in both the Routing and all the experiments I did for SpanJson https://github.com/Tornhoof/SpanJson/wiki/Technology-and-Internals#3-comparison-against-integers (which is pretty much identical to the approach in Routing).

Thanks. I updated the description and implementation to use an interger-based array which is looking very promising at this point in time.

@steveharter
Copy link
Member Author

@rynowak

On JsonConverter, we will need the ability to specify the type dynamically as an argument - in addition to the generic overloads. Without this, it will really cause a lot of folks to use a workaround to call the generic version with MakeGenericMethod. It's OK to box in this case.

Yes that will be added. Thanks

JsonConverterSettings seems like it has a mashup of settings that generally global to an operation, and don't pertain to specific object or type (indent size for example). Is it expected that I can specify a global indent size, but then override the indent size for a specific type

As it is now, JsonConverterSettings is fairly clean and only contains settings that would need to be specified at run-time like MaxDepth and DefaultBufferSize (I'll update the API doc soon with new members). The mashup is JsonPropertyAttribute for now.

I'm really intrigued by how JsonConverterSettings tries to maintain a loose coupling with the specific types of settings/options we provide support for. This is similar to a number of things we do in MVC and it scales horizontally pretty nicely.

Yes there is some loose coupling with the extension points like the property casing where you can provide the implementation. Other simple settings will not really be extensible in that way but the values can be specified via attributes at the assembly, class and property level plus run-time overrides to each of these -- i.e. the values for these simple settings can be changed in many flexible ways, but not necessarily the code that uses them.

I think it would be really neat to see a recipe guide or some side-by-side examples: "this is how you'd set the json property name with an attribute vs with code".

Yes I will do this in the next update.

The JsonClassMaterializer seems like a god-object. This is where you control all object creation and property get/set operations for all objects and all types. If the intention is to use these as a replacement for JSON.Net's JsonConverter then I think it's far off. This seems like something that would be intentionally hard to replace. Maybe that's OK depending on what other things we do.

JsonClassMaterializer is just an enum. The implementation is internal. We may not even need the enum if we decide the default is ok. Currently the default will attempt to use the most efficient mechanism (currently IL gen for get,set,ctor) but if IL gen is not supported then use standard reflection. However, the enum is there currently for someone to change the default if for example they have a single object to deserialize and don't want any additional memory overhead (and initial perf hit) of IL so they want to use standard reflection instead.

@steveharter
Copy link
Member Author

@davidfowl

I'm going to be a bit obtuse here. If we're not making these APIs async then we need to have a long discussion again about ASP.NET Core usage of these APIs.

I added the async apis along with a brief description. Currently they only support a Stream, more if necessary. I have a prototype that verifies the feasibility.

@KrzysztofCwalina
Copy link
Member

Some quick feedback:

  1. As @davidfowl have said, we need async APIs that can work with Stream and Pipelines
  2. I don't think the serializer/converter should have a public dependency on JsonReader. The reader should be an implementation detail. The reasons are: a) the reader is a very tricky type that most our users should not be concerned with, b) we should preserve the ability change the implementation to other than the reader.
  3. The deserailizer/converter should be in a separate namespace from the low level reader/writer types. I don't think we should expose the average .NET developer to the complexity of the readers (e.g. to by-ref types).
  4. We should decide if cancellation tokes are optional or not.
  5. I would love to have the ability to generate deserialization code at build time and include it in my projects. In Azure SDK we almost always know the shape of the deserialized schema and so emiting the deserialization code at design time would:
    • make it more efficient
    • be more AOT friendly
    • allow the deserialized types to appear immutable (as the mutators could be internal visibility).

@Tornhoof
Copy link
Contributor

@steveharter If you care about integer tricks, you can optimize the json member writing for utf8 with the same trick. Assuming you will bake precalculated byte-arrays for the property names into the IL, you can use the same concept as for deserialization. Instead of copying the precalculated buffer to the output, just write appropriate integer values to the output.
Instead of writing Encoding.UTF8.GetBytes("\"message\":"), you could also write

jsonWriter.WriteUtf8Verbatim(7306916068917079330UL /*"message*/, 14882 /*":*/);

Atleast in my SpanJson tests that was the last main difference in serialization speed between UTF16 and UTF8 as, atleast for expression trees, there is a large difference if I have
Expression.Constant("\"message:\""); which is fast for utf16 or Expression.Constant(Encoding.UTF8.GetBytes("\"message\":")) for UTF8 which was ~20% slower than the UTF16 version.
In SpanJson properties up to ~30 chars length are written via integers, for longer properties the byte[] copy (incl. expression tree) overhead is pretty much the same speed or faster
This might not be necessary if you can leverage those shiny new static byte arrays with public ReadOnlySpan<byte> property => new byte[]{0xA,0xB,0xC}, but it might be worth a try.

That's more or less the last larger performance trick in SpanJson as now the UTF8 and UTF16 serialization speeds are the same, still deserialization of UTF8 is slower, but that's mostly due to the utf8 to utf16 overhead for strings., especially since @stephentoub did all those nice TryFormat``TryParse`` optimizations for both UTF8 and UTF16.

@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Jan 18, 2019

An additional thing we would need to support Azure SDK: version resilient serialization, including round-tripping.

We need to be able to add a property bag field (dictionary) to deserialized types. If deserialized payload has a property that does not exist in the type, the property would be saved in the dictionary. The serializer would serialize not only actual properties of the type, but also the dictionary (so all payload round trips).

cc: @schaabs

@pranavkm
Copy link
Contributor

pranavkm commented Jan 18, 2019

@KrzysztofCwalina @Tornhoof's comment from here - https://github.com/dotnet/corefx/issues/34372#issuecomment-451928379 - goes in to this. JSON.NET supports it using ExtensionDataAttribute, the data contract serializers support it using IExtensibleDataObject.

@pranavkm
Copy link
Contributor

pranavkm commented Feb 5, 2019

I spent a little bit of time plugging in the deserializer in to MVC to see what comes out of it. This is based on version 0.1.1-e190205-1 of the package.

  • Maybe I'm holding it wrong, but I couldn't figure out a way to apply an attribute to an entire object graph. In my case, I was attempting to make the converter use camel casing:
    Doing
var settings = new JsonConverterSettings();
settings.AddAttribute(modelType, new JsonCamelCasingConverterAttribute());

This works for the immediate properties of modelType, but not sub-properties (e.g. Person.PersonAddress.StreetName).
While we're here, would it make sense to have the method name indicate this was a policy being applied? AddAttribute really doesn't convey that information.

  • JsonConvert throws any time it finds a JSON value that does not have .NET property to serialize to. It's a fairly common scenario to want to bind a subset of properties. If this is a feature that needs to ship, could it sit behind a flag?

  • Is deserialization meant to perform case sensitive property name matches? This becomes problematic with MVC since the server is configured to do camel casing, while HttpClient.PostAsJsonAsync defaults to pascal case. Case insensitive matches would circumvent this issue.

  • We spoke about this in the offline sync up, but it would be nice if FromJsonAsync supported PipeReader. In the most common case, servers in ASP.NET Core would default to using PipeReader with Stream wrappers for legacy \ convenience reasons. Using the former could possibly avoid some inefficiences.

  • Optionally including the path inJsonReaderException errors would be very useful. When working with large JSON payloads, narrowing down what failed can be very useful in diagnosing the error. Knowing there's an error in (Line 1, BytePosition: 864), is not always useful. MVC uses Json.NET's path information to produce an error object pointing to individual things that went wrong. Although it's noteworthy in that with JSON.NET, deserialization also performs some validation, so it is more useful there.

  • Json.NET supports Javascript literals which makes both { Id: 10 } and { 'Id': 10 } legal inputs. While I can see this tripping up MVC users hand crafting json payloads, it matches the spec and browser behavior (JSON.parse). If there are plans to support single quotes, perhaps you could also consider the other format.

  • Deserializing some types don't work. For instance, Dictionary<,> didn't work. Some of the more exotic types such as byte? didn't work. The converter spat out todo errors, so I'm guessing this is WIP.

  • There's a mix of InvalidOperationException and JsonReaderException thrown by the converter for the same kinds of exceptions. All parser \ formatting exceptions should uniformly be JsonReaderException.

@steveharter
Copy link
Member Author

@pranavkm

Thanks for the feedback!

The attributes can be specified at an Assembly level for all types in a given Assembly. By using design-time attributes like this it helps to isolate the settings for a particular consumer's types assuming they belong to the same Assembly. I can revisit this and\or make it easier to specify "globally" per the settings class if we want to do that.

settings.AddAttribute(modelType.Assembly, new JsonCamelCasingConverterAttribute());

Is deserialization meant to perform case sensitive property name matches

Not by default (current plan). I relialize this is different than Json.Net. However this helps with performance slightly and is the "right" json-spec thing to do. I haven't added the case-insensitive switch yet.

We spoke about this in the offline sync up, but it would be nice if FromJsonAsync supported PipeReader

Yes that is the current plan. I will be working on it this week.

Json.NET supports Javascript literals which makes both { Id: 10 } and { 'Id': 10 } legal inputs.

I'll look into this. @ahsonkhan has this been considered?

Deserializing some types don't work. For instance, Dictionary<,> didn't work.

Dictionary is not implemented yet. The next iteration (and current PR in corefxlab) support the Nullable types.

There's a mix of InvalidOperationException and JsonReaderException thrown by the converter for the same kinds of exceptions

Yes I plan on having the exceptions cleaned up and have the information regarding line number etc. The internal reader throws FormatException as well and I'll switch to using the TryParse* methods instead to avoid that.

@ahsonkhan
Copy link
Member

ahsonkhan commented Feb 6, 2019

has this been considered?

Yes, and given single quote strings are not JSON RFC compliant, we decided against supporting it at the lowest layer. Adding too many of these switches/options results in perf costs for the most common cases of using the Utf8JsonReader, where people have valid, spec-complaint JSON payloads. If there is a strong requirement for this, especially if its affecting a lot more customers, we can reconsider adding an option for it at the reader layer.

That said, as an alternative solution, we should evaluate if this is something that occurs frequently at the higher layer (i.e. Deserializer) and consider only supporting it there.

Optionally including the path inJsonReaderException errors would be very useful. When working with large JSON payloads, narrowing down what failed can be very useful in diagnosing the error. Knowing there's an error in (Line 1, BytePosition: 864), is not always useful.

Unfortunately, this has performance implications since it requires potentially storing the path in some re-sizable data structure which ends up allocating. We have made an effort to keep the reader non-allocating with emphasis on performance which made providing Path info difficult.

@BrennanConroy
Copy link
Member

Similarly I've tried out the deserializer on the SignalR side, version 0.1.1-e190201-1.

One of the biggest issues is that we use the Utf8JsonReader up until we find an array of objects that we want to parse. Then we grab each object individually and pass them into the converter. The problem is that in order to get a usable chunk of data into the converter from the reader, you need to do a lot of manual work.
For example to serialize a string like type (String, DateTime, etc.):

var bytes = reader.HasValueSequence ? reader.ValueSequence.ToArray() : reader.ValueSpan;
if (reader.TokenType == JsonTokenType.String)
{
    var b = new byte[bytes.Length + 2];
    b[0] = 34; // "
    bytes.CopyTo(b.AsSpan().Slice(1, bytes.Length));
    b[bytes.Length + 1] = 34; // "
    bytes = b.AsSpan();
}
var obj = Serialization.JsonConverter.FromJson(bytes, type);

(Yes I know this is unoptimal, but ignore that)
And to deserialize an array you'd need to do a similar thing and add the brackets around the payload.

Another thing is that the DateTime parsing is far too strict. When testing I had to write my own deserialization for DateTime so I wasn't even using the converter anymore. I know there is https://github.com/dotnet/corefx/issues/34576 for the writer doing "smallest output that round trips", and https://github.com/dotnet/corefx/issues/34690 for the reader getting DateTime directly, so it will probably be fixed around the same time as those.

Because Json properties can be in any order we sometimes need to store a "JArray" like object to then be used later in the deserialzer. For now I store the BytesConsumed from the reader, then run Skip() and store the BytesConsumed afterwards and use both of those later on the original ValueSequence to get the bytes needed to pass into the deserializer. Some kind of first class feature for this would be amazing.

Looking to the future where we'll be using the Utf8JsonWriter alongside the serializer, there needs to be a way of writing to the writer either from the serializer, or get the json out of the serializer and write raw bytes to the writer. I know @ahsonkhan has proposed some APIs in that area for exposing writing raw bytes to the writer. And was recently working on a change for writing JsonElement to the writer.

On a brigher note, when I did some microbenchmarks with the Utf8JsonReader + JsonConverter they turned out ~77% faster than using the low level types of Newtonsoft.Json.

cc @ahsonkhan I probably missed some feedback from when I was talking with you.

@pranavkm
Copy link
Contributor

pranavkm commented Feb 8, 2019

The attributes can be specified at an Assembly level for all types in a given Assembly.

That wouldn't necessarily work for MVC since complex sub-properties could come multiple different assemblies.

I can revisit this and\or make it easier to specify "globally" per the settings class if we want to do that.

That sounds perfect.

@steveharter
Copy link
Member Author

@pranavkm

That wouldn't necessarily work for MVC since complex sub-properties could come multiple different assemblies.

I see three way to add global settings:

  1. Duplicate the properties from JsonPropertyAttribute (and perhaps others) to JsonConverterOptions.
  2. Support a null for the first parameter in AddAttribute() to represent "all" assemblies. Somewhat unconventional.
  3. Add a property to JsonConverterOptions to return a JsonPropertyAttribute. Somewhat unconventional.

Basically options 2 and 3 prevent duplicating 10+ eventual properties from JsonPropertyAttribute.

However if there is a need to use JsonCamelCasingConverterAttribute or having a different DateTime converter, etc. then option 2 looks more appealing. Even if we do option 2 we could still do option 1 for JsonPropertyAttribute for better discoverability of common options.

Currently the API here is taking option 1 but currently just for JsonPropertyAttribute, not for camel-casing and others (pending API discussion).

@rynowak
Copy link
Member

rynowak commented Feb 12, 2019

Is there really a case to be made that configuring a setting per-assembly is more common that configuring a setting globally?

Put another way, I think there's a really strong case to be made for configuring a setting per-type, per-property and globally, and a weak case to be made for configuring settings per-assembly. Do you have use-cases in mind for that?

@steveharter
Copy link
Member Author

@rynowak

configuring a setting per-assembly... Do you have use-cases in mind for that?

Here you go:

  1. My organization requires a couple hundred POCO Types to be used internally.
  2. Various developers add POCO Types over time.
  3. Various teams use these POCO Types in different settings so they want to use design-time attributes for all possible settings to avoid mismatches in applying the same run-time settings.
  4. The organization requires all POCO types to default to camel-casing by default, and require the use of a custom DateTime converter for interop with other code. Since it is easy to forget to apply the attributes to evey POCO type (or property) the attribute can be applied to the assembly that contains the POCO types.

@rynowak
Copy link
Member

rynowak commented Feb 12, 2019

That seems like you'd just apply the defaults globally though not per-assembly...

@steveharter
Copy link
Member Author

@BrennanConroy

Because Json properties can be in any order we sometimes need to store a "JArray" like object to then be used later in the deserialzer... Some kind of first class feature for this would be amazing.

Is the feature you are asking for controlling the deserialization order of properties? And secondary I suppose specify the serialization order although that's not important.

@steveharter
Copy link
Member Author

That seems like you'd just apply the defaults globally though not per-assembly...

Perhaps if you don't use types from other assemblies that have their own contract\schema.

However, currently there is not a way to specify design-time attributes globally (across assemblies). You could use the run-time AddAttribute(null, myattribute) however if we decide that will work.

Do you have something else in mind (config file support maybe)?

@BrennanConroy
Copy link
Member

@steveharter We need a way of storing a json array for later use in the deserializer.

For example, the following two json payloads are equivalent and the target value is used to determine what types are in the arguments array, however, when using the forward reading only Utf8JsonReader we don't know the target until later, so we need to store the array until we have the target.

{ "target": "name", "arguments": [ 1, "example", 2.3 ] }
{ "arguments": [ 1, "example", 2.3 ], "target": "name" }

With Newtonsoft.Json we can do:

argumentsToken = JArray.Load(reader);
// ... later
for (var i = 0; i < length; i++)
{
    argumentsToken[i].ToObject(type, serializer);
}

@rynowak
Copy link
Member

rynowak commented Feb 12, 2019

However, currently there is not a way to specify design-time attributes globally (across assemblies). You could use the run-time AddAttribute(null, myattribute) however if we decide that will work.

Right, I'm questioning why we need the ability to specify settings per-assembly. Existing JSON serializers don't have it and I've never see a user ask for it.

Specifying settings for the application as a whole (regardless of what assembly the data types live in) is exceedingly common for a web project.

@terrajobst
Copy link
Member

Notes from today are here.

@jtillman
Copy link

IMO, the deserialization calls should implement 'TryRead'. Since the json attempted to be deserialized typically comes from external sources, serialization exceptions are expected to be a common occurrence. If we place deserialization at the edge of a protocol, then the try/catch pattern will cause a noticeable amount of performance (time) to serve the request due to exception catching.

@steveharter
Copy link
Member Author

steveharter commented Feb 24, 2019

@BrennanConroy

We need a way of storing a json array for later use in the deserializer.

It seems you require three pending features:

  1. Ability to (de)serialize a heterogeneous array: your arguments array has multiple types
  2. Ability to (de)serialize into temporary state for "overflow" \ "hidden" properties: your arguments array is likely not a public property
  3. Assign temporary state once deserialization is finished for the object: your arguments array needs to be passed to the appropriate logic based upon the value of the target property

For (1) we have had discussions on using the existing JsonElement in some fashion, although currently that is coupled tightly to JsonDocument (cc @bartonjs @ahsonkhan ) so we need to decide whether to extend JsonElement or create a new type to handle this.

For (2) and (3) I am working on the explicit class (de)serialize feature where you can implement before- and after-serialize methods along with per-property callbacks including "overflow" properties that exist in the json but not the class.

Here's a mock-up using JsonElement + "overflow property" + callbacks.

// Either through an attribute or code, register your callback\converter:
settings.AddConverter<MyPocoType>(typeof(MyPocoTypeConverter));
...

// Implement your callbacks (an instance of this struct is created per POCO instance):
public struct MyPocoTypeConverter : ITypeConverterOnMissingPropertyDeserialized, ITypeConverterOnDeserialized
{
    IList<JsonElement> _arguments;

    void ITypeConverterOnMissingPropertyDeserialized.OnMissingPropertyDeserialized(
        object obj,
        JsonClassInfo jsonClassInfo,
        string propertyName,
        object propertyValue,
        JsonTokenType tokenType,
        JsonSerializerOptions options)
    {
        if (jsonPropertyInfo.PropertyName == "arguments")
        {
            Debug.Assert(tokenType == JsonTokenType.Array);
            _arguments = (IList<JsonElement>)propertyValue;
        }
    }

    void ITypeConverterOnDeserialized.OnDeserialized(
        object obj,
        JsonClassInfo jsonClassInfo,
        JsonSerializerOptions options)
    {
        MyPocoType poco = (MyPocoType)obj;
        if (poco.Target == "CallMethod2")
        {
            poco.Method2(
                _arguments[1].GetString(),
                _arguments[0].GetInt32(),
                _arguments[2].GetDecimal()
        }
    }
}

@BrennanConroy
Copy link
Member

@steveharter I think there is some misunderstanding of our scenario.

We use the Utf8JsonReader to read all the properties both for perf and for custom logic while walking through the payload. One thing we need is the ability to store the "arguments" property (which is an array) for later use in the deserializer. Currently I have a bunch of hacks to try and store the position of the Utf8JsonReader for the "arguments" property and then try to pass in each element to the deserializer one by one.

I showed the Newtonsoft example which is what we already use for the current Json parsing and works perfectly. I think what you're showing is assuming we just call the deserializer with the whole payload which is definitely not what we're doing.

We need interoperability with the Utf8JsonReader and deserializer (similarly with the Utf8JsonWriter and the serializer).

if (utf8JsonReader.ValueSpan.SequenceEqual("arguments"))
{
    argumentsToken = JArray.Load(utf8JsonReader);
}

// ... later
for (var i = 0; i < argumentsToken.Size(); i++)
{
    argumentsToken[i].ToObject(type, serializer);
}

@steveharter
Copy link
Member Author

@BrennanConroy thanks for the clarification. Yes I was assuming you wanted to use the (de)serializer for the entire scenario, at least for the flow-of-control because the (de)serializer will allow access to the lower-level reader\writer in specific callbacks.

We currently have JsonDocument \ JsonElement but since it doesn't currently accept an existing reader you can't do something like JArray.Load. However, those types do seem to be the natural fit to hold the raw values, which then could be passed to the deserializer.

@BrennanConroy
Copy link
Member

Should I open a new issue for that feature?

@steveharter
Copy link
Member Author

@BrennanConroy I'll discuss the feature offline with @bartonjs and @ahsonkhan and then reply back here.

One of the cool things about the reader (and writer) is that it supports a streaming model where the buffer doesn't have to contain all of the data. When the reader runs out of buffer for the current property, it returns false and expects to be called again with a fresh buffer (and to be passed the previous JsonReaderState object back in so it can pick up where it left off).

However this complicates scenarios where a more than one property must be returned like in a "JArray" case because every property in theory may have to ask for a fresh buffer.

The deserializer handles this scenario and returns the whole object\array when it is finished including when it spans multiple buffer refreshes (which occurs when using the async pipe\stream APIs). However, the reader doesn't support this scenario internally (it leaves it up to the caller, like the serializer) and the document\element supports this scenario but requires all of the data up-front in basically one large buffer (for stream cases it reads to the end up-front).

So we have to decide who should support this multi-property functionality - the reader\writer, the document\element, the (de)serializer, and\or the caller.

@TylerBrinkley
Copy link
Contributor

TylerBrinkley commented Feb 28, 2019

Is it really necessary for implementers of JsonValueConverter<TValue> to implement both Write overloads? The logic in them are going to be identical except for the call to Utf8JsonWriter's WriteXXXValue vs WriteXXX method.

Could we add a method to Utf8JsonWriter to prep it with a property name so that the next WriteXXX call writes the value as a property instead of value?

Or alternatively, could we have a single Write(bool writeProperty, Span<byte> escapedPropertyName, TValue value, ref Utf8JsonWriter writer) method where the implementer would then make the appropriate call to the Utf8JsonWriter based on the writeProperty flag?

@steveharter
Copy link
Member Author

@TylerBrinkley

Is it really necessary for implementers of JsonValueConverter to implement both Write overloads?

I've been working with @ahsonkhan on addressing this by either pre-populating the property name and just calling WriteValue methods, wrapping the writer somehow, or by having the Write methods in the writer detect whether the property name is empty and call WriteValue if so (however unfortunately json allows empty property names so that may not work).

@steveharter
Copy link
Member Author

@BrennanConroy update: we are working on extending JsonElement and JsonDocument to make them more usable in your scenario and for scenarios where the (de)serializer has to support "overflow" properties and cases where a property is of type object or JsonElement.

@TylerBrinkley
Copy link
Contributor

TylerBrinkley commented Mar 1, 2019

Would it make sense to add a JsonSerializerReadOptions class that either inherits or is composed with JsonReaderOptions and include IgnoreNullPropertyValue in the new class? Same applies for writer options.

It wouldn't surprise me if we would want to add more options related to reading and writing later and having some options in JsonSerializerOptions' Reader/WriterOptions property and others directly in JsonSerializerOptions will be confusing to consumers.

@AlexeiScherbakov
Copy link

Also it will be usefull to have something like custom Exression serializer builder

LambdaJsonClassConverterBuilder<TestClass2> builder2 = new LambdaJsonClassConverterBuilder<TestClass2>();
builder2.AddObjectProperty(x => x.A, "a");
builder2.AddIntegerProperty(x => x.B, "b");
builder2.AddObjectProperty(x=>x.C,"c");

It will be usefull when class is separated from json serializer logic

@TylerBrinkley
Copy link
Contributor

TylerBrinkley commented Mar 4, 2019

A big gap I see in the proposal as specified is constructor handling. For a domain-driven design I think it is common to instantiate required parameters in constructors so that the object is always in a valid state.

Since this current proposal only looks for a constructor without parameters the current state of deserialization is unfortunately limited to simple types like data transfer objects. Json.NET invokes the default constructor or else the constructor specified with the JsonConstructorAttribute applied however there's no equivalent option here.

Is more advanced constructor handling being considered?

@fredrikhr
Copy link
Contributor

fredrikhr commented Mar 22, 2019

Please include a generic overload to JsonSerializerOptions.GetClassInfo(Type) like this:

namespace System.Text.Json.Serialization
{
    partial class JsonSerializerOptions
    {
        public JsonClassInfo GetClassInfo<T>() => GetClassInfo(typeof(T));
    }
}

Update: Retracted suggestion based on comment by @khellang (https://github.com/dotnet/corefx/issues/34372#issuecomment-475721641)

@khellang
Copy link
Member

It was just covered in the review; https://www.youtube.com/watch?v=_CdV75tEsVk. TLDR; There's no need for it and it's easy enough for you to add it yourself as an extension method.

@steveharter steveharter changed the title Object (de)serialization support with Utf8JsonReader\Writer Object (de)serialization support with Utf8JsonReader\Writer - entry point and options Mar 23, 2019
@scottdorman
Copy link

@ahsonkhan can you comment on the history of using System.Text.Json instead of System.Json -- did we not use System.Json because it already contained older\existing APIs?

@steveharter Have there been any updates/consideration to using System.Json as the namespace rather than System.Text.Json?

The types in System.Json are, if I remember correctly, a throwback to Silverlight 3. They constitute a legacy API and don't conflict (in naming) to the types in the new JSON API. It seems like the new JSON APIs should be in a System.Json namespace as well, since that seems (to me at least) to be the more natural container and is also consistent with the use of the System.Xml namespace to contain all types that are XML related.

@steveharter
Copy link
Member Author

Have there been any updates/consideration to using System.Json as the namespace rather than System.Text.Json?

I believe that decision is closed. cc @terrajobst

However other feedback is whether System.Text.Json.Serialization should be a sub-namespace of the reader\writer\document that is in System.Text.Json based on how to best factor higher-level functionality (serializer) vs lower-level (reader\writer).

@khellang
Copy link
Member

Has there been considered or is it possible for the serializer to provide an additional "report" of deserialized properties?
I.e. which properties were set by the serializer and which properties were missing from the JSON payload. This would be really interesting information for web frameworks that do model binding.

Consider the following POCO:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDay { get; set; }
}

Since no property is nullable (assuming nullable reference types are enabled and considered), all properties are required by default, but if you call

ReadOnlySpan<byte> utf8= ...
Person person = JsonSerializer.Parse<Person>(utf8);

Where utf8 contains the following JSON payload

{
  "firstName": "John",
  "lastName": "Doe"
}

You'll end up with person.BirthDay == default(DateTime).

At that point, you have no clue whether the BirthDay property was actually specified as 0001-01-01T00:00:00.0000000, or if it's just the default value.
That's why you normally define all non-nullable properties as nullable and use validation (and additional metadata, like attributes) to make sure all required values are present.

The problem with that, is that now you have a DTO full of nullable properties, which all contain valid values (verified by validation). Every time you want pull a value from a property, you have to either guard against null (or use HasValue + Value) or supress warnings by static analysis tools (including the new null-flow analysis by Roslyn). This really sucks.

@rynowak Is this something that MVC could use?

@rynowak
Copy link
Member

rynowak commented Apr 11, 2019

It sounds like what you're after here is input validation - validating what's on the wire based on metadata associated with properties. I don't think what you're proposing is going to be enough to do that in a good way - IMO if we want to have input validation support it would have to be based on callbacks built into the serializer, not layered on top.

Speaking of input validation, I'm not sure why you'd want to use that to specify a behaviour difference between explicit null and omitted. It doesn't seem useful in consideration of the end to end - by that I mean it doesn't seem like a good way to specify a REST api.

@khellang
Copy link
Member

khellang commented Apr 11, 2019

It sounds like what you're after here is input validation - validating what's on the wire based on metadata associated with properties.

Yeah, but not any advanced validation, just present/not present.

I don't think what you're proposing is going to be enough to do that in a good way - IMO if we want to have input validation support it would have to be based on callbacks built into the serializer, not layered on top.

Why would it need to be callback-based? What is my proposal lacking to do it "in a good way"?

I'm not sure why you'd want to use that to specify a behaviour difference between explicit null and omitted.

That's not what I'm saying. For nullable properties, I think explicit null or omitted entirely should have the exact same result. I'm talking about differentiating between non-nullable properties' default values or omitted properties. I.e. actual int 0 or int default (omitted).

@rynowak
Copy link
Member

rynowak commented Apr 11, 2019

That's not what I'm saying. For nullable properties, I think explicit null or omitted entirely should have the exact same result. I'm talking about differentiating between non-nullable properties' default values or omitted properties. I.e. actual int 0 or int default (omitted).

OK thanks that helps.

I think your proposal is lacking because it solves the problem at the wrong layer. It assumes that we can get a result from the serializer that ASP.NET knows how to interpret - I don't think this goes far enough. A feature like this has be more built in than returning a list of properties.

Think about what would happen for a nested object - how would that work?

@khellang
Copy link
Member

I think your proposal is lacking because it solves the problem at the wrong layer.

What layer is the correct layer? The only one that knows this information is the serializer, so it has to provide this information somehow. Are you saying you want it at an even lower level? 🤔

Think about what would happen for a nested object - how would that work?

That's what you have JSON path for 😉

@khellang
Copy link
Member

Anyway, I'm not dead set on a list of properties. I'm totally open for alternative solutions.

I just think it's a problem worth solving, cause today's "workarounds" are pretty bad. This seems (to me) like the best level to solve it, and a perfect opportunity as the APIs are being brought in.

@poke
Copy link
Contributor

poke commented Apr 14, 2019

Anyway, I'm not dead set on a list of properties. I'm totally open for alternative solutions.

I'm not sure how a nice API could possibly look like for this though. A list of property paths would be a bit annoying since then you'd spent most of the time parsing paths.

I think some callback during the binding of a single object would work best actually. Passing in the target object instance, the source JSON reader and maybe a path would probably work. And that callback could then allow you to customize the binding behavior, or it could be used for validation.

@scottdorman
Copy link

In regards to the top-level namespace...

I believe that decision is closed. cc @terrajobst

I think System.Text.Json is really the wrong namespace, but if the decision is fixed, then so be it.

However other feedback is whether System.Text.Json.Serialization should be a sub-namespace of the reader\writer\document that is in System.Text.Json based on how to best factor higher-level functionality (serializer) vs lower-level (reader\writer).

Are you asking if the public types currently in System.Text.Json.Serialization should be kept in that namespace or moved to be part of System.Text.Json?

If so, and seeing that System.Text.Json.Serialization.JsonSerializer is analogous to Newtonsoft.Json.JsonConvert (at least in what I perceive to be its intent), I think it should be in System.Text.Json. Since the majority of use cases will probably be serialization/deserialization, the natural instinct will probably be to reference "System.Text.Json", thinking that will get you access to everything. It seems counter-intuitive to me that I'd have to reference a sub-namespace to get the 90% use case scenario.

@am11
Copy link
Member

am11 commented Apr 15, 2019

One reason of choosing System.Text.Json was described in the PR that ported System.Json to .NET Core: dotnet/corefx#9897 (diff). The idea was to keep the System.Json implementation in a separate assembly as deprecated.

Maybe all public classes under System.Json can be decorated with [Obsolete] now, indicating we should consider newly added types under System.Text.Json in .NET Core 3.0?

@CrossBound
Copy link

in reference to @khellang's comment, I definitely think there should be a way to differentiate whether a property was supplied if it is not a nullable type. I specifically ran into this situation in a Web API 2 API for my employer. We want the RequestModel to be validated but I also want the RequestModel to include primitive properties such as int, bool, etc. Example: I have a request model like this:

    public class GetATMsRequestModel
    {
        [Required]
        public decimal Latitude { get; set; }

        [Required]
        public decimal Longitude { get; set; }
    }

Notice that on my decimal properties is have added the [Required] attribute. What I was astounded to find was that, in a situation like this, if I neglected to send in one of the properties in my request JSON, the api happily accepted the input and never raised an error
Reason? The parser was assigning a default value before model validation. This required me to

  1. use nullable types... which is annoying because it means I have to convert to/from a nullable type when it shouldn't be necessary.
  2. perform custom deserialization

To solve this problem I had write a custom ContractResolver for use with newtonsoft.json like this:

    public class RequiredPropertyContractResolver : CamelCasePropertyNamesContractResolver
    {
        protected override JsonProperty CreateProperty(MemberInfo member, MemberSerialization memberSerialization)
        {
            JsonProperty property = base.CreateProperty(member, memberSerialization);

            // if we are marking up the property as 'required', we need to make sure that the JSON serialization will
            // respect that and force the JSON to contain the value we want. Setting the JsonProperty Required field
            // to Required.Always causes the serializer to throw an exception if the JSON doesn't contain the property
            if(member.CustomAttributes.Any(a => a.AttributeType == typeof(System.ComponentModel.DataAnnotations.RequiredAttribute)))
            {
                property.Required = Required.Always;
            }

            return property;
        }
    }

If we add the [Required] attribute to non-nullable type... there should be a way for the serializer to respect that.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 14, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

Successfully merging a pull request may close this issue.