Use Utf8Json as the internal serializer #3493
Comments
|
I've run a small, simple console app that continuously loops a search request returning a fixed response of 100 Stack Overflow questions, and profiled memory usage with dotMemory for both NEST 6.4.0 and utf8json.
Comparison overview
Overall, utf8json allocates less total memory and fewer objects. utf8json allocates more total .NET memory, and uses slightly more of that memory than 6.4.0.
The breakdown by namespace (there doesn't seem to be a nice way to export this data...)
Unsurprisingly, NEST 6.4.0 allocates more objects and bytes from the Nest namespace. A large proportion of this can be attributed to the internalized Json.NET types. utf8json allocates more bytes from the System namespace, with the largest contributor down to
I've attached a zip of both workspaces that can be opened in dotMemory 2018.2.3:
utf8json uses a ThreadStatic |
Utf8Json only supports UTF-8 encoding, and may fail with any other encoding. According to the JSON RFC 8259:
|
Have you considered Jil? |
@abrobston Jil was a consideration. There are a few reasons why utf8json was pursued
More generally, the API surface of utf8json looks small enough to introduce types such as |
An update on progress! Commits have been going into the https://github.com/elastic/elasticsearch-net/tree/feature/utf8json-serializer branch, with only a few unit tests and integration tests still failing (these are expected right now and will be fixed). I want to summarize the changes thus far, and also to itemize the remaining items that I'm aware of:
Breaking Changes
Outstanding
|
I've moved the outstanding items into separate issues to keep their scope focused. I'm going to close this issue now, as utf8json has been implemented in 7.x. |
This issue has been opened to discuss moving the internal serializer from Json.NET over to a faster JSON serialization library.
The feature/utf8json-serializer branch contains a minimal viable prototype of deserializing an `ISearchResponse<T>` and serializing `ISearchRequest`.
Some key observations working with utf8json whilst putting together this prototype:
`Hit<T>` requires a custom formatter to be resolved at the `IJsonFormatterResolver` level because it contains a generic type property whose formatter, `SourceFormatter<T>`, cannot be resolved using `JsonFormatterAttribute`. If it were possible to resolve, then it would be possible to attribute `Hit<T>` with `[JsonFormatter(typeof(HitFormatter<>))]`, and have the `_source` field attributed with `[JsonFormatter(typeof(SourceFormatter<>))]`. For now, initialize an instance of `SourceFormatter<T>` inside the `HitFormatter<T>` constructor.
The implementation does not handle different field casings.
`HitFormatter<T>` avoids allocating strings when reading property names by using `AutomataDictionary`. This dictionary lives outside of the generic `HitFormatter<T>` to avoid creating an instance of the dictionary for each `T`.
Both `JsonReader` and `JsonWriter` are structs passed by ref, so they cannot be captured inside local functions or lambda expression bodies; instead they need to be passed as a ref parameter to a function. An example is `JoinFieldFormatter`'s Serialize method.
utf8json does not have a similar concept to `[JsonObject(MemberSerialization.OptIn)]` to only serialize those members that have been explicitly attributed with `DataMemberAttribute`. This is something that would ideally be needed, as it is cumbersome to set `[IgnoreDataMember]` on all properties that should be ignored.
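The observation above about `JsonReader` and `JsonWriter` being structs passed by ref can be illustrated with a minimal, self-contained sketch. `CountingWriter`, `RefWriterSketch`, `WriteAll` and `WriteOne` are hypothetical names invented for this example and are not part of utf8json or NEST; the struct only counts bytes so that the example runs without any package reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a writer struct; this is NOT utf8json's JsonWriter.
public struct CountingWriter
{
    public int BytesWritten;

    // Pretends to write one byte and records that it did so.
    public void WriteRaw(byte value) => BytesWritten++;
}

public static class RefWriterSketch
{
    public static void WriteAll(ref CountingWriter writer, IEnumerable<byte> values)
    {
        // The following would not compile (error CS1628), because a lambda
        // cannot use a ref parameter of the enclosing method:
        //   Action<byte> write = v => writer.WriteRaw(v);

        foreach (var value in values)
        {
            // Pass the struct along by ref so the caller's instance is mutated.
            WriteOne(ref writer, value);
        }
    }

    // Helper that receives the writer explicitly as a ref parameter.
    private static void WriteOne(ref CountingWriter writer, byte value) => writer.WriteRaw(value);
}
```

Passing the struct by ref into a helper method keeps a single writer instance in play, which is presumably what a formatter's Serialize method has to do wherever a lambda would otherwise have been convenient.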
`ConnectionSettings` is retrieved by casting `IJsonFormatterResolver` to a known concrete implementation that exposes it as a property. Not ideal, but it works.
utf8json does not make a distinction between an integer token and a float token as Json.NET does. This is not much of a problem, since the bytes for the token can be inspected to determine if they contain a decimal point, and utf8json's internals can then be used to deserialize accordingly. The check is also only needed in cases where an integer/double distinction is necessary. See `FuzzinessFormatter` for an example.
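As a rough sketch of the byte inspection described above, assuming the raw UTF-8 bytes of the number token are already available as an `ArraySegment<byte>`: `NumberTokenSketch`, `LooksLikeFloat` and `Parse` are hypothetical names, the exponent check is an addition of this sketch rather than something the issue mentions, and parsing goes through a string purely for brevity, where the real formatter would use utf8json's internals instead.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NumberTokenSketch
{
    // Returns true when the raw bytes of a JSON number token contain a decimal
    // point or an exponent marker, i.e. the token is not a plain integer.
    public static bool LooksLikeFloat(ArraySegment<byte> token)
    {
        for (var i = token.Offset; i < token.Offset + token.Count; i++)
        {
            var b = token.Array[i];
            if (b == (byte)'.' || b == (byte)'e' || b == (byte)'E')
                return true;
        }
        return false;
    }

    // Parses the token as a boxed long or a boxed double depending on its shape.
    // Decoding to a string allocates; it is done here only to keep the sketch short.
    public static object Parse(ArraySegment<byte> token)
    {
        var text = Encoding.UTF8.GetString(token.Array, token.Offset, token.Count);
        if (LooksLikeFloat(token))
            return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
        return long.Parse(text, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);
    }
}
```

For a fuzziness-style value this means a token such as `2` would come back as an integer while `0.5` would come back as a double, without needing separate token types from the reader.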
The equivalent to `JsonConverter`, `IJsonFormatter<T>`, only has a generic variant. In several places in the client, we may serialize using an interface, but deserialize using the concrete implementation. This is handled by `ConcreteInterfaceFormatter<TConcrete, TInterface>`, where the formatter is `IJsonFormatter<TInterface>`. An interesting case is when the concrete type should be serialized as the interface; in such scenarios, we end up with two formatters, one for the concrete type and one for the interface, where each formatter references the other's serialize/deserialize implementation. See `QueryContainerFormatter` and `QueryContainerInterfaceFormatter` for an example.
Benchmarking the `feature/utf8json-serializer` branch against the 6.4.0 NuGet package in deserializing a fixed byte response of 100, 1000 or 10,000 Stack Overflow questions produced the following results.
100 Stack Overflow questions
1000 Stack Overflow questions
10,000 Stack Overflow questions
A nice advantage of using utf8json as the internal serializer is that the handoff to a custom serializer can be done using a `MemoryStream` constructed from an `ArraySegment<byte>`, avoiding the need to read into a `JToken` and construct a `Stream` from the token, greatly reducing serialization time and allocations.
Allocated memory/op
The allocated memory per op is higher across the board with utf8json. To determine if this was a fixed amount of allocated memory/op, two searches were performed per benchmark method. The amount of allocated memory doubles.
10,000 Stack Overflow questions with 2 search requests per benchmarked method
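Returning to the custom serializer handoff mentioned earlier, the idea of wrapping an `ArraySegment<byte>` in a `MemoryStream` can be sketched as follows. `HandoffSketch`, `ToStream` and `ReadWithCustomSerializer` are hypothetical names; the "custom serializer" here simply reads the stream back as text, and how the segment for a `_source` document is located in the response is assumed, not shown.

```csharp
using System;
using System.IO;
using System.Text;

public static class HandoffSketch
{
    // Wraps the bytes of the segment in a read-only stream without copying them.
    public static MemoryStream ToStream(ArraySegment<byte> segment) =>
        new MemoryStream(segment.Array, segment.Offset, segment.Count, writable: false);

    // Hypothetical custom serializer: it just reads the stream as UTF-8 text.
    public static string ReadWithCustomSerializer(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Because the stream is a view over the existing buffer, the custom serializer sees only the bytes of the segment, and no intermediate token tree has to be built first.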